Introduction

You can find all the C# code samples here: Embeddings GitHub repository

Welcome to another post laying the groundwork for future topics on building applications with AI. Before exploring more advanced topics, it’s worth stepping back and understanding one of the most fundamental concepts behind modern AI systems: embeddings. This article serves as an introduction to embeddings.

If you have ever learned object‑oriented programming, you know that everything starts with understanding what a class is. Without that concept, nothing else makes sense. Embeddings play a similar role in today’s AI landscape. You simply cannot think about building modern AI‑powered applications without a solid grasp of what embeddings are and why they matter.

In the following sections we will build the topic up step by step, in a way that (I hope) will allow you to understand it better.

Basics

Let’s not waste time and start with a specific example!

Objective: Find similarity between the following terms:

  • Mars
  • Apollo 11
  • Neil Armstrong
  • Curiosity Rover

At first glance, these four items feel related but not in the same way. Mars is a planet. Apollo 11 is a mission. Neil Armstrong is an astronaut. Curiosity Rover is a robotic explorer (deployed on Mars).

Without any math, and based purely on intuition, we may conclude that these are the most similar items:

  • Mars – Curiosity Rover: Curiosity Rover is the robotic explorer sent specifically to study Mars.
  • Apollo 11 – Neil Armstrong: Neil Armstrong is the astronaut most closely associated with the Apollo 11 mission.
  • Neil Armstrong – Apollo 11: Apollo 11 is the mission that defines Neil Armstrong’s historic achievement.
  • Curiosity Rover – Mars: Curiosity Rover’s purpose is to study the surface of Mars.

The question is: How can we express these relationships in a way a machine can understand?

Attribute-based filtering

One way to address this challenge is to assign some specific attributes to each of these keywords, such as ["Planet", "Mission", "Astronaut"].

Now imagine assigning a simple set of attributes to each of our four entities:

private Dictionary<string, string[]> _items = new()
{
    ["Mars"] =              ["Planet"],
    ["Apollo 11"] =         ["Mission", "Astronaut"],
    ["Neil Armstrong"] =    ["Mission", "Astronaut"],
    ["Curiosity Rover"] =   ["Planet", "Mission"]
};

Next, let’s define a method that measures how similar a given keyword is to all the others by counting how many attributes they share:

// Total number of distinct attributes: Planet, Mission, Astronaut
private const int NUMBER_OF_ATTRIBUTES = 3;

private IReadOnlyCollection<(string Keyword, int Similarity)> GetTheMostSimilarItems(string searchKeyword)
{
    var searchKeywordAttributes = _items[searchKeyword];

    return _items
        .Where(kvp => !string.Equals(kvp.Key, searchKeyword, StringComparison.OrdinalIgnoreCase))
        .Select(kvp =>
        {
            var otherKeywordAttributes = kvp.Value;
            var intersectionCount = searchKeywordAttributes.Intersect(otherKeywordAttributes).Count();

            var percentage = (int)Math.Round(intersectionCount * 100.0 / NUMBER_OF_ATTRIBUTES);

            return (Keyword: kvp.Key, Similarity: percentage);
        })
        .OrderByDescending(x => x.Similarity)
        .ToList();
}

And here are the results produced by this method for each keyword:

Similar items to "Mars":
- Curiosity Rover: 33%
- Apollo 11: 0%
- Neil Armstrong: 0%

Similar items to "Apollo 11":
- Neil Armstrong: 67%
- Curiosity Rover: 33%
- Mars: 0%

Similar items to "Neil Armstrong":
- Apollo 11: 67%
- Curiosity Rover: 33%
- Mars: 0%

Similar items to "Curiosity Rover":
- Mars: 33%
- Apollo 11: 33%
- Neil Armstrong: 33%

If you compare these results with our earlier intuition, you’ll notice they line up almost perfectly, with one interesting exception: Curiosity Rover, which ends up being equally similar to everything else. This mismatch is a great example of why simple attribute matching isn’t enough.

From Text to Binary Vector

Before we address the challenge above, let’s make one small adjustment to our previous method – an adjustment that, as you’ll soon see, is essential for understanding embeddings.

The key change is to convert our array of strings into a numerical representation – a primitive vector.

// [Planet, Mission, Astronaut]
private Dictionary<string, bool[]> _items = new()
{
    ["Mars"] =              [true,  false, false],
    ["Apollo 11"] =         [false, true,  true ],
    ["Neil Armstrong"] =    [false, true,  true ],
    ["Curiosity Rover"] =   [true,  true,  false]
};

Now, let’s adjust our method that finds the most similar items:

// Number of dimensions in our primitive vectors (one per attribute)
private const int NUMBER_OF_DIMENSIONS = 3;

private IReadOnlyCollection<(string Keyword, int Similarity)> GetTheMostSimilarItems(string searchKeyword)
{
    var searchVector = _items[searchKeyword];

    return _items
        .Where(kvp => !string.Equals(kvp.Key, searchKeyword, StringComparison.OrdinalIgnoreCase))
        .Select(kvp =>
        {
            var otherVector = kvp.Value;

            var intersectionCount = 0;
            for (var i = 0; i < NUMBER_OF_DIMENSIONS; i++)
            {
                if (searchVector[i] && otherVector[i])
                {
                    intersectionCount++;
                }
            }

            var percentage = (int)Math.Round(intersectionCount * 100.0 / NUMBER_OF_DIMENSIONS);

            return (Keyword: kvp.Key, Similarity: percentage);
        })
        .OrderByDescending(x => x.Similarity)
        .ToList();
}

The results remain exactly the same as in the previous method, so let me skip pasting them again.

The results remain the same but…

this change is significant because we have just created our first vectors and implemented our first vector‑based similarity search.

ℹ️ By the way, when you hear people talk about a vector, a vector embedding, or simply an embedding, they usually mean the same thing but you can also think about it like this:

  • A vector is just a list of numbers. It doesn’t mean anything on its own; it’s simply an array of floating‑point values (in our example we used a bool[] array, but let’s turn a blind eye to that – we will fix it in a second!).
  • An embedding is a vector that has meaning. It’s produced by a model that has learned patterns, so the numbers capture relationships like similarity or context.

Okay, let’s address the challenge we noticed with the Curiosity Rover. As a careful reader, you might be wondering: can we nuance these assignments a bit to express how strongly a given aspect applies to a given keyword?

This is exactly where soft label encoding comes into play.

Soft label encoding

Let’s look at each keyword one more time and try to assign a value from 0 to 1, where 0 means no relation at all and 1 represents the strongest possible match. This is how it could look:

// [Planet, Mission, Astronaut]
private Dictionary<string, double[]> _items = new()
{
    ["Mars"] =              [1.0, 0.2, 0.0],   // Mostly a planet, slightly related to missions
    ["Apollo 11"] =         [0.0, 1.0, 0.9],   // A mission, strongly tied to astronauts
    ["Neil Armstrong"] =    [0.0, 0.8, 1.0],   // An astronaut, strongly tied to missions
    ["Curiosity Rover"] =   [0.9, 1.0, 0.0]    // A rover on a planet, fully a mission asset
};

Now let’s adjust our method again:

private IReadOnlyCollection<(string Keyword, double Similarity)> GetTheMostSimilarItems(string searchKeyword)
{
    var searchVector = _items[searchKeyword];

    return _items
        .Where(kvp => !string.Equals(kvp.Key, searchKeyword, StringComparison.OrdinalIgnoreCase))
        .Select(kvp =>
        {
            var otherVector = kvp.Value;

            var dotProduct = 0.0;
            var sumSquaresA = 0.0;
            var sumSquaresB = 0.0;

            for (var i = 0; i < searchVector.Length; i++)
            {
                var a = searchVector[i];
                var b = otherVector[i];
                dotProduct += a * b;
                sumSquaresA += a * a;
                sumSquaresB += b * b;
            }

            var vectorALength = Math.Sqrt(sumSquaresA);
            var vectorBLength = Math.Sqrt(sumSquaresB);
            var denominator = vectorALength * vectorBLength;
            var cosine = Math.Round(denominator == 0.0 ? 0.0 : dotProduct / denominator, 2);

            return (Keyword: kvp.Key, Similarity: cosine);
        })
        .OrderByDescending(x => x.Similarity)
        .ToList();
}

As you can see, the main change is not in what we compare, but in how we compare it.

Previously, we were just counting how many attributes overlapped. Now, we use cosine similarity to measure how close two vectors are in this tiny 3‑dimensional space.

We will discuss cosine similarity in a second but let’s focus on the results first:

Similar items to "Mars":
- Curiosity Rover: 0.80
- Apollo 11: 0.15
- Neil Armstrong: 0.12

Similar items to "Apollo 11":
- Neil Armstrong: 0.99
- Curiosity Rover: 0.55
- Mars: 0.15

Similar items to "Neil Armstrong":
- Apollo 11: 0.99
- Curiosity Rover: 0.46
- Mars: 0.12

Similar items to "Curiosity Rover":
- Mars: 0.80
- Apollo 11: 0.55
- Neil Armstrong: 0.46

What you can already see from these numbers is that our soft label vectors capture relationships far better than simple overlaps ever could. Items that intuitively belong together, like Apollo 11 and Neil Armstrong, naturally cluster with very high similarity, while loosely related items drift further apart. This is the first glimpse of how vector‑based representations start to reveal structure in our data long before we introduce real embeddings.

Similarity

When we represent items as vectors, we need a way to measure how close those vectors are. Here are the ones you’ll encounter most often.

Cosine

3D plot showing relationships between different vectors.

Cosine similarity measures the angle between two vectors, and because of that it always returns a value between –1 and 1, with each part of that range having a clear interpretation:

  • 1 – the vectors point in exactly the same direction. This means the items are as similar as they can possibly be. In practice, they share the same pattern of values, even if the magnitudes differ.
  • 0 – the vectors are perpendicular. This means there is no meaningful relationship between them. Their patterns don’t align at all.
  • –1 – the vectors point in opposite directions. This represents the strongest possible negative relationship.
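
For reference, the score is simply the dot product of the two vectors divided by the product of their lengths – exactly what the loop in the soft‑label example computes:

$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \,\lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$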

ℹ️ In our example, all values were positive, so none of the vectors could point in opposite directions. That’s why the cosine scores stayed between 0 and 1. But with real embeddings, which often contain negative values, you can absolutely see cosine similarities drop below zero.

Other methods

There are other methods used as well, each offering a different way to compare vectors depending on the data and use case (a small sketch of each follows the list):

  • Dot Product – measures how strongly two vectors point in the same direction, with larger values indicating stronger alignment.
  • Euclidean Distance – captures the straight‑line distance between two vectors, where smaller distance means greater similarity.
  • Manhattan Distance – sums the absolute differences across dimensions, similar to navigating a city grid instead of taking a direct shortcut.
  • Hamming Distance – counts how many positions differ between two vectors, making it ideal for binary or categorical data.
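
To make these definitions concrete, here is a minimal sketch (not taken from the sample repository) of how each metric could be computed for two equal‑length vectors:

// Minimal sketches of the four metrics above, assuming two equal-length vectors.
private static double DotProduct(double[] a, double[] b) =>
    a.Zip(b, (x, y) => x * y).Sum();

private static double EuclideanDistance(double[] a, double[] b) =>
    Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());

private static double ManhattanDistance(double[] a, double[] b) =>
    a.Zip(b, (x, y) => Math.Abs(x - y)).Sum();

// Hamming distance counts mismatching positions – shown here for binary vectors.
private static int HammingDistance(bool[] a, bool[] b) =>
    a.Zip(b, (x, y) => x != y ? 1 : 0).Sum();

Note that the two distance metrics work in the opposite direction to cosine similarity and dot product: a smaller distance means the vectors are more similar.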

For instance, below are the similarity metrics available in the Azure AI Search service:

Similarity metric options available in the Azure AI Search service.

Dimensions

Up to this point, we’ve been playing in a tidy little sandbox: a 3‑dimensional space where each axis represents one attribute, and every keyword becomes a vector living somewhere inside that cube. It’s a great mental model because it is clean, visual, and intuitive. But as soon as you try to scale it beyond our tiny space‑themed example, the cracks start to show.

Let’s call out the biggest limitations of our current design:

  1. What if we need more than three features? Real concepts aren’t defined by just 3 attributes. Even something as simple as Mars could involve dozens of characteristics: physical, historical, cultural, scientific. Three axes simply can’t capture that richness.
  2. What if we want to describe many more keywords and not just space‑related ones? Our little coordinate system works only because we hand‑picked a tiny, coherent set of terms. The moment we mix in unrelated concepts like “coffee”, “democracy”, “neural networks”, “jazz”, our 3D space collapses. There’s no room to represent everything meaningfully.
  3. How do we assign the values in the first place? We manually typed numbers into our vectors. That’s fine for a toy example, but completely impossible at scale. You can’t manually encode millions of words, sentences, or documents. And even if you tried, your choices would be subjective and inconsistent.

Our little 3‑dimensional example is great for building intuition, but real‑world language is far too rich to squeeze into just three axes. Modern embedding models don’t use 3 dimensions, they use hundreds or even thousands. Common sizes include:

  • 384 dimensions
  • 768 dimensions
  • 1536 dimensions

An embedding consisting of 1536 dimensions represented as floating-point numbers.

At first glance, that sounds absurd. How can anything live in a 1536-dimensional space? And how could a human possibly understand what each dimension means?

Here’s the trick: we don’t need to interpret the dimensions because the model learns them automatically.

Model? What kind of model… you may wonder, and that brings us to the idea of an embedding model.

Embedding model

An embedding model is a neural network trained to convert raw data into vectors in a high-dimensional space. Instead of manually assigning numbers (like we did in our 3D example), the model learns how to position concepts by analyzing massive amounts of real-world examples and discovering patterns on its own.

Single‑modal embedding models

A single‑modal embedding model works with just one type of data: only text, only images, only audio. It learns the patterns within that specific area and builds a vector space where similar items end up close together. Because it focuses on a single input type, it can capture very detailed, modality‑specific relationships.

Multi‑modal embedding models

A multi‑modal embedding model learns a shared vector space for more than one type of data: for example, text and images. Instead of building separate spaces for each modality, the model aligns them so that related items from different sources end up close together.

A simple example:

  • the sentence “an astronaut walking on the Moon”
  • and an image of an astronaut on the Moon

…are mapped to vectors that sit near each other in the same high-dimensional space. This makes it possible to compare text to images directly, search images using text, or describe images using language.

There is yet another concept which I think is very important: an embedding space.

Embedding space

An embedding space is the high-dimensional coordinate system where all vectors from an embedding model live.

Each embedding model creates its own unique space – its own geometry, scale, and relationships. Even if two models output vectors of the same size, their spaces are not compatible.

❗❗❗ That means you can’t embed some keywords with one model and others with a different model and expect meaningful comparisons.

The final example

First of all, we need a model that can create embeddings. I am going to use Microsoft Foundry for this. I will focus on Microsoft Foundry in later posts, but for now the definition below is sufficient:

Microsoft Foundry is a unified Azure Platform as a Service (PaaS) that provides production-ready AI infrastructure and tools so developers can focus on building applications instead of managing infrastructure.

Selecting embedding model in Microsoft Foundry.

As you can see in the screenshot, I filtered the available models to show only those capable of generating embeddings. I will choose text‑embedding‑3‑small for the sake of this exercise.

text-embedding-3-small model deployed in Microsoft Foundry.

We have the model deployed so let’s focus on the C# code now. First of all, we do not have any vectors attached to keywords yet. The whole point of this exercise is to populate these values automatically using the embedding model.

private Dictionary<string, float[]> _items = new()
{
    ["Mars"] = [],
    ["Apollo 11"] = [], 
    ["Neil Armstrong"] = [],
    ["Curiosity Rover"] = []
};

In order to do so we must first add the Azure.AI.OpenAI and Azure.Identity NuGet packages. Once we have those installed, we can register an EmbeddingClient:

private readonly EmbeddingClient _embeddingClient = new AzureOpenAIClient(
    new Uri("https://azure-foundry-001-resource.cognitiveservices.azure.com"), 
    new DefaultAzureCredential())
    .GetEmbeddingClient(deploymentName: "text-embedding-3-small");

We need 3 things to register it:

  • URI of our Microsoft Foundry instance
  • DefaultAzureCredential class, which allows me to authenticate without providing any passwords – it uses the VisualStudioCredential token provider from the Azure.Identity library (together with the Azure AI User RBAC role).
    • As you could see on the screenshot with our model deployed, there is also a key. You could use it by simply replacing DefaultAzureCredential with the ApiKeyCredential class (a small sketch of that variant follows this list), but I hope I’ve convinced you to use RBAC instead.
  • DeploymentName, which identifies the specific embedding model we want to use
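
For completeness, here is a hedged sketch of that key‑based variant. The "<your-api-key>" placeholder is illustrative only, and ApiKeyCredential comes from the System.ClientModel namespace:

// Illustrative sketch: the same client registered with key-based authentication instead of RBAC.
// Never hard-code real keys – load them from configuration or a secret store.
private readonly EmbeddingClient _embeddingClientWithKey = new AzureOpenAIClient(
    new Uri("https://azure-foundry-001-resource.cognitiveservices.azure.com"),
    new ApiKeyCredential("<your-api-key>"))
    .GetEmbeddingClient(deploymentName: "text-embedding-3-small");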

Everything is configured, so let’s generate our first embeddings!

foreach (var keyword in _items.Keys)
{
    var response = await _embeddingClient.GenerateEmbeddingAsync(keyword);
    _items[keyword] = response.Value.ToFloats().ToArray();
}

Embeddings were generated so let’s look at the results:

Similar items to "Mars" (vector length: 1536):
- Curiosity Rover: 0.49
- Neil Armstrong: 0.34
- Apollo 11: 0.32

Similar items to "Apollo 11" (vector length: 1536):
- Neil Armstrong: 0.55
- Curiosity Rover: 0.39
- Mars: 0.32

Similar items to "Neil Armstrong" (vector length: 1536):
- Apollo 11: 0.55
- Curiosity Rover: 0.39
- Mars: 0.34

Similar items to "Curiosity Rover" (vector length: 1536):
- Mars: 0.49
- Apollo 11: 0.39
- Neil Armstrong: 0.39

As you can see, the most similar items match exactly what we expected from our little intuition exercise at the beginning of the post. I also printed the size of each vector to show that we are now working in a 1536‑dimensional space.

The second observation is that these values don’t differ as much as they did in the soft-label exercise. The reason is simple: all of these items ‘live’ relatively close to one another in that 1536‑dimensional space because they’re all related to the same subject.

You may also wonder… hmm… I understand calculating the angle between vectors in 3D space, but how can we measure the angle in a 1536‑dimensional space?

The key insight is that the number of dimensions doesn’t change how cosine similarity (and other methods) works. We use exactly the same mathematical formula whether the vectors have 384, 768, or 1536 dimensions. The only difference is the computational cost, because higher-dimensional vectors simply contain more numbers to multiply and sum.

I also deliberately included the implementation for that calculation in the soft‑label exercise to show that there’s no magic happening behind the scenes. In this example, however, we don’t need to translate the math formula into C# ourselves. The System.Numerics.Tensors NuGet package can handle cosine similarity efficiently for us (see the TensorPrimitives.CosineSimilarity call below).

private IReadOnlyCollection<(string Keyword, double Similarity)> GetTheMostSimilarItems(string searchKeyword)
{
    var searchVector = _items[searchKeyword];

    return _items
        .Where(kvp => !string.Equals(kvp.Key, searchKeyword, StringComparison.OrdinalIgnoreCase))
        .Select(kvp =>
        {
            var otherVector = kvp.Value;
            var cosine = TensorPrimitives.CosineSimilarity(searchVector.AsSpan(), otherVector.AsSpan());

            return (Keyword: kvp.Key, Similarity: Math.Round(cosine, 2));
        })
        .OrderByDescending(x => x.Similarity)
        .ToList();
}

Let me finish this post by showing you one more thing. I prepared four sentences, each describing one of our keywords without actually naming it. Let’s see whether the most similar item still matches the one we have in mind.
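
Under the hood this is the same pattern once again: embed the sentence, then rank the four keywords by cosine similarity. A minimal sketch (assuming the same _embeddingClient and _items as above; the method name is mine) could look like this:

private async Task PrintMostSimilarItemsAsync(string sentence)
{
    // Embed the free-form sentence with the same model used for the keywords.
    var response = await _embeddingClient.GenerateEmbeddingAsync(sentence);
    var sentenceVector = response.Value.ToFloats().ToArray();

    Console.WriteLine($"Sentence: \"{sentence}\"");

    // Rank all keywords by cosine similarity to the sentence vector.
    var ranked = _items
        .Select(kvp => (Keyword: kvp.Key,
                        Similarity: Math.Round(
                            TensorPrimitives.CosineSimilarity(sentenceVector.AsSpan(), kvp.Value.AsSpan()), 2)))
        .OrderByDescending(x => x.Similarity);

    foreach (var (keyword, similarity) in ranked)
    {
        Console.WriteLine($"- {keyword}: {similarity:0.00}");
    }
}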

Sentence: "the planet next to Earth but not Venus"
- Mars: 0.46
- Curiosity Rover: 0.30
- Neil Armstrong: 0.26
- Apollo 11: 0.24

Sentence: "the first mission that carried humans to the Moon"
- Apollo 11: 0.53
- Neil Armstrong: 0.51
- Curiosity Rover: 0.34
- Mars: 0.28

Sentence: "the first person to step onto the lunar surface"
- Neil Armstrong: 0.59
- Apollo 11: 0.49
- Curiosity Rover: 0.36
- Mars: 0.30

Sentence: "the NASA robot exploring Mars since 2012"
- Curiosity Rover: 0.58
- Mars: 0.50
- Neil Armstrong: 0.34
- Apollo 11: 0.32

As you can see, our model was able to identify the most similar keyword correctly.

Summary

Embeddings may seem abstract at first, but as you’ve seen throughout this post, they offer a powerful and intuitive way to capture meaning, structure, and relationships in data, far beyond what handcrafted attributes or simple rules can achieve. By moving from tiny 3‑dimensional examples to real 1536‑dimensional vectors generated by an embedding model, we’ve crossed the bridge from intuition to practical application.

In the next posts, we’ll build on this foundation and explore how embeddings unlock semantic search, retrieval‑augmented generation (RAG), and many other capabilities that make modern AI systems feel intelligent.

Staying on the topic of Neil Armstrong and the Apollo 11 mission, let’s end this post with a statement inspired by a hero:

It’s a small array of numbers, but a giant leap in how your applications understand meaning.

If you’ve reached this point, thank you very much. I hope my explanation helps you understand this concept better.

See you in the next post!
