Introduction

You can find all the C# code samples here: Embeddings GitHub Repository

Before we delve into the topic of vectorizers in Azure AI Search, I encourage you to read these posts first to get the most out of the information provided here:

If you followed the previous posts, an idea might have already crossed your mind…

If every text-based vector search requires converting the input into an embedding, couldn’t the vector database provider handle that step for us? It’s such a repetitive task that offloading it would make our C# code cleaner and our architecture simpler.

Ladies and Gentlemen…

it’s time to introduce the star of today’s spectacle: Vectorizers in Azure AI Search!

Key Insights

What Vectorizers Are

I think we’ve already answered the WHAT question, so let’s skip the academic formulas and use a simple screenshot instead:

So… a vectorizer is the component that lets us get rid of the code highlighted in the screenshot. Instead of calling an embedding model ourselves to generate embeddings, we can delegate this task to the Azure AI Search service. That’s it! Let’s continue.

The Vectorizer Types Available in Azure AI Search

Available vectorizer kinds in Azure AI Search service.

As you can see, there are currently 4 options (check the docs for more details) when using vectorizers in the Azure AI Search service:

  • Azure OpenAI (kind azureOpenAI) – choose this when your embedding model is deployed to either an Azure OpenAI resource or Microsoft Foundry.
  • Azure AI Foundry (kind aml) – choose this when you want to use an Azure Machine Learning (AML) endpoint or one of the supported embedding models from the Cohere-embed embedding models family deployed in Microsoft Foundry.
  • Custom Web API (kind customWebApi) – choose this if none of the other options suits your needs, when you want full control over the vectorization process, or when you have a vectorizer API endpoint you want to reuse across various workflows (for whatever reason).
  • AI Services Vision (kind aiServicesVision) – use this when you want to vectorize not only text but also images (via imageUrl or imageBinary vector queries) using the Azure AI Vision service.

Because my text embedding model is deployed in Microsoft Foundry, I will choose the azureOpenAI option.

I am using the text-embedding-3-small embedding model, one of the 3 embedding models supported by the azureOpenAI vectorizer. The other 2 are text-embedding-3-large (up to 3072 dimensions) and text-embedding-ada-002 (1536 dimensions).

❗❗❗ Please remember that we must use the same model for the vectorizer and for the data indexing process; in other words, the query vectors and the indexed vectors must share the same embedding space.

How Vectorizers Integrate with Vector Search Profiles

OK, we have added a new vectorizer, and this is how it looks in the Azure Portal.

Vectorizer added to a vector profile in Azure AI Search service.

Let’s analyze what has changed in our JSON index definition.

{
  "vectorizers": [
        {
          "name": "openai-vectorizer",
          "kind": "azureOpenAI",
          "azureOpenAIParameters": {
            "resourceUri": "https://azure-foundry-001-resource.cognitiveservices.azure.com",
            "deploymentId": "text-embedding-3-small",
            "modelName": "text-embedding-3-small"
          }
        }
  ]
}

Let’s analyze each of the fields in our vectorizer definition:

  • name – we can treat this as an identifier that we can later use to link the vectorizer to a specific vector profile ("vectorizer": "openai-vectorizer")
  • kind – one of the 4 available vectorizer types we have discussed already
  • azureOpenAIParameters – a set of parameters specific to a given vectorizer kind
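To make that link concrete, a vector search profile references the vectorizer by its name. A minimal sketch of the relevant part of the index definition could look like this (the profile and algorithm names below are illustrative, not taken from the original index):

```json
{
  "profiles": [
    {
      "name": "openai-vector-profile",
      "algorithm": "hnsw-config",
      "vectorizer": "openai-vectorizer"
    }
  ]
}
```

Any vector field that uses this profile will then be searchable with text queries, because the service knows which vectorizer to run.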

There are also 2 important fields I deliberately did not configure:

  • apiKey – used to access the model with the master key (and as you know from reading this post, it’s not the ideal option)
  • authIdentity – you should use this when you decide to use managed identities (my sincere congratulations on making the right choice!). This field should specify the identity you want to use when working with a user-assigned managed identity. For a system-managed identity (which I’m using in this example), simply leave this field empty and make sure apiKey is empty as well.
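For completeness, if you did go with a user-assigned managed identity, the vectorizer definition would look roughly like the sketch below. The resource IDs are placeholders, and apiKey stays unset:

```json
{
  "name": "openai-vectorizer",
  "kind": "azureOpenAI",
  "azureOpenAIParameters": {
    "resourceUri": "https://azure-foundry-001-resource.cognitiveservices.azure.com",
    "deploymentId": "text-embedding-3-small",
    "modelName": "text-embedding-3-small",
    "authIdentity": {
      "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
      "userAssignedIdentity": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>"
    }
  }
}
```

With a system-managed identity, as in this post, the whole authIdentity block is simply absent.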

RBAC role assignment

Now let’s pick the right RBAC role to assign so that our Azure AI Search service can call Microsoft Foundry to generate embeddings.

In the C# examples we’ve been using so far, we relied on the Azure AI User RBAC role to generate embeddings, so we’ll assign the same role to the system‑managed identity of our Azure AI Search service.

RBAC assignment of Azure AI User role to a system-managed identity of an Azure AI Search service instance.

Now that we have the RBAC role assigned we can focus on the C# code.

C# sample – VectorizableTextQuery

Let’s analyze what using vectorizers in Azure AI Search means for our code.

private async Task<IReadOnlyCollection<AiSearchVectorSearchResult>> FindSimilarItemsAsync(string keyword, int topK)
{
    var searchOptions = new SearchOptions
    {
        VectorSearch = new VectorSearchOptions
        {
            Queries =
            {
                // The raw text is sent to the service, which runs it through
                // the vectorizer attached to the target field's vector profile
                new VectorizableTextQuery(keyword)
                {
                    KNearestNeighborsCount = topK,
                    Fields = { nameof(AiSearchVectorSearchDocumentModel.Vector) }
                }
            },
        },
        Select =
        {
            nameof(AiSearchVectorSearchDocumentModel.id),
            nameof(AiSearchVectorSearchDocumentModel.Phrase),
            nameof(AiSearchVectorSearchDocumentModel.Tags)
        },
        Size = topK
    };

    var response = await _searchClient.SearchAsync<AiSearchVectorSearchDocumentModel>(searchText: null, searchOptions);

    var results = new List<AiSearchVectorSearchResult>(capacity: topK);
    await foreach (var searchResult in response.Value.GetResultsAsync())
    {
        results.Add(new AiSearchVectorSearchResult
        {
            id = searchResult.Document.id,
            Phrase = searchResult.Document.Phrase,
            Tags = searchResult.Document.Tags,
            SimilarityScore = searchResult.Score.GetValueOrDefault()
        });
    }

    return results;
}

If we compare this to the example from the previous post, we’ll see that the logic for generating embeddings using _embeddingClient has disappeared. This is because we replaced VectorizedQuery (notice the past tense), which accepted a vector, with VectorizableTextQuery, which accepts a string to be vectorized by the vectorizer on the Azure AI Search service side.
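For contrast, the pre-vectorizer version of the query setup looked roughly like this. Treat it as a sketch rather than the exact listing from the previous post; the _embeddingClient field and its GenerateEmbeddingAsync call follow the embedding-client pattern used in the earlier examples:

```csharp
// Before vectorizers: generate the embedding on the client side first...
ReadOnlyMemory<float> embedding =
    (await _embeddingClient.GenerateEmbeddingAsync(keyword)).Value.ToFloats();

// ...then pass the raw vector to Azure AI Search.
var searchOptions = new SearchOptions
{
    VectorSearch = new VectorSearchOptions
    {
        Queries =
        {
            new VectorizedQuery(embedding)
            {
                KNearestNeighborsCount = topK,
                Fields = { nameof(AiSearchVectorSearchDocumentModel.Vector) }
            }
        }
    },
    Size = topK
};
```

The vectorizer lets us delete the first two statements and hand the keyword over as-is.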

This is what the HTTP request body generated by _searchClient looks like:

{
    "select": "id,Phrase,Tags",
    "vectorQueries": [
        {
            "kind": "text",
            "text": "Mars|Apollo 11|Neil Armstrong|Curiosity Rover",
            "fields": "Vector",
            "k": 5
        }
    ]
}

The most relevant part of this, in the context of this post, is "kind": "text", which tells Azure AI Search that it should use the vectorizer.
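For comparison, when we used VectorizedQuery, the request carried the raw embedding instead of text. A sketch of that payload (the vector values below are illustrative, and the array is truncated):

```json
{
    "select": "id,Phrase,Tags",
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.013, -0.021, 0.044],
            "fields": "Vector",
            "k": 5
        }
    ]
}
```

With "kind": "vector" the service uses the supplied numbers directly and the vectorizer is never invoked.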

You can explore the complete sample for this post in the GitHub repository.

Vectorizer vs integrated vectorization

Let’s clarify one more thing that may cause confusion. There is a difference between a vectorizer and integrated vectorization when speaking about these capabilities in Azure AI Search:

  • Vectorizer – a component that automatically converts a query into an embedding (the topic of this blog post).
  • Integrated Vectorization – a pattern applied to both the indexing phase and the query phase. During the indexing phase, we can use a skillset that automatically converts the source data (usually split into chunks using the Microsoft.Skills.Text.SplitSkill skill) into embeddings; the query phase is what we are discussing here. The key point is that both processes must use the same embedding space, which means the same embedding model.

We will discuss integrated vectorization in a separate blog post, so for now just remember that difference.

Summary

I hope the topic of vectorizers in Azure AI Search feels much more familiar now. We’ve walked through how vectorizers simplify our C# code, how they replace manual embedding generation, and why using a consistent embedding space across indexing and querying is essential.

Thanks for reading and see you in the next post!
