Introduction

You can find all the C# code samples here: RAG GitHub Repository

Before reading this post about Semantic Ranking in Azure AI Search, I recommend reading my previous post about Hybrid Search in RAG using Azure AI Search.

Hello all!

In the previous post about Hybrid Search in RAG using Azure AI Search, I mentioned that Hybrid Search delivers the best results as the default option for most scenarios. However, I also noted at the end that there is one more technique we can add to this default setup.

That technique is Semantic Ranking in Azure AI Search, which uses a cross‑encoder behind the scenes.

In today’s blog post, I would like to explain why it is such a valuable option, share a bit of theory, show how to configure it, and clarify why the maximum number of documents that can go through this logic is limited to 50.

By the end, you will know not only where to enable it but also how it works behind the scenes.

Let’s dive in!

Bi-Encoder vs Cross-Encoder

First of all, let me explain what a Cross-Encoder is and how it differs from a Bi-Encoder. I know it might sound a bit “academic,” but as you will see soon, having even a basic understanding of these concepts will help connect the dots later.

A Bi-Encoder is the type of model used in most vector-based retrieval systems. It encodes the query and each document independently into embeddings (using models such as text-embedding-ada-002); the encoded query is often called the query vector. Because the document embeddings are computed ahead of time and stored in the index, the system only needs to encode the query at runtime. This makes the approach extremely fast and scalable. However, the downside is that the model never sees the query and document together, so it cannot fully capture their interaction. This sometimes leads to suboptimal ranking, especially when the query is ambiguous or when the retrieved documents are very similar to each other.

A Cross-Encoder, on the other hand, takes the query and a document as a pair and processes them jointly. Instead of producing embeddings, it outputs a relevance score ("@search.rerankerScore"). Because the model can attend to both texts at the same time, it understands the relationship between them much more deeply. The trade-off is performance: a Cross-Encoder is far more accurate, but also much more expensive to run.
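To make the distinction concrete, here is a toy C# sketch (not the actual models, just the shape of the computation): a bi-encoder scores relevance via cosine similarity of independently produced vectors, while a cross-encoder receives the pair jointly. The vectors and the word-overlap scorer below are placeholders I made up for illustration.

```csharp
using System;
using System.Linq;

// Bi-encoder style: query and document are embedded independently, and
// relevance is approximated by the cosine similarity of the two vectors.
// Real embeddings come from a model such as text-embedding-ada-002;
// these are just toy vectors.
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = a.Zip(b, (x, y) => x * y).Sum();
    double normA = Math.Sqrt(a.Sum(x => x * x));
    double normB = Math.Sqrt(b.Sum(x => x * x));
    return dot / (normA * normB);
}

double[] queryEmbedding = { 0.9, 0.1, 0.3 };    // encoded at query time
double[] documentEmbedding = { 0.8, 0.2, 0.4 }; // precomputed and stored in the index

double biEncoderScore = CosineSimilarity(queryEmbedding, documentEmbedding);
Console.WriteLine($"bi-encoder score: {biEncoderScore:F4}");

// Cross-encoder style: the model sees the (query, document) pair jointly and
// returns one relevance score. The real model is internal to Azure AI Search;
// this word-overlap placeholder only illustrates the call shape.
static double CrossEncoderScore(string query, string document) =>
    query.Split(' ').Count(w => document.Contains(w, StringComparison.OrdinalIgnoreCase))
        / (double)query.Split(' ').Length;

double rerankerScore = CrossEncoderScore(
    "fast and prestige",
    "Fast, prestige-class private vessel for elite travelers");
Console.WriteLine($"cross-encoder style score: {rerankerScore:F4}");
```

Note how the bi-encoder path never looks at the query and document together, while the cross-encoder function cannot be precomputed, because it needs the query as input.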

Hmm… so which one should you choose, you may ask. Will it be another hybrid-like setup (similar to BM25 + Vector)? No, it won’t be a hybrid, but rather a waterfall-style flow!

L1 vs L2 retrieval phase

Let’s say you have a very complicated regex that you want to run against a set of 1,000 documents. Only some of those documents have titles starting with "ABC_12345_XYZ", and those are the only ones that matter. If you care about speed and memory consumption, the first thing you would do is narrow the dataset by selecting only the documents whose titles start with "ABC_12345_XYZ". Only then would you apply the expensive regex, right?

This is exactly the same pattern Azure AI Search uses when semantic ranking is enabled.

Once the initial filtering (L1 retrieval phase) is done, Azure AI Search moves to the L2 retrieval phase, where the more expensive and more accurate logic is applied. In our analogy, this is the moment when you finally run the complicated regex but only on the small subset of documents that actually matter.

Simple 2-Phase Retrieval (L1 + L2)

In Azure AI Search, the L1 phase is handled by BM25 + Vector Search (or pure BM25 or pure Vector Search), which quickly retrieves the top candidates (and they are already quite accurate). Then, in the L2 phase, the Cross-Encoder steps in and re-ranks those candidates using a much deeper understanding of the query-document relationship.

This waterfall-style flow ensures that:

  • the fast retrieval stage handles the full index
  • the expensive semantic ranking stage only processes a limited number of documents (max. 50)

And this is exactly why Azure AI Search limits semantic ranking to a maximum of 50 documents: the Cross-Encoder is powerful, but it’s computationally heavy, so it must operate on a small, carefully selected subset.

But that also means that if your “ideal” document isn’t within these top 50 results, the Cross-Encoder won’t magically pull it in. That is simply outside its scope; it can only re-rank what the L1 stage has already retrieved.
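This waterfall flow can be sketched in a few lines of C#. The scoring functions below are stand-ins I made up (the real L1 is BM25/vector/hybrid and the real L2 is the cross-encoder); only the shape of the pipeline, including the hard cap of 50 documents entering L2, is the point:

```csharp
using System;
using System.Linq;

const int SemanticRankingWindow = 50; // hard cap on documents entering L2

// Stand-in index of 1,000 documents (real L1 runs over the full index).
var documents = Enumerable.Range(1, 1000)
    .ToDictionary(i => $"doc-{i}", i => $"starship number {i}");

// Placeholder scoring functions: real L1 is BM25 / vector / hybrid,
// real L2 is the cross-encoder. Only the pipeline shape matters here.
static double CheapL1Score(string text) => text.Length % 17;
static double ExpensiveL2Score(string text) => (uint)text.GetHashCode() % 1000;

// L1: cheap scoring over everything, keep only the top 50.
var l1Candidates = documents
    .OrderByDescending(kv => CheapL1Score(kv.Value))
    .Take(SemanticRankingWindow)
    .ToList();

// L2: expensive re-ranking, applied only to the L1 survivors. A document
// outside the L1 top 50 can never appear here, however well L2 would rate it.
var reranked = l1Candidates
    .OrderByDescending(kv => ExpensiveL2Score(kv.Value))
    .ToList();

Console.WriteLine($"documents scored by L2: {reranked.Count}");
```

The L2 stage only reorders what L1 hands it; it never reaches back into the full index.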

Microsoft Bing

The Cross-Encoder used in Azure AI Search is based on deep learning models that are also used within Microsoft Bing. As you can imagine, Bing has access to massive amounts of search-related data, which allows these models to be trained to recognize semantic similarity between a query and a document with very high accuracy.

Ok…

It’s time to step out of the lecture hall and focus on how to actually configure Semantic Ranking in Azure AI Search.

Semantic Ranking in Azure AI Search

First of all, we need to enable semantic ranking in Azure AI Search. When you create a brand new Azure AI Search instance, it might not be enabled automatically (some workloads create a semantic configuration by default).

Semantic search is not enabled for this service.

To enable it, we have to go to the Premium Features tab and enable it there.

Enable Semantic Ranker in Azure AI Search.

Extra Features Beyond Semantic Ranking

Semantic ranking doesn’t just improve the accuracy of your search results. It also unlocks three additional capabilities that can significantly enhance the search experience and the quality of the results returned to your users.

Semantic Captions

Semantic Ranking in Azure AI Search: a response with the 'captions' section

Semantic captions extract short, verbatim snippets from your documents that best summarize their content. They’re especially useful when your fields are long or dense, giving users a quick, meaningful preview of what each result is about. These captions can also include semantic highlights, which emphasize the most relevant phrases so users can immediately see why a result matches their query.

Semantic Answers

Semantic Ranking in Azure AI Search: a response with the 'answers' section

When a query looks like a question, semantic ranking can optionally return a direct answer extracted from your documents. This works only when the content contains text that resembles an answer, but when it does, it provides a much more natural and helpful search experience.

Query Rewrite

If enabled, query rewrite expands the original query into multiple semantically similar variants. This helps correct typos, spelling mistakes, or awkward phrasing. The rewritten queries are executed first, scored using BM25, Vector or Hybrid (BM25 + Vector), then re‑ranked by the semantic ranker, improving recall without requiring the user to phrase their query perfectly.

Query rewriting deserves a separate post, so let’s not go into too much detail about that technique here.

Semantic Configuration definition

Before we create a new semantic configuration, it’s worth remembering a few limitations.

Semantic ranking can be applied only to the following data types (in short, it works only with text):

  • Edm.String
  • Collection(Edm.String)
  • String subfields of Edm.ComplexType

In addition, any field of the above types must be marked as searchable: true and retrievable: true.

I think we can both agree that these constraints make perfect sense once you understand how Semantic Ranking in Azure AI Search works.

Let’s create a new Semantic Configuration now (you can define up to 100 semantic configurations within a single index, and they can be added or updated at any time).

Semantic Config in Azure AI Search

As you can see in the screenshot above, a semantic configuration consists of three sections:

  • Title (a single value) – a short string, ideally under 25 words. This can be a document title, product name, or any concise identifier.
  • Content fields (multiple values, ordered by priority) – longer natural-language text such as descriptions or document bodies. These fields provide most of the semantic context.
  • Keyword fields (multiple values, ordered by priority) – short descriptors like tags, categories, or labels that help refine relevance.

Below is a JSON object representing a starship object from my tiny data source so you can see which fields I assigned to each section:

  {
    "Id": "gv-yacht-010",
    "Title": "Horizon-Class Private Yacht",
    "ProductId": "GV-55-FOXTROT",
    "Category": "Luxury",
    "Overview": "A high-end private vessel designed for elite travelers seeking speed and exclusivity in their transit.",
    "Specifications": {
      "TopSpeed": "Warp 3.5",
      "Fuel": "Dark Matter Injectors",
      "Seats": 10,
      "ArtificialGravity": true
    },
    "Features": [
      "Custom interior",
      "Point-to-point warp drive"
    ],
    "Notes": "Often used by executives for rapid travel between planetary estates."
  }

Each of these sections also has its own token limits (the current total limit of tokens is 2,048), which determine how much text can be included in the summary string that Azure AI Search generates for each document:

  • Title – 128 tokens
  • Keyword fields – 128 tokens
  • Content fields – remaining tokens (up to the overall limit)

This summary string is then sent to the semantic ranker for scoring, and to machine reading comprehension models for generating captions and answers.
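Since the exact tokenizer and assembly logic are internal to Azure AI Search, here is only an illustrative C# sketch of that budget split, using naive whitespace “tokens” (everything besides the documented 128/128/2,048 limits is my own invention):

```csharp
using System;
using System.Linq;

// Illustrative only: Azure AI Search uses its own internal tokenizer and
// assembly logic. This sketch just shows the budget split described above:
// 128 tokens for the title, 128 for the keyword fields, and the remainder
// of the 2,048-token total for the content fields.
const int TotalBudget = 2048;
const int TitleBudget = 128;
const int KeywordBudget = 128;

static string[] Tokens(string text) =>
    text.Split(' ', StringSplitOptions.RemoveEmptyEntries);

static string TakeTokens(string text, int maxTokens) =>
    string.Join(' ', Tokens(text).Take(maxTokens));

string title = "Horizon-Class Private Yacht";
string keywords = "Luxury Custom interior Point-to-point warp drive";
string content = string.Join(' ', Enumerable.Repeat("long natural-language description", 1000));

string titlePart = TakeTokens(title, TitleBudget);
string keywordPart = TakeTokens(keywords, KeywordBudget);
int contentBudget = TotalBudget - Tokens(titlePart).Length - Tokens(keywordPart).Length;
string contentPart = TakeTokens(content, contentBudget);

string summary = $"{titlePart} {keywordPart} {contentPart}";
Console.WriteLine($"summary tokens: {Tokens(summary).Length}"); // never exceeds 2,048
```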

This is how such a semantic config looks in the index JSON definition.

Semantic config in Azure AI Search, JSON representation
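For readers who cannot see the screenshot, such a configuration could look roughly like this. The schema follows the Azure AI Search index definition, but the field mapping is my reading of the sections above, so treat it as a sketch rather than the exact configuration:

```json
"semantic": {
  "defaultConfiguration": "default_semantic_config",
  "configurations": [
    {
      "name": "default_semantic_config",
      "rankingOrder": "BoostedRerankerScore",
      "prioritizedFields": {
        "titleField": { "fieldName": "Title" },
        "prioritizedContentFields": [
          { "fieldName": "Overview" },
          { "fieldName": "Notes" }
        ],
        "prioritizedKeywordsFields": [
          { "fieldName": "Category" },
          { "fieldName": "Features" }
        ]
      }
    }
  ]
}
```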

Demo

ℹ️ I will be showing all the examples based on a data source that contains 10 records describing various starships. You can find it in the GitHub repository.

Secure Access to Microsoft Foundry and Azure AI Search using RBAC

public class ReRankingRAGExample
{
    private readonly EmbeddingClient _embeddingClient;
    private readonly ChatClient _chatClient;
    private readonly SearchClient _searchClient;

    public ReRankingRAGExample()
    {
        var credential = new DefaultAzureCredential();

        _searchClient = new SearchClient(
            new Uri(Environment.GetEnvironmentVariable("AZURE_AI_SEARCH_URI")!),
            indexName: Environment.GetEnvironmentVariable("AZURE_AI_SEARCH_INDEX")!,
            credential);

        var openAiClient = new AzureOpenAIClient(
            new Uri(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_CLIENT_URI")!),
            credential);

        _embeddingClient = openAiClient.GetEmbeddingClient(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_EMBEDDING_CLIENT_DEPLOYMENT_NAME")!);
        _chatClient = openAiClient.GetChatClient(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_EMBEDDING_CHAT_CLIENT_DEPLOYMENT_NAME")!);
    }

Let’s start with all the client classes I will be using in this example:

  • EmbeddingClient (Azure.AI.OpenAI NuGet) – used to generate embeddings with an embedding model deployed in Microsoft Foundry (text-embedding-ada-002).
  • ChatClient (Azure.AI.OpenAI NuGet) – used to interact with an LLM deployed in Microsoft Foundry (gpt-4.1-mini).
  • SearchClient (Azure.Search.Documents NuGet) – used to push data to the index and then perform search operations.

As you may notice, I am not using any API keys. Instead, I rely on the DefaultAzureCredential class from the Azure.Identity NuGet package. This is the most secure method of establishing authenticated connections in Azure, and I encourage you to use it whenever possible (you can read more about it here).

When I run this sample locally, the DefaultAzureCredential class invokes VisualStudioCredential behind the scenes, which can obtain a security token because I am signed in under Tools > Options > Azure Service Authentication. When this code is deployed, the same class will typically use ManagedIdentityCredential instead.

Of course, the fact that DefaultAzureCredential can obtain a security token is not enough; we also need to assign the appropriate RBAC roles. In this example, I use two RBAC roles:

  • Azure AI User – allows interaction with models deployed in Microsoft Foundry.
  • Search Index Data Contributor – allows pushing data to an index and performing search operations.

Performing Semantic Search

var searchOptions = new SearchOptions()
{
    QueryType = selectedSearchMethod.Contains("Semantic") ? SearchQueryType.Semantic : SearchQueryType.Simple,
    Size = 10,
    IncludeTotalCount = true,
    Select =
    {
        nameof(StarshipSemanticSearchDocumentResult.Id),
        nameof(StarshipSemanticSearchDocumentResult.Title),
        nameof(StarshipSemanticSearchDocumentResult.Category),
        nameof(StarshipSemanticSearchDocumentResult.Overview),
        nameof(StarshipSemanticSearchDocumentResult.Features)
    }
};

if (selectedSearchMethod.Contains("Semantic"))
{
    searchOptions.SemanticSearch = new SemanticSearchOptions
    {
        SemanticQuery = question,
        QueryAnswer = new QueryAnswer(QueryAnswerType.Extractive),
        QueryCaption = new QueryCaption(QueryCaptionType.Extractive),
    };
}

In order to perform a Semantic Ranking in Azure AI Search, we create SemanticSearchOptions and set the QueryType to Semantic. As you can see, I also specify that I want the results to include both Answers and Captions (Extractive).

I also specify SemanticQuery explicitly. It is not necessary in this example, but it’s something worth remembering: there are two ways to “tell” Azure AI Search that we want to invoke Semantic Reranking.

  1. Set QueryType to Semantic and send the query using the search parameter (the first argument in the _searchClient.SearchAsync method).
  2. Use a Full-Text Lucene query together with semantic ranking. In this case, you set QueryType = Full, but you must provide a SemanticQuery.

In other words, SemanticQuery gives you the ability to send one query for the L1 phase and a different one for the L2 phase. This can be very useful when you want full control over how the initial retrieval behaves versus how the semantic reranker interprets the intent.

{
  "search": "eliteee~2 speed", // L1 phase query (BM25)
  "count": true,
  "queryType": "full", // Full-Text Lucene
  "semanticQuery": "fast and prestige", // L2 Semantic Ranking
  "semanticConfiguration": "default_semantic_config",
  "captions": "extractive",
  "answers": "extractive",
  "queryLanguage": "en-us"
}

In the C# code, I also do not specify SemanticConfigurationName because in my index definition I assigned a default semantic configuration.

{
  "semantic": {
    "defaultConfiguration": "default_semantic_config"
  }
}

Then it’s just a matter of invoking the SearchAsync method.

var response = await _searchClient.SearchAsync<StarshipSemanticSearchDocumentResult>(question, searchOptions);

Results

When I invoke “Hybrid (L1) + Semantic (L2)” search for the question “fast and prestige”, these results are returned:

Hybrid + Semantic, invocation results

Let’s now focus on the Score (@search.score) and the ReRanker Score (@search.rerankerScore). This example clearly shows that the L2 reranker changed the ordering produced by L1. The L1 scores come from the RRF fusion of phase 1 results (Full-Text Search BM25 + Vector). As you can see, the first and second items swapped positions once the L2 semantic reranker was applied.
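For reference, Reciprocal Rank Fusion combines rankings by giving each document 1 / (k + rank) per list and summing the contributions. The sketch below uses k = 60, the constant from the original RRF paper; the exact constant Azure AI Search uses internally is not part of its public contract, and the document names are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reciprocal Rank Fusion (RRF): each ranked list contributes 1 / (k + rank)
// per document, and the contributions are summed across lists.
static Dictionary<string, double> FuseWithRrf(int k, params string[][] rankedLists)
{
    var scores = new Dictionary<string, double>();
    foreach (var list in rankedLists)
        for (int rank = 1; rank <= list.Length; rank++) // ranks are 1-based
        {
            string id = list[rank - 1];
            scores.TryGetValue(id, out double current);
            scores[id] = current + 1.0 / (k + rank);
        }
    return scores;
}

// Toy example: BM25 and vector search partially disagree on the ordering.
string[] bm25Ranking = { "yacht", "freighter", "shuttle" };
string[] vectorRanking = { "yacht", "probe", "freighter" };

var fused = FuseWithRrf(60, bm25Ranking, vectorRanking);
foreach (var (id, score) in fused.OrderByDescending(kv => kv.Value))
    Console.WriteLine($"{id}: {score:F5}");
```

A document ranked highly by both L1 methods (like "yacht" here) ends up with the highest fused score, which is exactly the @search.score you see before the L2 reranker reorders things.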

Please also note the highlighted text in yellow, which serves as a hint to the user by showing the exact part of the verbatim content where the match was found. This can be very helpful when displaying answers and captions, as it provides immediate context and helps users understand why a particular result was returned.

Ok, let’s carry on.

Now let me ask the same question, but this time I will also use a scoring profile. I want to do this to show you the third score in action: the ReRanker Boosted Score (@search.rerankerBoostedScore).

This is how that scoring profile looks (you have to use boosting functions; if you use ordinary weighted fields, it won’t work).

Example of a scoring profile with the 'tag' function type

Then in C# code I specify that scoring profile and pass the scoring parameters:

  if (selectedSearchMethod.Contains("Scoring Profile"))
  {
      searchOptions.ScoringProfile = "boost_category_field";
      searchOptions.ScoringParameters.Add("tagBoostCategory-Luxury");
  }

I hardcoded the word Luxury, but in a real application that part of the scoring parameter would, of course, be fully dynamic. Below are the results of such an invocation.

Hybrid Search + Semantic Ranking + Scoring Profile in Azure AI Search

Now it boosted the two ships with Category = Luxury, exactly as expected, which you can see by looking at the ReRanker Boosted Score.

This should also make it clearer why, when defining a semantic configuration (look at that JSON definition I pasted above), we set "rankingOrder": "BoostedRerankerScore". This is the default option, and the alternative is "RerankerScore".

When BoostedRerankerScore is selected, the scoring profile is applied twice:

  • during the L1 phase of query execution
  • after the L2 phase, restoring the boost contribution to the results

Summary

I will be glad if, after reading this post, a few things have become clearer to you.

I hope I have convinced you that Hybrid + Semantic Reranking is a powerful setup that might be considered the best default option for most RAG scenarios… at least in 2026 🙂

Thanks for reading.

See you in the next post!
