Introduction

You can find all the C# code samples here: RAG GitHub Repository

Before reading this post about Multi-Query Retrieval for RAG, it's worth taking a look at these posts too:

Hey Everyone!

You build a RAG solution. You apply vector search, combine it with BM25, and even add a cross-encoder step on top. You test your solution, and everything seems to work very well.

Then you publish your application and let real users try it…

Suddenly, you notice that the accuracy of the results isn’t as high as expected. You verify everything again, try to fine-tune your vector search, and apply additional BM25 features, but none of it leads to a noticeable improvement when reviewing the production stats.

So you check the responses where users clicked the “Unlike” button (such a useful feature for analysis!) and start reviewing everything carefully.

And then the bingo moment comes!

Some user questions are so vague that your search pipeline struggles to find relevant results. Or, to put it differently: it does find results for the text they entered, but not for what they actually meant.

So if we assume that user queries are often imperfect or a bit vague, can we do something about it?

Yes!

That’s exactly where Multi-Query Retrieval for RAG can help.

What Is Multi-Query Retrieval for RAG?

Multi-Query for RAG pattern

Multi-Query Retrieval for RAG is a technique where, instead of relying on a single user query, your system generates several alternative versions of that query. These variations capture different angles, phrasings, or interpretations of what the user might actually mean.

But why does it help?

Because real users rarely write perfect, well-structured queries. They type fast, skip context, or assume the system “knows what they mean.” By expanding the original query into multiple semantically related ones, you give your retrieval pipeline a much better chance of finding the right documents.

In practice, this means your RAG system doesn’t depend on a single query. If one query misses the mark, another may find the relevant information. When combined and ranked, these results significantly improve recall and overall answer quality.

Of course… with some additional costs! We will get to pros and cons a little later. For now let’s focus on the multi-query retrieval for RAG pattern!

Query Expansion Techniques for RAG

If our goal is to “create N queries based on the original user query,” the next question is simple: how do we actually generate them?

Modern RAG systems rely on LLMs to produce several types of expansions. Below are the main LLM-powered rewriting techniques.

Paraphrasing the Original Query

The LLM rewrites the query using different wording while preserving intent.
This helps retrieve documents that use alternative phrasing or synonyms.

Semantic Expansion

Instead of just rephrasing, the model adds related concepts or clarifying details. For example, a vague query like “renewal process” might expand into:

  • “How do I renew my subscription?”
  • “Steps to extend an existing plan”
  • “Subscription renewal requirements”

Domain-Specific Reformulations

If the system knows the domain (e.g., interplanetary travel for Galactic Voyages), the LLM can rewrite the query using domain terminology.

Multi-Query Strategy

This is the strategy that uses the rewrites above. Instead of relying on a single reformulated query, the system:

  • Generates several diverse rewrites
  • Runs each rewrite through the search pipeline
  • Combines and ranks the results to improve recall

Multi-query retrieval isn't a separate rewriting technique; it's the retrieval strategy that leverages paraphrasing, semantic expansion, and domain-specific reformulation to produce a richer, more complete set of search results.
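Put together, the whole strategy is a simple fan-out/fan-in loop. Here is a minimal C# sketch; `RewriteQueryAsync`, `SearchAsync`, `MergeAndRank`, and the `SearchHit` type are hypothetical placeholders for your own rewriter, search call, fusion logic, and result shape:

```csharp
// Hypothetical sketch of the multi-query retrieval strategy:
// fan the rewrites out to the search pipeline, then fuse the result lists.
public async Task<IReadOnlyList<SearchHit>> MultiQuerySearchAsync(string userQuery)
{
    // 1. Generate several diverse rewrites of the original query.
    IReadOnlyList<string> rewrites = await RewriteQueryAsync(userQuery, count: 3);

    // 2. Run each rewrite through the search pipeline (in parallel).
    SearchHit[][] resultsPerRewrite =
        await Task.WhenAll(rewrites.Select(rewrite => SearchAsync(rewrite)));

    // 3. Combine and rank the result lists (e.g. RRF, MAX score, or a score sum).
    return MergeAndRank(resultsPerRewrite);
}
```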

LLM (or SLM?) for Query Rewrites

One additional thing, before we start implementing it step by step.

Query rewriting doesn’t require the same powerful model you use for generating full RAG answers.

In fact, it works best with a smaller, fast SLM (SLM = Small Language Model) that's optimized for low latency and strict instruction following. As long as the model can reliably paraphrase, expand, and reformulate queries without hallucinating, it's a good fit. The heavy lifting happens later, in the retrieval and answer-generation stages; the rewrite step just needs to be quick, consistent, and predictable.

Implementing Custom Multi-Query Retrieval

Before I show you how to leverage Multi-Query Retrieval for RAG using Azure AI Search and the query rewrites feature, I think it’s worth implementing such a search pipeline step by step so it’s clear how it works.

This is the domain-specific system prompt I use:

private string GetDomainSpecificQueryRewriteSystemPrompt(int count = 3)
{
    return $$"""
        You are a query rewriting assistant for an FAQ system used by Galactic Voyages, a company that organizes trips to various planets and destinations across the galaxy.
        Your task is to take a user’s original question and generate {{count}} alternative versions of that query.
        Each rewritten query should preserve the user’s intent while exploring different phrasings, clarifications, or interpretations that might help retrieve more relevant FAQ answers.
        Return the rewrites in a structured JSON array called "Rewrites".

        The JSON must follow this exact structure:
        {
            "Rewrites": [
                "rewrite 1",
                "rewrite 2",
                "rewrite 3"
            ]
        }

        The "Rewrites" array must contain exactly {{count}} items.
        """;
}
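For completeness, here is a hedged sketch of how this prompt could be wired to a chat model and its JSON output parsed. The `_chatClient` field, the OpenAI .NET SDK's `ChatClient` API (`OpenAI.Chat`), and the record type are my assumptions, not code from the post's repository:

```csharp
// Hypothetical sketch: ask the (S)LM for rewrites and parse the JSON response.
// Assumes `using OpenAI.Chat;` and `using System.Text.Json;`.
private sealed record QueryRewriteResult(List<string> Rewrites);

private async Task<IReadOnlyList<string>> RewriteQueryAsync(string userQuery, int count = 3)
{
    var messages = new ChatMessage[]
    {
        new SystemChatMessage(GetDomainSpecificQueryRewriteSystemPrompt(count)),
        new UserChatMessage(userQuery)
    };

    ChatCompletion completion = await _chatClient.CompleteChatAsync(messages);

    // The system prompt demands a JSON object with a "Rewrites" array.
    var parsed = JsonSerializer.Deserialize<QueryRewriteResult>(completion.Content[0].Text);

    return parsed?.Rewrites ?? [];
}
```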

When a user asks a question like "fast and prestige", it generates the following query rewrites:

Multi-query pattern with rewritten queries based on a domain specific system prompt.

As you can see, we could classify each of these as a domain-specific reformulation. Every query captures the intent of the original question from the perspective of the fictional Galactic Voyages travel agency.

Now, let me show you what it looks like when using the generic system prompt.

private string GetGenericQueryRewriteSystemPrompt(int count = 3)
{
    return $$"""
        You are a query rewriting assistant for Retrieval-Augmented Generation (RAG).
        Your task is to take a user’s original question and generate {{count}} alternative versions of that query.
        Each rewritten query should preserve the user’s intent while exploring different phrasings, clarifications, or interpretations that might help retrieve more relevant FAQ answers.
        Return the rewrites in a structured JSON array called "Rewrites".
        
        The JSON must follow this exact structure:
        {
            "Rewrites": [
                "rewrite 1",
                "rewrite 2",
                "rewrite 3"
            ]
        }
        
        The "Rewrites" array must contain exactly {{count}} items.
        """;
}

Below are the results:

Multi-query pattern with rewritten queries based on a generic system prompt.

This is more of a semantic or paraphrasing reformulation. I think we can both agree it looks weaker compared to using a domain-specific system prompt.

That’s something worth remembering. If you have a very specific domain, it can make a significant difference.

Let's stick to the domain-specific system prompt and discuss the next steps. I will now invoke Vector Search with a custom query rewriter.

First of all, I have to create an embedding for each of the query rewrites:

var embeddings = await _embeddingClient.GenerateEmbeddingsAsync(rewrittenQueries);

// Build one SearchOptions instance per rewrite, each carrying its own query vector.
SearchOptions[] searchOptionsPerQuery = [.. embeddings.Value
    .Select(embedding =>
    {
        var searchOptions = CreateSearchOptionsBase();
        searchOptions.VectorSearch = CreateVectorSearchOptions(embedding.ToFloats());

        return searchOptions;
    })];

And if you're thinking now… wow, does that mean my system will generate one query vector per rewrite, multiplying my embedding calls, with a custom rewriter? The answer is: yes. That's the trade-off.

As you can see below, for each of the rewritten queries, the TOP 3 (in a real app I suggest returning more) most similar documents are returned from Azure AI Search.

Multi-Query rewrites for RAG combined with Vector Search

Now the question arises: how should we combine these results?

There are three basic approaches:

RRF (Reciprocal Rank Fusion)

This method ignores the raw Score value and focuses purely on ranking. Each query produces its own ranked list, and RRF merges them by rewarding documents that consistently appear near the top. If you remember hybrid search (read more here), this follows the same RRF principle.
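A minimal RRF merge over ranked lists of document IDs could look like this sketch (the constant k = 60 is the conventional value from the original RRF formulation; the helper shape is my own, not from the post's repository):

```csharp
// Sketch of Reciprocal Rank Fusion: each document earns 1 / (k + rank)
// from every ranked list it appears in; consistent top placements win.
private static IReadOnlyList<string> FuseWithRrf(
    IEnumerable<IReadOnlyList<string>> rankedIdLists, int topN = 3, int k = 60)
{
    var rrfScores = new Dictionary<string, double>();

    foreach (var rankedIds in rankedIdLists)
    {
        for (var rank = 0; rank < rankedIds.Count; rank++)
        {
            // rank is zero-based here, hence the + 1.
            var id = rankedIds[rank];
            rrfScores[id] = rrfScores.GetValueOrDefault(id) + 1.0 / (k + rank + 1);
        }
    }

    return [.. rrfScores
        .OrderByDescending(pair => pair.Value)
        .Take(topN)
        .Select(pair => pair.Key)];
}
```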

Take the MAX score per document

For each document ID, you keep the highest score it received across all rewrites. This works well when you trust the scoring function and want to highlight the strongest match.

Aggregate score per document

Here you sum all scores for each document ID. This rewards documents that appear frequently across multiple rewrites, even if no single score is the highest.
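As a sketch, a sum-based aggregation over (id, score) pairs might look like this (the tuple shape is just for illustration):

```csharp
// Sketch: sum the scores each document ID received across all rewrites,
// so documents that show up in many result lists rise to the top.
private static IReadOnlyList<(string Id, double TotalScore)> AggregateScoresById(
    IEnumerable<(string Id, double Score)> resultsAcrossRewrites, int topN = 3)
{
    return [.. resultsAcrossRewrites
        .GroupBy(result => result.Id)
        .Select(group => (Id: group.Key, TotalScore: group.Sum(result => result.Score)))
        .OrderByDescending(aggregated => aggregated.TotalScore)
        .Take(topN)];
}
```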

In my C# app, I have chosen the MAX score approach:

private static IReadOnlyList<StarshipSemanticSearchDocumentResult> GetBestDocumentsByMaxScore(SemanticSearchResult[] result, int topN = 3)
{
    var bestDocumentsById = new Dictionary<string, StarshipSemanticSearchDocumentResult>();

    foreach (var document in result.SelectMany(searchResult => searchResult.Documents))
    {
        if (!bestDocumentsById.TryGetValue(document.Id!, out var currentBest) || document.Score > currentBest.Score)
        {
            bestDocumentsById[document.Id!] = document;
        }
    }

    return [.. bestDocumentsById.Values.OrderByDescending(document => document.Score).Take(topN)];
}

Which method is best? Hard to say… That would require testing on large datasets from different domains, and the answer may vary depending on the specific scenario.

If you asked me which one to choose, though, I would probably go with RRF-like logic as the default option. But of course, if you have a RAG solution then you should also have an evaluation pipeline, so just change that one part of the RAG pipeline, check the accuracy, and compare the results.

Below are the final results:

Result 1
  Score: 0.8757
  ReRanker Score: 0
  Id: gv-luxury-007
  Title: Celestial-Cruise-Liner
  Category: Luxury
  Overview: An opulent passenger vessel offering premium travel experiences between core galactic hubs.

Result 2
  Score: 0.8693
  ReRanker Score: 0
  Id: gv-yacht-010
  Title: Horizon-Class Private Yacht
  Category: Luxury
  Overview: A high-end private vessel designed for elite travelers seeking speed and exclusivity in their transit.

Result 3
  Score: 0.8467
  ReRanker Score: 0
  Id: gv-scout-005
  Title: Pathfinder-Class Scout
  Category: Reconnaissance
  Overview: A nimble, stealth-focused ship used for charting unknown sectors and deep-space exploration.

Now you know how to implement it yourself, but Multi-Query Retrieval for RAG can also be used as a built-in Azure AI Search feature.

Let’s discuss it now.

Using Azure AI Search for Query Rewrites

First of all, if you want to leverage the query rewrites feature in Azure AI Search, you must enable the Semantic Ranker feature. Go to the Premium Features tab and enable it.

As I write this blog post, this feature is still in the Public Preview phase.

You should also be aware that the query rewrites capability is not available in all regions. You can check the available regions here.

How to invoke it then? First of all, you must use semantic ranking (you can find a deep-dive into semantic ranking here) which means you have to:

  • specify queryType as “semantic” OR
  • use semanticQuery field

Now, you should specify two properties:

  • set queryRewrites to “generative|count-[from 1 to 10]” e.g. generative|count-5
  • set queryLanguage to the search text language. Use language locales e.g. en-US or pl-PL

This is what such a query may look like:

{
    "top": 3,
    "search": "fast and prestige",
    "count": true,
    "queryType": "semantic",
    "captions": "extractive",
    "answers": "extractive",
    "queryRewrites": "generative|count-3",
    "queryLanguage": "en-us",
    "debug": "queryRewrites"
}

Please note that I also set debug to ‘queryRewrites’, thanks to which we can see additional debug information. Below are the results:

{
  "@odata.context": "https://deployed-in-azure-aisearch.search.windows.net/indexes('starships-index-semantic-query-rewrite')/$metadata#docs(*)",
  "@odata.count": 9,
  "@search.answers": [
    {
      "key": "gv-yacht-010",
      "text": "A high-end private vessel designed for elite travelers seeking speed and exclusivity in their transit.",
      "highlights": "A<em> high-end private vessel </em>designed for<em> elite </em>travelers seeking<em> speed and exclusivity </em>in their transit.",
      "score": 0.8980000019073486
    }
  ],
  "@search.debug": {
    "semantic": null,
    "queryRewrites": {
      "text": {
        "inputQuery": "fast and prestige",
        "rewrites": [
          "Fast and Prestige gameplay",
          "Fast and Prestige game",
          "Fast and the Furious film series",
          "Fast and the Furious Prestige movie",
          "Fast and the Furious and Prestige"
        ]
      },
      "vectors": []
    }
  },
  "value": [
    {
      "@search.score": 4.027074,
      "@search.rerankerScore": 1.8844430446624756,
      "@search.captions": [
        {
          "text": "A high-end private vessel designed for elite travelers seeking speed and exclusivity in their transit.",
          "highlights": "A<em> high-end private vessel </em>designed for<em> elite </em>travelers seeking<em> speed and exclusivity </em>in their transit."
        }
      ],
      "Id": "gv-yacht-010",
      "Title": "Horizon-Class Private Yacht",
      "Category": "Luxury",
      "Overview": "A high-end private vessel designed for elite travelers seeking speed and exclusivity in their transit.",
      "Features": [
        "Custom interior",
        "Point-to-point warp drive"
      ]
    },
    {
      "@search.score": 33.20215,
      "@search.rerankerScore": 1.7688860893249512,
      "@search.captions": [
        {
          "text": "A fast, agile security vessel designed for fleet protection and rapid response to distress signals.",
          "highlights": "A<em> fast, </em>agile security vessel designed for fleet protection and rapid response to distress signals."
        }
      ],
      "Id": "gv-inter-006",
      "Title": "Interceptor-Class Guardian",
      "Category": "Security",
      "Overview": "A fast, agile security vessel designed for fleet protection and rapid response to distress signals.",
      "Features": [
        "Pulse laser cannons",
        "Enhanced shield generators"
      ]
    },
    {
      "@search.score": 5.882675,
      "@search.rerankerScore": 1.495944857597351,
      "@search.captions": [
        {
          "text": "The Zephyr-Class is a compact vessel designed for rapid transit between orbital stations and planetary surfaces.",
          "highlights": "The Zephyr-Class is a compact vessel designed for<em> rapid transit </em>between orbital stations and planetary surfaces."
        }
      ],
      "Id": "gv-shuttle-002",
      "Title": "Zephyr-Class Transit",
      "Category": "Shuttle",
      "Overview": "The Zephyr-Class is a compact vessel designed for rapid transit between orbital stations and planetary surfaces.",
      "Features": [
        "Ergonomic seating",
        "Automated navigation alerts"
      ]
    }
  ]
}

If you look at the query rewrites that were generated, you may feel a little disappointed… and honestly, so do I.

I deliberately wanted to show you the custom query rewriter with both a domain-specific system prompt and a generic system prompt, because what we're seeing here resembles the quality of the rewrites produced with that generic prompt. Do you remember?

I hope that in the future Azure AI Search will add the ability to control the prompt that’s sent to the SLM behind the scenes.

But… what I’m showing you here is a very specific example of a brief and vague query. For more descriptive queries, that mechanism obviously works much better.

Let’s carry on and focus on the C# example now. Below is the method responsible for invoking the similar search directly in C# code:

private async Task<SemanticSearchResult> SearchUsingAiSearchAsync(string? question, string selectedSearchMethod)
{
    var searchOptions = CreateSearchOptionsBase();
    searchOptions.QueryType = SearchQueryType.Semantic;
    searchOptions.QueryLanguage = "en-us";

    if (selectedSearchMethod.Contains("Vector"))
    {
        searchOptions.VectorSearch = new VectorSearchOptions()
        {
            Queries =
            {
                new VectorizableTextQuery(question)
                {
                    KNearestNeighborsCount = 3,
                    Fields = { nameof(StarshipSearchDocument.OverviewVector) }
                }
            }
        };
    }

    searchOptions.SemanticSearch = new SemanticSearchOptions()
    {
        QueryRewrites = new QueryRewrites(QueryRewritesType.Generative)
        {
            Count = 5
        }
    };

    searchOptions.Debug = "queryRewrites";

    return await SearchAsync(question, searchOptions);
}

I specify QueryRewrites within SemanticSearchOptions and the Debug property at the SearchOptions root scope. I hope this all makes sense. But there is one more interesting aspect I would like to explain, which is VectorizableTextQuery.

If you've read this blog post, you know what a vectorizer means in the context of the Azure AI Search service, but let me repeat it concisely here.

If we define a vectorizer within our JSON index definition, then we do not have to convert the user's query/prompt into a query vector in our C# logic, because we offload that responsibility to the Azure AI Search service.

Without a vectorizer defined for a given vector field, we are forced to use VectorizedQuery, where we pass an embedding. With VectorizableTextQuery, we just pass the text, and the query vector is created on the Azure AI Search side.
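The difference, side by side, could be sketched like this (both types come from the Azure.Search.Documents SDK; the field name is the one used earlier in the post, and `queryEmbedding` is an assumed `ReadOnlyMemory<float>` you produced yourself):

```csharp
// Without a vectorizer on the index: embed the text yourself, pass the vector.
var vectorizedQuery = new VectorizedQuery(queryEmbedding)
{
    KNearestNeighborsCount = 3,
    Fields = { nameof(StarshipSearchDocument.OverviewVector) }
};

// With a vectorizer on the index: pass the raw text; Azure AI Search embeds it.
var vectorizableQuery = new VectorizableTextQuery("fast and prestige")
{
    KNearestNeighborsCount = 3,
    Fields = { nameof(StarshipSearchDocument.OverviewVector) }
};
```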

So what’s the conclusion?

If you want to use query rewrites in conjunction with Vector Search, then you MUST use a vectorizer, because the text you pass using the search property must be the same as the text you pass to the VectorizableTextQuery.

Just think about it: what would be the point of sending an embedding of the original question if that question is going to be turned into N similar questions? At that point, the original embedding becomes completely redundant.

Below are the results of combining BM25 + Vector Search + Semantic Ranking + Query Rewrites:

Query Rewrites feature in Azure AI Search.

What was the flow behind the scenes?

  • The L0 phase – the original query was sent to an SLM, which created 5 query rewrites
  • The L1 phase (in parallel for each of the rewritten queries)
    • BM25 was invoked for each of the rewritten queries
    • based on each of the rewritten queries, an embedding was created using a vectorizer and then a vector search was performed
  • Still in L1 – the RRF function was invoked to select the TOP N most relevant documents
  • The L2 phase – the cross-encoder was invoked for the TOP N most relevant documents (up to 50), a ReRanker Score was assigned, and the results were ordered by it

Now is the right time to talk about Pros and Cons of the Multi-Query retrieval for RAG.

Pros and Cons

This approach certainly has a few benefits, but there are also some drawbacks you should be aware of. It’s worth understanding both sides before deciding to use this pattern.

Pros:

  • Better recall for vague queries – Multiple rewrites capture different interpretations of the user’s intent, improving the chances of retrieving the right documents.
  • More robust retrieval across varied content – If your data spans different terminology or sub-domains, multi-query retrieval helps surface relevant material a single query might miss.
  • Better results for short or underspecified queries – Rewrites add missing context and phrasing variations, which often leads to noticeably better results.

Cons:

  • Higher computational and latency cost – More rewrites mean more BM25 calls, more embeddings and more vector searches.
  • Rewrite quality varies with the model – A generic or weak rewriter can produce noisy or unhelpful rewrites, reducing the overall benefit.

Summary

I hope that after reading this post, you now have a clear understanding of how Multi-Query Retrieval for RAG works and why it can be such a beneficial option.

We walked through both a custom implementation and the built-in approach offered by Azure AI Search (still in Public Preview).

Now it’s your turn to try it out in your own project, and of course, don’t forget to evaluate the accuracy of your RAG system along the way.

That’s it for today.

Thanks a lot for reading, and see you in the next post!
