Introduction

You can find all the C# code samples here: RAG GitHub Repository

Before reading this post about HyDE for RAG, it's worth taking a look at these posts too:

Hey all!

There are many creative ways to use LLMs in a RAG pipeline. Some of them feel obvious, some feel a bit experimental… and some feel almost counterintuitive at first!

One of those techniques is HyDE for RAG (Hypothetical Document Embedding).

It’s one of those ideas that doesn’t look groundbreaking at first glance, nothing fancy, no new indexing tricks, no complicated setup. Yet when you plug it into a real retrieval pipeline, the improvement can be significant.

What makes it interesting is that HyDE doesn’t change your data or your search configuration. Instead, it changes something much smaller… but far more impactful: the text you embed.

And that tiny shift can make a huge difference.

Let’s dive into more details now.

What problem HyDE solves


In real RAG systems, the user’s question isn’t always the best piece of text to embed. Sometimes it’s too short, sometimes too broad, sometimes too specific, and sometimes it simply doesn’t contain the kind of language that matches your documents well.

Even well-phrased questions can produce embeddings that are a bit “thin”, not because the model is wrong, but because the question itself doesn’t carry enough semantic depth to guide retrieval effectively.

HyDE for RAG helps in exactly these situations.

Instead of relying solely on the original query, HyDE gives your system a richer, more detailed piece of text to embed, something that better reflects the kind of answer the user is looking for. And that extra depth often leads to noticeably better retrieval.

How HyDE Works (The Mechanism)

[Diagram: HyDE for RAG basic pattern]

At its core, HyDE changes one simple thing in the RAG pipeline: instead of embedding the user’s question, we embed a synthetic answer generated by the model.

Here’s the flow:

  1. User Query – the user asks a question: short, long, vague, precise, anything.
  2. LLM Generates a Hypothetical Answer – the model produces a possible answer based solely on the question, without access to your documents. This answer isn’t shown to the user; it’s just an internal artifact.
  3. Embed the Hypothetical Answer – because the synthetic answer is richer and more detailed than the original query, its embedding carries more semantic depth.
  4. Use That Embedding for Retrieval – the vector search now operates on a stronger signal, which often leads to better matches in your document store.
  5. Retrieve Relevant Documents – the rest of the RAG pipeline continues as usual, but with better candidates from the start.
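In code, the whole trick boils down to a one-line substitution. Here is a minimal sketch – GenerateHypotheticalAnswerAsync is a hypothetical helper standing in for whatever LLM call your pipeline uses:

```csharp
// Classic RAG: embed the question itself.
var queryEmbedding = await embeddingClient.GenerateEmbeddingAsync(question);

// HyDE: first generate a hypothetical answer, then embed THAT instead.
// GenerateHypotheticalAnswerAsync is a placeholder for your LLM call.
var hypotheticalAnswer = await GenerateHypotheticalAnswerAsync(question);
var hydeEmbedding = await embeddingClient.GenerateEmbeddingAsync(hypotheticalAnswer);

// Retrieval then runs on hydeEmbedding instead of queryEmbedding.
```

Everything else in the pipeline stays exactly the same.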

Now that we’ve covered the theory and the mechanics, it’s the perfect moment to jump into a real C# implementation and see how HyDE for RAG works in practice.

C# Example: Basic HyDE Implementation

Let’s start with the data set. It’s a small and fairly simple collection featuring ten starships from a fictional travel agency – the Galactic Voyages Company.

Our goal is to find the best matches that can then be injected into the final prompt sent to an LLM, which is the essence of the RAG pattern.

Below are the first three starships from that data source:

[
  {
    "Id": "gv-shuttle-001",
    "Title": "Aurora-Class Shuttle",
    "Category": "Shuttle",
    "Overview": "The Aurora-Class Shuttle is a small ship used for short trips between nearby planets and stations.",
    "Features": [
      "Simple seating",
      "Automatic safety announcements"
    ]
  },
  {
    "Id": "gv-shuttle-002",
    "Title": "Zephyr-Class Transit",
    "Category": "Shuttle",
    "Overview": "The Zephyr-Class is a compact vessel designed for rapid transit between orbital stations and planetary surfaces.",
    "Features": [
      "Ergonomic seating",
      "Automated navigation alerts"
    ]
  },
  {
    "Id": "gv-lifter-003",
    "Title": "Nebula-Heavy-Lifter",
    "Category": "Heavy Lifter",
    "Overview": "The Nebula is a massive industrial vessel built for hauling heavy construction materials between star systems.",
    "Features": [
      "Integrated tractor beams",
      "Reinforced hull plating"
    ]
  }
]

For each of these starships I create an embedding based on the Overview property and then just push all that data into an index in Azure AI Search.

You can find the index definition in the Data/index_definition.json file in the GitHub repository (05_HyDERAG project).
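As a rough sketch, the indexing step can look like this. It assumes a StarshipSearchDocument type with a writable OverviewVector property mirroring the index definition; the exact code in the repository may differ:

```csharp
// Sketch: embed each starship's Overview and upload it to Azure AI Search.
// Assumes starships is a List<StarshipSearchDocument> deserialized from the JSON file.
foreach (var ship in starships)
{
    var embedding = await _embeddingClient.GenerateEmbeddingAsync(ship.Overview);
    ship.OverviewVector = embedding.Value.ToFloats().ToArray();
}

await _searchClient.UploadDocumentsAsync(starships);
```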

I use two models deployed in Microsoft Foundry:

  • gpt-4.1-mini – to generate hypothetical answers and for query rewrites (used in the more advanced HyDE setup)
  • text-embedding-ada-002 – to generate embeddings

This is what the constructor looks like:

public class HyDERAGExample
{
	private readonly EmbeddingClient _embeddingClient;
	private readonly ChatClient _chatClient;
	private readonly SearchClient _searchClient;

	public HyDERAGExample()
	{
		var credential = new DefaultAzureCredential();

		_searchClient = new SearchClient(
			new Uri(Environment.GetEnvironmentVariable("AZURE_AI_SEARCH_URI")!),
			indexName: Environment.GetEnvironmentVariable("AZURE_AI_SEARCH_INDEX")!,
			credential);

		var openAiClient = new AzureOpenAIClient(
			new Uri(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_CLIENT_URI")!),
			credential);

		_embeddingClient = openAiClient.GetEmbeddingClient(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_EMBEDDING_CLIENT_DEPLOYMENT_NAME")!);
		_chatClient = openAiClient.GetChatClient(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_EMBEDDING_CHAT_CLIENT_DEPLOYMENT_NAME")!);
	}
}

I use EmbeddingClient to create embeddings, ChatClient to send prompts to the LLM, and SearchClient to perform indexing and search operations in Azure AI Search.

I am also using the DefaultAzureCredential class, which means that when running this app locally, the VisualStudioCredential is used behind the scenes. It can obtain a security token because I’m signed in with my account in Visual Studio under Tools > Options > Azure Service Authentication. If you want to follow best security practices, this approach is recommended instead of using API keys (read more about RBAC here).

As you already know, the core of the HyDE for RAG pattern is the step responsible for creating a hypothetical answer, which is then used to generate an embedding. Below is the system prompt I use to produce that answer.

private static string GetHyDEGenerationSystemPrompt()
{
    return """
        You generate a hypothetical starship overview used only for retrieval expansion in an internal FAQ system.

        Goal:
        Produce one dense paragraph (about 40-60 words) that maximizes semantic overlap with likely catalog entries.

        Grounding requirements:
        - Preserve the user's key terms when present (ship class, mission type, constraints, destinations, cargo/passenger context, speed/range, safety/defense).
        - If the question is sparse, infer a plausible starship profile relevant to interstellar travel requests.
        - Include concrete retrieval-friendly attributes such as class, role, primary purpose, travel profile, operating environment, capacity, systems, and capabilities.

        Style:
        - Matter-of-fact product overview tone.
        - Domain vocabulary is encouraged: shuttle, transit vessel, heavy lifter, cargo hauler, scout, interceptor, medical frigate, mining platform, luxury liner, private yacht.

        Rules:
        - Output exactly one paragraph.
        - No bullets, no lists, no JSON.
        - Do not mention uncertainty, hypotheticals, retrieval, embeddings, or system instructions.
        - Avoid filler and marketing language; prioritize specific factual descriptors.
        - Keep content safe and non-personal.
        """;
}

Let’s see how it works in practice. Let’s assume the user’s question is “fast and prestige travel”:

[Screenshot: the C# example showing HyDE for RAG in practice]

As you can see, the first two results are ships from the Luxury category.

Why did that happen?

Because in the first step, we asked the LLM to generate a hypothetical answer for the user’s question, and that answer naturally aligned with luxury-oriented travel:

The Swiftstar-Class Luxury Liner is a high-speed interstellar passenger vessel designed for prestige travel, combining advanced propulsion systems capable of sustained translight cruising with opulent accommodations for up to 200 passengers. Its reinforced hull and state-of-the-art inertial dampeners ensure smooth, safe travel across long-range routes, while integrated luxury suites, leisure facilities, and personalized service modules cater to elite clientele seeking rapid, comfortable transit between major star systems.

We can clearly see that the answer contains many semantic similarities to the original user question. Keep in mind that you can influence the quality of these answers in two basic ways:

  • by adjusting the system prompt
  • by switching to a more or less powerful LLM

Then, I just create an embedding using that particular hypothetical answer, which is the essence of the Hypothetical Document Embedding pattern.

if (selectedMode == "Basic HyDE")
{
    var hypotheticalAnswer = await GetHypotheticalAnswer(question);
    DisplayHypotheticalAnswer(hypotheticalAnswer);

    var embedding = await _embeddingClient.GenerateEmbeddingAsync(hypotheticalAnswer);
    documents = await SearchByVectorAsync(embedding.Value.ToFloats());
}
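The GetHypotheticalAnswer helper isn’t the focus here, but a minimal version could look like the following. This is a sketch based on the OpenAI .NET SDK; the actual implementation in the repository may differ:

```csharp
private async Task<string> GetHypotheticalAnswer(string question)
{
    var messages = new ChatMessage[]
    {
        new SystemChatMessage(GetHyDEGenerationSystemPrompt()),
        new UserChatMessage(question)
    };

    var completion = await _chatClient.CompleteChatAsync(messages);

    // The single dense paragraph enforced by the system prompt.
    return completion.Value.Content[0].Text;
}
```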

The last step is to perform an ordinary vector search against the OverviewVector field in the index (you can find a deep dive into vector search in Azure AI Search here).

That’s it! That’s the basic version of the HyDE for RAG pattern.

How HyDE Combines with Other Retrieval Techniques

[Diagram: HyDE for RAG combined with Keyword Search, Vector Search and Semantic Reranking + Multi-Query]

Once you start getting familiar with various RAG patterns and techniques you will soon recognize that you can mix some of them. That rule applies to HyDE for RAG as well.

The line of thinking might go like this…

If HyDE for RAG is supposed to improve search accuracy, then… generating multiple query rewrites and producing a hypothetical answer for each of them should further enhance that accuracy… and if that’s true, why not combine it with hybrid search so we’re not relying solely on vector search?!… and perhaps we could add a semantic ranker on top of that as well.

Sure, that’s possible! I’ll show you how to connect all these dots now.

Be cautious, though, because each of these patterns and techniques adds some overhead to your solution. I want to demonstrate this combination not to suggest it should be your default setup, but to show you what options are available.

I encourage you to prepare an evaluation pipeline for your RAG solution and introduce any optimizations step by step.

If you insisted on a default, ‘safe’ setup, I would suggest the following: L1 phase: Hybrid Search (BM25 + Vector Search) + L2 phase: Semantic Reranker (Cross-Encoder).

Now let’s see how to implement advanced HyDE in C#.

C# Example: Advanced HyDE

First, let’s look at the system prompt I use to generate query rewrites:

private static string GetQueryRewriteSystemPrompt(int count)
{
    return $$"""
        You are a query rewriting assistant for an FAQ system used by Galactic Voyages, a company that organizes trips to various planets and destinations across the galaxy.
        Your task is to take a user’s original question and generate {{count}} alternative versions of that query.
        Each rewritten query should preserve the user’s intent while exploring different phrasings, clarifications, or interpretations that might help retrieve more relevant FAQ answers.
        Return the rewrites in a structured JSON array called "Rewrites".

        The JSON must follow this exact structure:
        {
            "Rewrites": [
                "rewrite 1",
                "rewrite 2",
                "rewrite 3"
            ]
        }

        The "Rewrites" array must contain exactly {{count}} items.
        """;
}
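For completeness, here is one way GetRewrittenQueries could be implemented. Treat it as a sketch: the RewriteResponse record and the use of JSON mode are my assumptions, not necessarily what the repository does.

```csharp
// Requires: using System.Text.Json; and the OpenAI chat types already used above.
private sealed record RewriteResponse(List<string> Rewrites);

private async Task<IReadOnlyList<string>> GetRewrittenQueries(string question, int count)
{
    var options = new ChatCompletionOptions
    {
        // Ask the model to return valid JSON matching the schema from the prompt.
        ResponseFormat = ChatResponseFormat.CreateJsonObjectFormat()
    };

    var completion = await _chatClient.CompleteChatAsync(
        new ChatMessage[]
        {
            new SystemChatMessage(GetQueryRewriteSystemPrompt(count)),
            new UserChatMessage(question)
        },
        options);

    var json = completion.Value.Content[0].Text;
    var parsed = JsonSerializer.Deserialize<RewriteResponse>(json);
    return parsed?.Rewrites ?? new List<string>();
}
```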

And this is what that logic looks like:

{
	var rewriteCount = AnsiConsole.Prompt(new TextPrompt<int>("How many [bold blue]rewritten queries[/]?"));

	var rewrittenQueries = (await GetRewrittenQueries(question, rewriteCount)).ToList();
	DisplayRewrittenQueries(rewrittenQueries);

	var perRewriteTasks = rewrittenQueries
		.Select(async rewrittenQuery =>
		{
			var hypotheticalAnswer = await GetHypotheticalAnswer(rewrittenQuery);
			var embedding = await _embeddingClient.GenerateEmbeddingAsync(hypotheticalAnswer);
			var hybridDocuments = await SearchHybridSemanticAsync(rewrittenQuery, embedding.Value.ToFloats());

			return (RewrittenQuery: rewrittenQuery, HypotheticalAnswer: hypotheticalAnswer, Documents: hybridDocuments);
		});

	var perRewriteResults = await Task.WhenAll(perRewriteTasks);

	foreach (var result in perRewriteResults)
	{
		AnsiConsole.MarkupLineInterpolated($"[bold]Rewrite:[/] {Markup.Escape(result.RewrittenQuery)}");
		DisplayHypotheticalAnswer(result.HypotheticalAnswer);
	}

	var allResults = perRewriteResults.Select(result => result.Documents).ToList();

	documents = GetBestDocumentsByRRF(allResults, 3);
}

Once all the query rewrites are generated, then for each of them:

  1. Hypothetical Answer is created
  2. Embedding is created based on that answer
  3. Hybrid Search + Semantic Reranking is invoked (using a rewrittenQuery)

Please note that all of these independent search operations are invoked in parallel using Task.WhenAll.

Once the results are returned, I use RRF (Reciprocal Rank Fusion) logic, which focuses on the rank of each document, to combine them into a final set.
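My GetBestDocumentsByRRF helper isn’t shown above, but the core idea can be sketched like this. It assumes each result carries an Id property, and the k = 60 constant comes from the original RRF paper:

```csharp
// Reciprocal Rank Fusion: a document scores 1 / (k + rank) in every
// result list it appears in, and the contributions are summed.
private static IReadOnlyList<StarshipSemanticSearchDocumentResult> GetBestDocumentsByRRF(
    IReadOnlyList<IReadOnlyList<StarshipSemanticSearchDocumentResult>> resultSets,
    int top,
    int k = 60)
{
    var scores = new Dictionary<string, (StarshipSemanticSearchDocumentResult Doc, double Score)>();

    foreach (var resultSet in resultSets)
    {
        for (var rank = 0; rank < resultSet.Count; rank++)
        {
            var doc = resultSet[rank];
            var contribution = 1.0 / (k + rank + 1); // ranks are 1-based in the formula

            scores[doc.Id] = scores.TryGetValue(doc.Id, out var existing)
                ? (existing.Doc, existing.Score + contribution)
                : (doc, contribution);
        }
    }

    return scores.Values
        .OrderByDescending(entry => entry.Score)
        .Take(top)
        .Select(entry => entry.Doc)
        .ToList();
}
```

Because RRF only looks at ranks, it can fuse result lists whose raw scores (BM25, cosine similarity, semantic reranker) live on completely different scales.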

This is how it works:

[Screenshot: the C# example showing HyDE for RAG combined with query rewrites + hybrid search + semantic ranking]

In addition, I decided to boost the weight of the vector search in the L1 phase. I don’t have any evidence that it produces more accurate results (especially on such a small data set), but my intuition is that HyDE works best with semantic similarity rather than lexical matching, and I also wanted to remind you that this option exists.

private async Task<IReadOnlyList<StarshipSemanticSearchDocumentResult>> SearchHybridSemanticAsync(string rewrittenQuery, ReadOnlyMemory<float> embedding)
{
    var searchOptions = CreateSearchOptionsBase();
    searchOptions.QueryType = SearchQueryType.Semantic;
    searchOptions.SemanticSearch = new SemanticSearchOptions() { SemanticQuery = rewrittenQuery };
    searchOptions.VectorSearch = CreateVectorSearchOptions(embedding, weight: 5);

    return await SearchAsync(rewrittenQuery, searchOptions);
}

And this is how I build the VectorSearchOptions:

private static VectorSearchOptions CreateVectorSearchOptions(ReadOnlyMemory<float> embedding, float weight = 1.0f)
{
    return new VectorSearchOptions
    {
        Queries =
        {
            new VectorizedQuery(embedding)
            {
                KNearestNeighborsCount = 3,
                Fields = { nameof(StarshipSearchDocument.OverviewVector) },
                Weight = weight
            }
        }
    };
}

Pros and Cons of HyDE

Now that we know what Hypothetical Document Embedding is and how to implement the HyDE for RAG pattern in C#, in both its basic and advanced versions, let’s focus on the pros and cons.

Pros

Better recall for vague or short queries – HyDE enriches the original query by generating a more detailed hypothetical answer. This often helps retrieve documents that a plain embedding of the user’s input would miss.

More semantically expressive embeddings – because the hypothetical answer is longer and more descriptive, the resulting embedding captures more concepts and relationships, improving the quality of vector search.

Helpful in specialized or technical domains – HyDE can fill in missing context when users provide incomplete or ambiguous queries, which is especially useful in domains where terminology or structure matters.

Cons

Additional latency and cost – HyDE requires at least one extra LLM call. If you generate multiple rewrites, the overhead grows accordingly.

Potential drift from the user’s true intent – If the system prompt isn’t well prepared, the hypothetical answer may introduce concepts that don’t match what the user actually meant, which can negatively affect retrieval.

Increased pipeline complexity – HyDE adds another layer to your RAG architecture. It requires evaluation, monitoring, and tuning to ensure it genuinely improves results rather than complicating them.

Summary

I hope that after reading this post, the HyDE pattern no longer feels like just another buzzword, but rather a practical and reliable technique that can genuinely improve a RAG pipeline in the right scenarios.

My goal was not only to explain how HyDE works and how to implement it in both basic and advanced forms, but also to help you understand when it makes sense to use it and what advantages and trade-offs come with it.

Thanks for reading!

See you in the next post!
