Introduction
You can find all the C# code samples here: RAG GitHub Repository
Hey everyone!
Naive RAG is the first post in a new series about RAG, where we’ll explore various RAG patterns and techniques, starting from the most basic ones and moving toward more advanced approaches, finishing with multi-hop reasoning agents.
I hope you’re going to like it!
The Scenario: A Galactic Voyages FAQ System
Before we dive into the code and architecture, let’s establish a practical business scenario.

Imagine we work for the “Galactic Voyages” company. Our users are tired of reading through long, static descriptions of our starship fleet on the company website. Our task is to build a chat-like FAQ experience where users can ask questions about our fleet ships, like the Starlance Explorer, Nebula X Cruiser, and Iron Drive Clipper, and get immediate, accurate answers.
Because standard LLMs are not trained on our private corporate data, we need a way to ground their answers in our actual markdown files. This is where the Naive RAG pattern comes in.
The Basics of the RAG Pattern

Retrieval‑Augmented Generation (RAG) enriches an LLM’s prompt with the most relevant pieces of your private data so it can answer accurately.
Sample of a markdown file:
# Nebula‑X Cruiser
## Overview
The Nebula‑X Cruiser is a long‑distance luxury ship used for comfortable travel between star systems.
## Specifications
- Top Speed: Warp 4.2
- Fuel: Dark‑Matter Cells
- Seats: 240 passengers
- Artificial Gravity: Yes
- Faster‑Than‑Light Travel: Standard Warp Drive
## Features
- Large window deck for space views
- Private cabins
- Smooth and quiet warp travel
## Notes
The Nebula‑X is known for very gentle warp jumps, which makes it popular with first‑time travelers.
How the flow works
- Ingest & embed – Your private documents are converted into vectors using an embedding model. In this simple setup, each markdown file becomes one large vector with no chunking. In production-ready solutions, chunking is applied very often because it’s hard to capture “the meaning” of a very long chunk of data accurately (imagine a PDF file with 50 pages represented by just a single vector).
- Store in a vector database – Those vectors are saved so they can be searched later. There are many specialized vector databases, but most major database providers have already added vector search capabilities to their traditional, non‑vector engines as well (check this post to get familiar with the options available in Azure).
- Query vector – A user question is turned into a vector using the same embedding model, ensuring all the vectors live in the same vector space, which is what makes similarity search possible.
- Similarity search – The database finds the closest vectors using cosine similarity (or any other configured method), which measures how aligned their meanings are (see the sketch after this list).
- Augment the prompt – The original text tied to those vectors is added to the user’s question and sent to the LLM.
With this retrieved context, the LLM can generate a grounded, data‑aware answer.
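To make the similarity-search step concrete, here is a minimal sketch of cosine similarity between two embedding vectors (the helper is illustrative and not part of any SDK):

private static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    float dot = 0f, magnitudeA = 0f, magnitudeB = 0f;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];          // how much the vectors point in the same direction
        magnitudeA += a[i] * a[i];
        magnitudeB += b[i] * b[i];
    }
    // 1 = same meaning/direction, 0 = unrelated, -1 = opposite
    return dot / (MathF.Sqrt(magnitudeA) * MathF.Sqrt(magnitudeB));
}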
Secure Access to Microsoft Foundry using RBAC
Let’s shift to how we can implement all of that, starting with an aspect that cannot be neglected… security!
I have deployed my models, specifically gpt-4o-mini for the chat client and text-embedding-ada-002 for the embedding client, inside Microsoft Foundry.

When building enterprise-grade applications in C#, relying on API keys is an outdated security practice. Instead, we are using the Azure.Identity NuGet package to leverage Role-Based Access Control (you can find a deep dive into RBAC in this post).
using Azure.AI.OpenAI;
using Azure.Identity;
using OpenAI.Chat;
using OpenAI.Embeddings;

public class NaiveRagExample
{
    private readonly EmbeddingClient _embeddingClient;
    private readonly ChatClient _chatClient;

    public NaiveRagExample()
    {
        // No API key: DefaultAzureCredential resolves a token via RBAC
        var openAiClient = new AzureOpenAIClient(
            new Uri(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_CLIENT_URI")!),
            new DefaultAzureCredential());
        _embeddingClient = openAiClient.GetEmbeddingClient(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_EMBEDDING_CLIENT_DEPLOYMENT_NAME")!);
        _chatClient = openAiClient.GetChatClient(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_CHAT_CLIENT_DEPLOYMENT_NAME")!);
    }
}
By utilizing the DefaultAzureCredential class, our application automatically goes through different credential providers behind the scenes. Locally, it picks up my VisualStudioCredential, which is able to pull a security token because in Visual Studio I went to Tools > Options > Azure Service Authentication and logged in with my account (which also exists in my Microsoft Entra tenant).
In production, DefaultAzureCredential will seamlessly leverage EnvironmentCredential or ManagedIdentityCredential (the best option!). As long as the assigned identity holds the Azure AI User role on the Microsoft Foundry resource, the secure connection can be established.
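If you ever need more control than the default chain, Azure.Identity also lets you compose the chain explicitly; a minimal sketch:

// An explicit credential chain, tried in order until one succeeds
var credential = new ChainedTokenCredential(
    new ManagedIdentityCredential(),  // production: the best option
    new EnvironmentCredential(),      // CI/CD or containerized environments
    new VisualStudioCredential());    // local development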
Implementing Naive RAG in C#
With our secure connection established, we can execute the core logic.
First, we need to vectorize our markdown files. In our data ingestion pipeline, we iterate through the files, read the text, and pass it to the embedding client.
private async Task<IReadOnlyCollection<VectorSearchRecord>> GenerateEmbeddingsAsync()
{
    var result = new List<VectorSearchRecord>();
    foreach (var kvp in _dataSource)
    {
        // Read the whole markdown file and embed it as a single vector (no chunking)
        var filePath = GetFilePath(kvp.FileName);
        var text = await File.ReadAllTextAsync(filePath);
        var embedding = await _embeddingClient.GenerateEmbeddingAsync(text);
        var vectorSearchRecord = new VectorSearchRecord
        {
            Id = kvp.ShipName,
            Vector = embedding.Value.ToFloats().ToArray(),
            // Keep the original text alongside the vector so we can augment the prompt later
            Data = new Dictionary<string, string>()
            {
                ["Text"] = text
            }
        };
        result.Add(vectorSearchRecord);
    }
    return result;
}
Once the embeddings are created, we can index them in our custom InMemoryVectorDb and then just wait for a user question.
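The InMemoryVectorDb used here is a custom class from the repository; below is a minimal sketch of what such a store could look like, assuming a VectorSearchResult type with Data and Score properties (illustrative names) and reusing the CosineSimilarity helper sketched earlier:

public class InMemoryVectorDb
{
    private readonly List<VectorSearchRecord> _records = new();

    // "Indexing" in this naive setup is simply keeping the records in memory
    public void Index(IEnumerable<VectorSearchRecord> records) => _records.AddRange(records);

    // Brute-force nearest-neighbor search: score every record, return the top K
    public IReadOnlyCollection<VectorSearchResult> Search(float[] queryVector, int topK) =>
        _records
            .Select(r => new VectorSearchResult { Data = r.Data, Score = CosineSimilarity(queryVector, r.Vector) })
            .OrderByDescending(r => r.Score)
            .Take(topK)
            .ToList();
}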
When a question like “How fast is the Nebula X Cruiser?” comes in, we vectorize it and run our cosine similarity search to find the nearest neighbors.
private async Task<string> GetAnswerAsync(string selectedSystemPrompt, string selectedQuestion, int topK = 3)
{
    // Vectorize the question with the same embedding model used for the documents
    var queryVector = (await _embeddingClient.GenerateEmbeddingAsync(selectedQuestion)).Value.ToFloats().ToArray();

    // Find the topK most similar documents
    var topNSimilarResults = _inMemoryVectorDb.Search(queryVector, topK);

    // Send the system prompt plus the augmented user prompt to the LLM
    ChatCompletion chatCompletion = await _chatClient.CompleteChatAsync(new List<ChatMessage>
    {
        new SystemChatMessage(selectedSystemPrompt),
        new UserChatMessage(CreateUserPrompt(selectedQuestion, topNSimilarResults))
    });
    return chatCompletion.Content[0].Text ?? "Empty response";
}
Finally, we construct the user prompt by enriching it with the text chunks we received from the similarity search operation. That is the essence of the RAG pattern!
private string CreateUserPrompt(string question, IEnumerable<VectorSearchResult> topNSimilarResults)
{
    var builder = new StringBuilder();
    builder.AppendLine("Here are the documents related to the question.");
    builder.AppendLine("Use only the information in these documents when answering.");
    builder.AppendLine();
    builder.AppendLine("=== Retrieved Documents ===");

    // Append each retrieved document, numbered for readability
    foreach (var kvp in topNSimilarResults.Select((result, index) => (Result: result, Index: index)))
    {
        builder.AppendLine();
        builder.AppendLine($"[Document {kvp.Index + 1}]");
        builder.AppendLine(kvp.Result.Data["Text"]);
    }
    builder.AppendLine();
    builder.AppendLine("=== User Question ===");
    builder.AppendLine(question);
    return builder.ToString();
}
At the end, we pass the selected system prompt together with our user chat message to the LLM and wait for the answer.
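To see how the pieces fit together, here is a hypothetical orchestration method inside NaiveRagExample (RunAsync and the Index call are illustrative names, not necessarily what the repository uses):

public async Task RunAsync(string systemPrompt, string question)
{
    var records = await GenerateEmbeddingsAsync();              // 1. ingest & embed the markdown files
    _inMemoryVectorDb.Index(records);                           // 2. store the vectors for later search
    var answer = await GetAnswerAsync(systemPrompt, question);  // 3-5. search, augment, generate
    Console.WriteLine(answer);
}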
Testing the RAG-Based FAQ System
When I run the application and ask about the Nebula X Cruiser’s speed, the system successfully retrieves the correct markdown file, ignores the irrelevant starships, and tells me the top speed is Warp 4.2.

Now let’s check the impact of the system prompt on the final answer.
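The exact prompts live in the repository; here are hypothetical examples of what the selectable system prompts could look like:

// Hypothetical system prompts (the wording in the repository may differ)
private const string KidFriendlySystemPrompt =
    "You are a cheerful space guide for kids. Explain everything about our " +
    "starships in simple, fun language a seven-year-old can understand.";

private const string MarketingSystemPrompt =
    "You are an enthusiastic marketing assistant for Galactic Voyages. " +
    "Answer in persuasive, brand-friendly language, using only facts from " +
    "the provided documents.";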
Let’s start with a kid-friendly system prompt. Below is the answer to the same question with this new system prompt applied:

Let’s test the marketing system prompt now:

Handling Missing Information
What happens when we ask a question our data doesn’t cover, like “Which ship can travel the furthest without refueling?”

Because our markdown files do not provide any actual distance or range metrics, the similarity search pulls the closest approximations… which are all roughly equally similar, because none of the documents contains that piece of information.
However, because we instructed the LLM to only use the provided context, it safely responds that the documents do not provide specific range information. This is exactly how you prevent an LLM from hallucinating in a corporate environment.
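That instruction can live in the user prompt (as shown in CreateUserPrompt above) and be reinforced in the system prompt. A hypothetical “strict grounding” system prompt that produces this behavior could look like this:

// Hypothetical strict-grounding system prompt (illustrative wording)
private const string StrictSystemPrompt =
    "Answer only using the retrieved documents provided in the user message. " +
    "If the documents do not contain the answer, say the information is not " +
    "available instead of guessing.";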
Three Real-World Use Cases for Naive RAG
Even without advanced chunking, re-ranking, or hybrid search, the naive RAG pattern is incredibly powerful. Here are three areas where you can implement it immediately:
- Intelligent FAQ Systems: Exactly like our Galactic Voyages scenario, transforming static website pages into conversational interfaces.
- User Manuals: Instead of forcing customers to Ctrl+F through massive 50-page PDF appliance manuals, you can vectorize the document and let them ask specific troubleshooting questions.
- Corporate Knowledge Bases: You can build a data ingestion pipeline that pulls pages from Confluence or SharePoint, vectorizes them, and provides your engineering team with a chat interface to query internal documentation quickly.
Summary
I’ve demonstrated how to build a foundational “Naive RAG” system in C# that grounds LLM responses in private markdown data using vector similarity. By leveraging Azure RBAC for secure authentication to Microsoft Foundry and the Azure.AI.OpenAI library, I’ve shown how to transform static documents into an interactive FAQ experience while preventing hallucinations through strict context injection. We’ve also seen the impact of the system prompt on the final results.
This basic setup serves as the essential starting point before I move into more complex techniques in my upcoming posts.
Thanks for reading and see you in the next post!
