Introduction
Hey everyone!
Before you decide to use Foundry IQ in Microsoft Foundry, it is good to know step by step how it works behind the scenes. With that understanding, it should be easier to make the right decision for your architecture.
If the topic of RAG, Foundry IQ and Azure AI Search is completely new to you, then I encourage you to read these posts first:
- Naive RAG Explained: The Core Pattern
- Semantic Ranking in Azure AI Search: How Cross-Encoders Improve RAG Retrieval
- Foundry IQ for Agentic Retrieval: Intro to enterprise RAG
Today I want to talk about Agentic Retrieval in Foundry IQ. I’d like to discuss some low-level technicalities so that you know exactly how it works and how changing a specific configuration option influences the retrieval pipeline. I think knowing this is essential if you need to decide whether to go with the “out of the box” offering from Microsoft or implement your own solution that you can fully control. By the end, you will know exactly whether the level of control that Foundry IQ and the Knowledge Base SDK give you is sufficient.
Beyond Single Source RAG: Solving the Data Silo Problem

In a classic RAG (Retrieval Augmented Generation) architecture, we usually start with a single data source. You pull PDFs, vectorize them, and push them to a vector database. But in the enterprise world, data is spread across multiple data silos like SharePoint, Azure Blob Storage, existing search indexes and various web sources.
The challenge is providing a unified knowledge layer that knows which source to call and how to keep everything up to date. This is where Foundry IQ steps in, providing a Knowledge Base that acts as a logical container for your data and the configuration that drives the Agentic Retrieval pipeline.
Defining the Knowledge Base & Knowledge Sources


A Knowledge Base consists of two parts: the basic configuration that impacts how the pipeline performs and a list of Knowledge Sources.

- Indexed Data Sources: These create resources directly in Azure AI Search, including indexes and indexers.
- Remote Sources: These are called on demand at retrieval time, such as Web search or SharePoint queried ad-hoc (see the sketch just below).
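This distinction shows up directly in the SDK at query time: each kind of source has its own *KnowledgeSourceParams type, all of which appear in the full retrieval request later in this post. Here is a minimal sketch contrasting the two kinds, using the same placeholder names as that request:

```csharp
// Indexed source: Foundry IQ has already created an index and indexer for it
// in Azure AI Search, so retrieval runs against ingested, vectorized chunks.
var indexed = new AzureBlobKnowledgeSourceParams("knowledge-source-contoso-cloud")
{
    IncludeReferences = true
};

// Remote source: nothing is ingested upfront; the source is queried ad-hoc
// at retrieval time ("ksourceName" is a placeholder).
var remote = new WebKnowledgeSourceParams("ksourceName")
{
    IncludeReferences = true
};
```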
When you configure these, Foundry IQ is actually orchestrating Azure AI Search resources for you behind the scenes. This includes setting up Skillsets like the Azure Content Understanding skill to extract data from complex PDFs.
Here is a look at exactly what is created in your Azure AI Search service when you spin up a Knowledge Base and select a storage account as a data source.
Data Sources

These represent the connections to your raw data, such as Azure Blob Storage containers, which Azure AI Search uses to fetch your documents for ingestion.
Indexers

Indexers are the automated engines that crawl your Data Sources and push the processed content into your search indexes.
Indexes

These are the target storage units where your vectorized and chunked data is stored, ready for Agentic Retrieval.
Knowledge Bases

This is the logical container that groups your Knowledge Sources and defines the global Reasoning Effort and instructions for the pipeline. It is the same resource you can see in the Microsoft Foundry portal under the ‘Knowledge’ tab.
Knowledge Sources

These are the specific sources linked to a Knowledge Base, mapping logical names to the underlying resource identifiers in your storage. A single Knowledge Source can be linked to multiple Knowledge Bases.
Skillsets and Azure Content Understanding

This is the low-level Skillset configuration. When you select “Standard” extraction, Foundry IQ automatically adds the Content Understanding Skill to handle OCR and layout analysis for rich PDFs.
Agentic Pipeline Mechanics: How Retrieval Really Works

Here is the breakdown of the agentic retrieval pipeline (a code sketch of driving it follows the list):
- Query Planning: The Full LLM (which you can select within the Basic Configuration) looks at your intents, conversation history, and retrieval instructions to deconstruct the original request into multiple, specific sub-queries.
- Source Selection: Instead of a blind search, the system compares the user’s query against your Knowledge Source descriptions to pick the most relevant data silos.
- Parallel Execution & L2 Reranking: The sub-queries are sent to the selected sources simultaneously. Each result set then undergoes L2 Semantic Reranking via a cross-encoder to ensure high-precision relevance.
- L3 Semantic Classification: A fast, custom-tuned SLM performs a final check. If it decides the information is sufficient, it triggers a “Fast Exit” to return the response immediately.
- Reflection Loop: If the SLM deems the initial results “insufficient,” it triggers a reflection step where the Full LLM loops back to refine the search. Keep in mind that only one retry can be performed to maintain efficiency.
- Flexible Output Modes: The final result includes an activity trace, references, and the response delivered either as Extractive Data (verbatim chunks from your index) or a Synthesized Answer generated by the Full LLM based on your specific answer instructions (available only for Low and Medium reasoning effort).
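Mapped to code, one retrieval call drives all of these stages. Below is a minimal sketch, assuming a KnowledgeBaseRetrievalClient instance named kb (its creation is shown later in this post); note that the RetrieveAsync method name is my assumption based on the preview SDK surface and may differ in your version.

```csharp
// Minimal sketch; RetrieveAsync is an assumed method name and may differ
// in your Azure.Search.Documents preview version.
var request = new KnowledgeBaseRetrievalRequest
{
    IncludeActivity = true, // capture a trace of the stages listed above
    Messages =
    {
        new KnowledgeBaseMessage(
            [new KnowledgeBaseMessageTextContent("What is our parental leave policy?")])
        {
            Role = "user"
        }
    },
    // Synthesized answer written by the Full LLM; extractive (verbatim) output
    // is the other mode described above.
    OutputMode = KnowledgeRetrievalOutputMode.AnswerSynthesis
};

var result = await kb.RetrieveAsync(request);
```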
The “Control Analysis”: Tuning Reasoning Effort
The most direct way to influence the pipeline is through the Reasoning Effort setting. This parameter dictates how much “thinking” the agent does before returning an answer; the sketch after this list shows how to set it in code.
- Minimal: Skips query planning and source selection. It sends the query to all sources blindly. This is faster but less efficient for complex data.
- Low: Enables query planning, knowledge source selection and answer synthesis. The Knowledge Base summarizes the findings into a coherent answer (assuming the Answer Synthesis option is selected); you can also return the raw chunks using the Extractive Data option.
- Medium: This is the full agentic experience. It enables the L3 Semantic Classifier and the Reflection Loop, allowing the system to self-correct if the first retrieval pass was insufficient.
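Setting the effort is a one-liner on the retrieval request. The Medium type appears in the full request later in this post; the Minimal and Low sibling type names below are my assumption, inferred from the same naming pattern:

```csharp
// Hypothetical sibling type names (Minimal/Low), inferred from the Medium
// type used later in this post; verify against your preview SDK version.
var request = new KnowledgeBaseRetrievalRequest
{
    // Minimal: no planning or source selection, the query fans out to all sources.
    // RetrievalReasoningEffort = new KnowledgeRetrievalMinimalReasoningEffort(),

    // Low: query planning + source selection, optional answer synthesis.
    // RetrievalReasoningEffort = new KnowledgeRetrievalLowReasoningEffort(),

    // Medium: adds the L3 Semantic Classifier and the single-retry reflection loop.
    RetrievalReasoningEffort = new KnowledgeRetrievalMediumReasoningEffort()
};
```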
Tip: Your Knowledge Source descriptions are critical. The LLM uses these to decide which source to pick during the planning phase, so don’t leave them blank! A description like “Contoso HR policies: vacation, parental leave and benefits documents” gives the planner far more to work with than an empty string.
Implementation & Programmatic Control
Some of us will interact with knowledge bases programmatically to build production-grade solutions. For this, we use the Azure.Search.Documents SDK. Keep in mind that as of now, these features are still in beta, and the service itself is in public preview.
```csharp
var kb = new KnowledgeBaseRetrievalClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_AI_SEARCH_URI")!),
    knowledgeBaseName: Environment.GetEnvironmentVariable("AZURE_AI_SEARCH_KNOWLEDGE_BASE")!,
    new DefaultAzureCredential());
```

To get started, I use the KnowledgeBaseRetrievalClient (you can find this code here). It requires the endpoint URI of your Azure AI Search service and the name of the specific Knowledge Base you want to target. You can also see that I use DefaultAzureCredential from the Azure.Identity NuGet package to establish a secure connection using the Search Index Data Reader RBAC role assigned to my security principal (read more about the RBAC security model here).
```csharp
var kbRetrievalRequest = new KnowledgeBaseRetrievalRequest()
{
    IncludeActivity = true,
    KnowledgeSourceParams =
    {
        new AzureBlobKnowledgeSourceParams("knowledge-source-contoso-cloud")
        {
            AlwaysQuerySource = false,
            IncludeReferences = true,
            IncludeReferenceSourceData = true,
            RerankerThreshold = 2.5f
        },
        new IndexedOneLakeKnowledgeSourceParams("ksourceName")
        {
            AlwaysQuerySource = true,
            IncludeReferences = true,
            IncludeReferenceSourceData = true,
            RerankerThreshold = 0.70f
        },
        new IndexedSharePointKnowledgeSourceParams("ksourceName")
        {
            AlwaysQuerySource = true,
            IncludeReferences = true,
            IncludeReferenceSourceData = true,
            RerankerThreshold = 0.70f
        },
        new RemoteSharePointKnowledgeSourceParams("ksourceName")
        {
            AlwaysQuerySource = true,
            IncludeReferences = true,
            IncludeReferenceSourceData = true,
            RerankerThreshold = 0.70f,
            // KQL (Keyword Query Language) filter appended to the remote query
            FilterExpressionAddOn = "KeywordQueryLanguage filter"
        },
        new SearchIndexKnowledgeSourceParams("ksourceName")
        {
            AlwaysQuerySource = true,
            IncludeReferences = true,
            IncludeReferenceSourceData = true,
            RerankerThreshold = 0.70f,
            // OData filter appended to the query sent to the index
            FilterAddOn = "Location eq 'Warsaw'"
        },
        new WebKnowledgeSourceParams("ksourceName")
        {
            AlwaysQuerySource = true,
            IncludeReferences = true,
            IncludeReferenceSourceData = true,
            RerankerThreshold = 0.70f,
            Count = 55,
            Freshness = "freshness", // placeholder value
            Language = "pl",
            Market = "pl",
        }
    },
    MaxOutputSize = 5000, // tokens; 5000 is the minimum
    MaxRuntimeInSeconds = 10, // allowed range: 10-600 seconds
    Messages =
    {
        new KnowledgeBaseMessage([new KnowledgeBaseMessageTextContent(userQuery)])
        {
            Role = "user"
        }
    },
    // Synthesized answer; extractive (verbatim) output is the other mode
    OutputMode = KnowledgeRetrievalOutputMode.AnswerSynthesis,
    RetrievalReasoningEffort = new KnowledgeRetrievalMediumReasoningEffort(),
};
```

The real power lies in the KnowledgeBaseRetrievalRequest. This is where I can override global settings and fine-tune how each Knowledge Source behaves. Note also that we can control the RerankerThreshold, which relates to the L2 semantic reranking phase. That relevance score ranges from 0.0 to 4.0, so if you want to reject less relevant results, raise this threshold.
Beyond the four properties that are available for all sources (AlwaysQuerySource, IncludeReferences, IncludeReferenceSourceData, RerankerThreshold), there are source-specific ones, such as FilterExpressionAddOn for RemoteSharePointKnowledgeSourceParams or FilterAddOn for SearchIndexKnowledgeSourceParams, which you can use to enrich the query sent to your existing Azure AI Search index. One caveat: if you want the pipeline to run vector search against your existing index, remember to define a vectorizer on it. The pipeline sends text queries, so converting the user prompt into a query vector in your own C# or Python code won’t work here! A hedged sketch of adding a vectorizer follows.
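Since this vectorizer requirement is easy to miss, here is a sketch of attaching an Azure OpenAI vectorizer to an existing index so the service can embed the text sub-queries itself. The index name, vectorizer name and Azure OpenAI endpoint/deployment are placeholders; adapt them to your own resources.

```csharp
using Azure.Identity;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

var indexClient = new SearchIndexClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_AI_SEARCH_URI")!),
    new DefaultAzureCredential());

// "my-existing-index" is a placeholder for your index name.
SearchIndex index = await indexClient.GetIndexAsync("my-existing-index");

// The vectorizer lets Azure AI Search embed the text sub-queries server-side,
// instead of you embedding them in your own code.
index.VectorSearch.Vectorizers.Add(new AzureOpenAIVectorizer("aoai-vectorizer")
{
    Parameters = new AzureOpenAIVectorizerParameters
    {
        ResourceUri = new Uri("https://<your-aoai-resource>.openai.azure.com"),
        DeploymentName = "text-embedding-3-large",
        ModelName = AzureOpenAIModelName.TextEmbedding3Large
    }
});

// Point the vector profile used by your vector fields at that vectorizer.
index.VectorSearch.Profiles[0].VectorizerName = "aoai-vectorizer";

await indexClient.CreateOrUpdateIndexAsync(index);
```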
One critical detail I always set is IncludeActivity = true. This gives me the Activity Trace, which includes the generated sub-queries and other information that lets me understand exactly what actions were performed behind the scenes.
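As a quick illustration of what that buys you, here is a hedged sketch of dumping the trace. The RetrieveAsync method and the Value.Activity response shape are my assumptions based on the preview SDK surface:

```csharp
// Assumed method and response shape (RetrieveAsync, Value.Activity);
// verify against your Azure.Search.Documents preview version.
var result = await kb.RetrieveAsync(kbRetrievalRequest);

foreach (var step in result.Value.Activity)
{
    // Each entry describes one pipeline action, such as a generated
    // sub-query or a call to a specific knowledge source.
    Console.WriteLine(step);
}
```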
Summary
I hope this information helped you understand what you can and cannot control within the Foundry IQ Agentic Retrieval pipeline. By deconstructing the flow, from Query Planning to the L3 Semantic Classifier, it becomes clear that while Microsoft automates the complex orchestration of Azure AI Search resources like Indexes and Skillsets, the quality of your output still relies heavily on your configuration of Reasoning Effort and the precision of your Knowledge Source descriptions.
Through the Azure.Search.Documents SDK, we have the power to fine-tune this process even further by overriding RerankerThresholds or applying specific Filters on the fly. Whether the managed simplicity of Foundry IQ fits your current requirements or you realize you need the absolute control of a custom-built solution, I hope these technical details help you make the right architectural choice for your next project.
Thanks for reading and see you in the next post!
