Introduction

Hey everyone!

You can find all the C# code samples here: MicrosoftAgentFramework GitHub Repo

This post about service-managed chat history in the Microsoft Agent Framework is a continuation of our previous topic, so I encourage you to get familiar with that one first:

If the Microsoft Agent Framework is a completely new topic to you, I also encourage you to read this foundational post:

In this post, I want to focus on three specific examples:

  1. ProjectConversationsClient (for linear chat history)
  2. OpenAI Responses API stateful (with stored chat history and forking capabilities)
  3. OpenAI Responses API stateless (without stored chat history)

The first two examples are strictly related to the main topic of service-managed chat history. The third one is not, but because I am going to describe the state mechanics of the OpenAI Responses API here, I think this is the right place to talk about the stateless mode of that API as well, including how it handles encrypted reasoning content.

Why delegate storing chat history at all?

The previous post focused heavily on optimizing client-managed storage: tuning composite clustered indexes in Azure SQL to avoid table scans and carefully crafting hierarchical partition keys in Cosmos DB. While that client-side approach provides absolute control over the data layer, it requires ongoing engineering effort around infrastructure management, provisioning, and maintenance.

Shifting to service-managed chat history fundamentally alters these engineering trade-offs. Delegating state management directly to the platform simplifies both the codebase and deployment topology in several key ways:

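  • Minimal Request Payloads: Instead of transmitting the entire conversation history back and forth on every turn, the client only passes a lightweight reference identifier. This drastically reduces network overhead as the conversation grows. Keep in mind that the model typically still processes the stored history on the server, so billed input tokens do not shrink just because the request is smaller.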
  • Offloaded Compaction & Window Management: The platform handles token limits automatically under the hood. Development teams no longer need to design, test, and maintain custom client-side compaction strategies such as sliding windows, message truncation, or tool-call collapsing.
  • Retained Chain of Thought (CoT) Context: For complex reasoning models, the step-by-step reasoning tokens and internal context remain fully intact within the server-side state, ensuring that the model’s analytical continuity is never lost between turns.
  • Eliminated Infrastructure & Boilerplate: This approach combines the removal of backend database management (SQL, Cosmos DB, Redis) with the elimination of client-side serialization boilerplate. The application layer bypasses custom state-reconstruction code entirely, rehydrating sessions instantly via a single platform-provided identifier.

This approach essentially trades the fine-grained control of a self-managed database for operational simplicity, allowing development to focus on agent orchestration and business logic rather than database tuning.
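
To make the payload difference concrete, here is a minimal sketch that serializes both request shapes and compares their size. It measures bytes on the wire only, not tokens. The field names model, input, and previous_response_id follow the OpenAI Responses API; the Message record, the PayloadSketch class, and the sample values are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Simulate a 50-turn conversation (one user and one assistant message per turn).
var history = Enumerable.Range(1, 50)
    .SelectMany(turn => new[]
    {
        new Message("user", $"Question number {turn} about my trip to Krakow."),
        new Message("assistant", $"Answer number {turn} with some travel details."),
    })
    .ToList();

var nextUserMessage = new Message("user", "And what about the weather?");

int clientManaged = PayloadSketch.ClientManagedBytes("my-deployment", history, nextUserMessage);
int serviceManaged = PayloadSketch.ServiceManagedBytes("my-deployment", "resp_123", nextUserMessage);

Console.WriteLine($"Client-managed request:  {clientManaged} bytes");
Console.WriteLine($"Service-managed request: {serviceManaged} bytes");

// Hypothetical message shape used only in this sketch.
public record Message(string Role, string Content);

public static class PayloadSketch
{
    // Client-managed: every prior message travels with the request.
    public static int ClientManagedBytes(string model, IReadOnlyList<Message> history, Message next)
    {
        var input = history.Append(next)
            .Select(m => new { role = m.Role, content = m.Content });
        return JsonSerializer.SerializeToUtf8Bytes(new { model, input }).Length;
    }

    // Service-managed: only the new message plus a reference to the stored state.
    public static int ServiceManagedBytes(string model, string previousResponseId, Message next)
    {
        var input = new[] { new { role = next.Role, content = next.Content } };
        return JsonSerializer.SerializeToUtf8Bytes(
            new { model, previous_response_id = previousResponseId, input }).Length;
    }
}
```

The client-managed size grows with every turn, while the service-managed size depends only on the newest message.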

Linear (Single-Threaded) Conversations

Let’s start with the most common scenario: storing a standard, linear chat history. In this model, messages follow a simple order, one after another. We cannot split or fork the conversation into different paths. Here, the chat data is saved directly inside the Microsoft Foundry project using the ProjectConversationsClient (from the Microsoft.Agents.AI.Foundry NuGet package).

Here is the complete code:

namespace _03_ServiceManagedChatHistory
{
    public class ConversationStoredInFoundryExample
    {
        private readonly AIProjectClient _aiProjectClient;
        private readonly ChatClientAgent _chatClientAgent;
        private readonly ProjectConversationsClient _projectConversationsClient;

        private readonly JsonSerializerOptions _serializerOptions = new() { WriteIndented = true };

        public ConversationStoredInFoundryExample()
        {
            _aiProjectClient = new AIProjectClient(
                new Uri(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_CONVERSATION_CLIENT_URI")!), 
                new DefaultAzureCredential());

            _chatClientAgent = _aiProjectClient.AsAIAgent(
                model: Environment.GetEnvironmentVariable("AZURE_OPEN_AI_MODEL_DEPLOYMENT_NAME")!, 
                instructions: "You are a helpful assistant.", name: "Agent which stores the chat history in Microsoft Foundry");

            _projectConversationsClient = _aiProjectClient
                .GetProjectOpenAIClient()
                .GetProjectConversationsClient();
        }

        public async Task RunAsync()
        {
            Console.Write("Enter a conversation ID to resume (or press Enter to start a new conversation): ");
            var conversationId = Console.ReadLine();

            ProjectConversation projectConversation;
            if (!string.IsNullOrWhiteSpace(conversationId))
            {
                projectConversation = await _projectConversationsClient.GetProjectConversationAsync(conversationId);
                Console.WriteLine($"Resumed conversation: {projectConversation.Id}");
            }
            else
            {
                projectConversation = await _projectConversationsClient.CreateProjectConversationAsync();
                Console.WriteLine($"Started new conversation: {projectConversation.Id}");
            }
            var session = await _chatClientAgent.CreateSessionAsync(projectConversation.Id);

            while (true)
            {
                Console.Write("You: ");
                var input = Console.ReadLine();

                if (string.IsNullOrWhiteSpace(input))
                {
                    break;
                }

                if (input == "history")
                {
                    await PrintTranscriptAsync(projectConversation.Id);
                    continue;
                }

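                if (input == "delete")
                {
                    // Use the resolved ID: the console input is empty when a new conversation was started.
                    await _projectConversationsClient.DeleteConversationAsync(projectConversation.Id);
                    Console.WriteLine($"Conversation with ID `{projectConversation.Id}` deleted. Exit.");
                    break;
                }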

                var agentResponse = await _chatClientAgent.RunAsync(message: input, session);
                Console.WriteLine($"Agent: {agentResponse}");
                Console.WriteLine($"Response id: {agentResponse.ResponseId}");

                if (session is ChatClientAgentSession chatClientAgentSession)
                {
                    Console.WriteLine($"Conversation id: {chatClientAgentSession.ConversationId}");

                    Console.WriteLine(JsonSerializer.Serialize(chatClientAgentSession, _serializerOptions));
                }
            }
        }

        public async Task PrintTranscriptAsync(string conversationId)
        {
            var itemsStream = _projectConversationsClient.GetProjectConversationItemsAsync(
                conversationId,
                itemKind: null,
                limit: 10,
                order: "asc",
                after: null,
                before: null);

            await foreach (AgentResponseItem agentItem in itemsStream)
            {
                ResponseItem standardItem = agentItem.AsResponseResultItem();

                if (standardItem is MessageResponseItem messageItem)
                {
                    var textParts = messageItem.Content
                        .Where(part => part.Kind == ResponseContentPartKind.OutputText || !string.IsNullOrEmpty(part.Text))
                        .Select(part => part.Text);

                    var fullMessageText = string.Join(Environment.NewLine, textParts);

                    Console.WriteLine($"[{messageItem.Role.ToString().ToUpper()}]: {fullMessageText}");
                }
            }

            Console.WriteLine("========================================================\n");
        }
    }
}

Let’s break down how this code manages the chat history on the server:

  • Saved on the Platform: The agent does not save the chat history on our local machine or in our own database. Instead, it asks Microsoft Foundry to manage it. We use CreateProjectConversationAsync() to start a new chat thread, or GetProjectConversationAsync(conversationId) to get an old one.
  • Easy Session Loading: We don’t need to write complex code to load old messages from a database. We simply pass the projectConversation.Id into _chatClientAgent.CreateSessionAsync(). The framework automatically connects the agent’s current session to that specific history on the server.
  • Flexible History Fetching: When we download the history using GetProjectConversationItemsAsync, we don’t have to pull every message blindly. The method provides built-in options to filter and control the data. We can change the sorting (order), limit the number of messages (limit), filter by type (itemKind), or use cursor pagination (after and before) to load long chats in smaller pages.
  • Full Control over Deletion: Unlike storage models where data expires automatically after a retention period, this approach gives us full control over the lifecycle. Calling DeleteConversationAsync deletes the conversation from the platform immediately. This is very important for privacy rules like GDPR, where users can ask to delete their data.
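
To illustrate what the limit and after options mean, here is a small in-memory sketch of cursor pagination. It does not call the Foundry SDK: FakeConversationStore, Item, and Pager are hypothetical stand-ins, and the SDK’s async stream may well follow the cursor for us behind the scenes. The idea is simply that each page asks for the items that come after the last ID we have already seen.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var store = new FakeConversationStore(
    Enumerable.Range(1, 25).Select(i => new Item($"item_{i:D2}", $"Message {i}")));

foreach (var item in Pager.ReadAll(store, pageSize: 10))
{
    Console.WriteLine($"{item.Id}: {item.Text}");
}

public record Item(string Id, string Text);

// Hypothetical stand-in for the service: returns up to `limit` items that come after the cursor.
public class FakeConversationStore
{
    private readonly List<Item> _items;

    // Counts round trips so we can see how many pages were requested.
    public int Calls { get; private set; }

    public FakeConversationStore(IEnumerable<Item> items) => _items = items.ToList();

    public IReadOnlyList<Item> GetItems(string? after, int limit)
    {
        Calls++;
        int start = after is null ? 0 : _items.FindIndex(i => i.Id == after) + 1;
        return _items.Skip(start).Take(limit).ToList();
    }
}

public static class Pager
{
    // Keeps requesting pages until a page comes back shorter than the page size.
    public static IEnumerable<Item> ReadAll(FakeConversationStore store, int pageSize)
    {
        string? cursor = null;
        while (true)
        {
            var page = store.GetItems(cursor, pageSize);
            foreach (var item in page) yield return item;
            if (page.Count < pageSize) yield break;
            cursor = page[page.Count - 1].Id; // the last seen ID becomes the next cursor
        }
    }
}
```

With 25 items and a page size of 10, the loop makes three requests: two full pages and one short page that ends the iteration.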

Let’s run this example to see how it works in practice. Here is the console output when we start a brand new conversation:

Enter a conversation ID to resume (or press Enter to start a new conversation):
Started new conversation: conv_0d4bf6d6f508656d00gevJ5H74KtWNcqLwQdLqYHg6mi4BBj9f
You: Hi, I am Michal
Agent: Hi Michal - nice to meet you! How can I help today?
Response id: resp_0d4bf6d6f508656d006a1fb954f20081909e21eee170088858
Conversation id: conv_0d4bf6d6f508656d00gevJ5H74KtWNcqLwQdLqYHg6mi4BBj9f
{
  "conversationId": "conv_0d4bf6d6f508656d00gevJ5H74KtWNcqLwQdLqYHg6mi4BBj9f",
  "stateBag": {}
}
You: remind my name
Agent: Your name is Michal.
Response id: resp_0d4bf6d6f508656d006a1fb961030c8190a66cf3334e5b2f38
Conversation id: conv_0d4bf6d6f508656d00gevJ5H74KtWNcqLwQdLqYHg6mi4BBj9f
{
  "conversationId": "conv_0d4bf6d6f508656d00gevJ5H74KtWNcqLwQdLqYHg6mi4BBj9f",
  "stateBag": {}
}
You:

When we look at this output, there are two important things to notice:

  • Consistent Conversation ID vs. Changing Response ID: The session automatically tracks and stores the conversationId. This ID stays exactly the same for every message during our chat thread. On the other hand, the ResponseId changes with every turn because it represents a unique, single answer.
  • Why the State Bag is Empty: You can see that the stateBag JSON object is completely empty {}. This happens because the framework recognizes that we passed a valid conversationId when we invoked CreateSessionAsync. Since the chat history is fully managed by the cloud service under the hood, the framework does not initialize a local InMemoryChatHistoryProvider.

Now, if we run the console application again, we can simulate a new request coming to the server. By entering our previous conversation ID, we can continue our chat right where we left off. We can also type “history” to see that all previous messages were successfully pulled from the server:

Enter a conversation ID to resume (or press Enter to start a new conversation): conv_0d4bf6d6f508656d00gevJ5H74KtWNcqLwQdLqYHg6mi4BBj9f
Resumed conversation: conv_0d4bf6d6f508656d00gevJ5H74KtWNcqLwQdLqYHg6mi4BBj9f
You: one more time
Agent: Your name is Michal.
Response id: resp_0d4bf6d6f508656d006a1fbc72937481909c8a958e69af43c1
Conversation id: conv_0d4bf6d6f508656d00gevJ5H74KtWNcqLwQdLqYHg6mi4BBj9f
{
  "conversationId": "conv_0d4bf6d6f508656d00gevJ5H74KtWNcqLwQdLqYHg6mi4BBj9f",
  "stateBag": {}
}
You: history
[USER]: Hi, I am Michal
[ASSISTANT]: Hi Michal - nice to meet you! How can I help today?
[USER]: remind my name
[ASSISTANT]: Your name is Michal.
[USER]: one more time
[ASSISTANT]: Your name is Michal.
========================================================

You:

Notice how the agent still remembers the name even though the application was completely restarted and has no local memory. This proves that the full conversation history lives safely on the server.

Of course, even though the platform stores all the actual messages, we still need to save the conversationId on our side (for example, in a database like Azure Cosmos DB) linked to our user. This way, we know which ID to load when the user comes back.
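
A minimal sketch of that mapping could look like the code below. The IConversationIndex interface and its in-memory implementation are hypothetical names I made up for this example; in production the same interface would sit in front of a real database. The "conv_example" value is a placeholder for the ID returned by the platform.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

var index = new InMemoryConversationIndex();

// First visit: no ID stored yet, so the app would create a conversation and remember its ID.
if (!index.TryGet("user-42", out var conversationId))
{
    conversationId = "conv_example"; // placeholder for the ID returned by the platform
    index.Save("user-42", conversationId);
}

// Later visit: the stored ID tells us which conversation to resume.
index.TryGet("user-42", out var resumed);
Console.WriteLine($"Resume conversation: {resumed}");

// Hypothetical abstraction: maps our own user ID to the platform's conversation ID.
public interface IConversationIndex
{
    bool TryGet(string userId, out string? conversationId);
    void Save(string userId, string conversationId);
    bool Remove(string userId);
}

public sealed class InMemoryConversationIndex : IConversationIndex
{
    private readonly ConcurrentDictionary<string, string> _map = new();

    public bool TryGet(string userId, out string? conversationId) =>
        _map.TryGetValue(userId, out conversationId);

    public void Save(string userId, string conversationId) => _map[userId] = conversationId;

    public bool Remove(string userId) => _map.TryRemove(userId, out _);
}
```

When a conversation is deleted on the platform, the matching entry should be removed here as well, otherwise the next visit would try to resume a conversation that no longer exists.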

OpenAI Responses API stateful

Now let’s look at a much more flexible model: the OpenAI Responses API stateful mode for service-managed chat history. In this approach, we don’t look at a conversation as a simple straight line. Instead, the history works like a tree with branches. Every single answer from the model has its own ID, and we can use any past answer as a starting point to continue the chat.

Here is the complete code for this example:

namespace _03_ServiceManagedChatHistory
{
    public class ResponsesApiWithStoredOutputExample
    {
        private readonly ChatClientAgent _mafAgent;
        private readonly JsonSerializerOptions _serializerOptions = new() { WriteIndented = true };

        public ResponsesApiWithStoredOutputExample()
        {
            var openAiClient = new AzureOpenAIClient(
                new Uri(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_RESPONSES_CLIENT_URI")!),
                new DefaultAzureCredential());

            var responsesClient = openAiClient
                .GetResponsesClient();

            _mafAgent = responsesClient
                .AsAIAgent(new ChatClientAgentOptions()
                {
                    Name = "Agent which does not store any history on its own",
                    ChatOptions = new ChatOptions { Instructions = "You are a helpful assistant." },
                },
                model: Environment.GetEnvironmentVariable("AZURE_OPEN_AI_MODEL_DEPLOYMENT_NAME")!);
        }

        public async Task RunAsync()
        {
            Console.Write("Enter a Previous Response ID to resume (or press Enter to start a new conversation): ");
            var previousResponseId = Console.ReadLine();

            var session = !string.IsNullOrWhiteSpace(previousResponseId)
                ? await _mafAgent.CreateSessionAsync(previousResponseId)
                : await _mafAgent.CreateSessionAsync();

            if (!string.IsNullOrWhiteSpace(previousResponseId))
            {
                Console.WriteLine($"Resumed from response: {previousResponseId}");
            }
            else
            {
                Console.WriteLine("Started a new conversation.");
            }

            while (true)
            {
                Console.Write("You: ");
                var input = Console.ReadLine();

                if (string.IsNullOrWhiteSpace(input))
                {
                    break;
                }

                var agentResponse = await _mafAgent.RunAsync(message: input, session);
                Console.WriteLine($"Agent: {agentResponse}");
                Console.WriteLine($"Response id: {agentResponse.ResponseId}");

                if (session is ChatClientAgentSession chatClientAgentSession)
                {
                    Console.WriteLine($"Conversation id: {chatClientAgentSession.ConversationId}");
                    Console.WriteLine(JsonSerializer.Serialize(chatClientAgentSession, _serializerOptions));
                }
            }
        }
    }
}

This model changes how we handle session tracking:

  • Resuming from a Response ID: We don’t pass a conversation thread ID to CreateSessionAsync. Instead, we pass the previousResponseId (the ID of the last answer). The cloud service knows exactly which previous messages led to that answer and automatically reloads the whole history for us.
  • The Ability to Fork: Because sessions are linked to a response ID rather than a fixed conversation container, we can build “forking” workflows. For example, a user can go back to an older answer from yesterday, pass that specific ID, and start a completely new branch of the conversation without losing or ruining the original path.
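
To picture the tree, here is a small in-memory model of linked responses. It is only a mental model, not how the service is implemented: ResponseTree, ResponseNode, and the generated resp_N IDs are hypothetical. Each response remembers its parent, so the history for any response is rebuilt by walking the parent links back to the root.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var tree = new ResponseTree();
string r1 = tree.Add(parentId: null, "Hi, I am Michal", "Hi Michal!");
string r2 = tree.Add(r1, "Plan a trip to Krakow", "Here is a 2-day plan...");
string r3 = tree.Add(r2, "Make it cheaper", "Here is a budget version...");

// Fork: go back to r1 and start a different branch. r2 and r3 stay untouched.
string fork = tree.Add(r1, "Plan a trip to Gdansk", "Here is a seaside plan...");

Console.WriteLine(string.Join(" | ", tree.HistoryFor(r3)));
Console.WriteLine(string.Join(" | ", tree.HistoryFor(fork)));

public record ResponseNode(string Id, string? ParentId, string UserInput, string Output);

public class ResponseTree
{
    private readonly Dictionary<string, ResponseNode> _nodes = new();

    // Stores a new response under the given parent and returns its generated ID.
    public string Add(string? parentId, string userInput, string output)
    {
        if (parentId is not null && !_nodes.ContainsKey(parentId))
            throw new ArgumentException($"Unknown response id: {parentId}", nameof(parentId));

        string id = $"resp_{_nodes.Count + 1}";
        _nodes[id] = new ResponseNode(id, parentId, userInput, output);
        return id;
    }

    // Walks parent links from the given response back to the root, then reverses the result.
    public IReadOnlyList<string> HistoryFor(string responseId)
    {
        var messages = new List<string>();
        for (string? current = responseId; current is not null; current = _nodes[current].ParentId)
        {
            var node = _nodes[current];
            messages.Add(node.Output);
            messages.Add(node.UserInput);
        }
        messages.Reverse();
        return messages;
    }
}
```

Both branches share the first exchange, and adding the fork does not change what HistoryFor(r3) returns.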

The Evolution of OpenAI APIs and the Chain of Thought Benefit

To understand why the OpenAI Responses API is a big step forward, we should look briefly at how OpenAI APIs have evolved over the years:

  • 2020 – Completions API: The early days, designed mostly for single text prompts and simple completions (a.k.a. the sentence finisher).
  • 2023 – Chat Completions API: Introduced the concept of roles (system, user, assistant) to structure conversations, but the client still had to send the entire history back and forth every time.
  • 2025 – Responses API: The modern model that moves state management to the server side (by default) and introduces linked response trees and other capabilities like Web Search, File Search, Code Interpreter, etc.

The Chain of Thought Problem: Solved

When using advanced reasoning models (like the gpt-5 series), the model generates an internal Chain of Thought (CoT), a step-by-step thinking process, before giving the final answer.

OpenAI decided not to return the full, raw internal Chain of Thought to the client to protect their intellectual property. In a purely stateless model, this creates a massive problem: because the client never receives the CoT tokens, it cannot send them back in the next request. As a result, the model loses its deep analytical context between turns.

With the stateful OpenAI Responses API this problem completely disappears. Because OpenAI saves the conversation history directly on their servers, they persist the original Chain of Thought context as well. When we resume a session using a previous Response ID, the model doesn’t just remember the text answers; it remembers its exact internal thinking path from the previous turn.

OpenAI Responses API stateless

But now you may wonder… wait a second, I have heard the new Responses API is the recommended approach and the Chat Completions API will be abandoned soon… but what if I cannot allow storing sensitive chat history on an external server?

If we work in an enterprise with strict compliance or privacy rules, saving chat threads on a public cloud endpoint might be blocked. For this exact situation, the framework provides a way to run the modern Responses API in a completely stateless mode.

Here is the constructor used in the third example (the rest is almost identical to the second example):

public ResponsesApiWithStoredOutputDisabledExample()
{
    AzureOpenAIClient openAiClient = new AzureOpenAIClient(
        new Uri(Environment.GetEnvironmentVariable("AZURE_OPEN_AI_RESPONSES_CLIENT_URI")!),
        new DefaultAzureCredential());

    // Wrap the Responses client so that the service does not store outputs.
    var responsesClient = openAiClient
        .GetResponsesClient()
        .AsIChatClientWithStoredOutputDisabled(
            Environment.GetEnvironmentVariable("AZURE_OPEN_AI_MODEL_DEPLOYMENT_NAME")!,
            includeReasoningEncryptedContent: _includeReasoningEncryptedContent);

    _mafAgent = new ChatClientAgent(responsesClient, new ChatClientAgentOptions()
    {
        Name = "Agent which does not store any history on its own",
        ChatOptions = new ChatOptions
        {
            Instructions = "You are a helpful assistant.",
            // Effort: how much the model reasons; Output: how much reasoning output is returned.
            Reasoning = new ReasoningOptions() { Effort = ReasoningEffort.Medium, Output = ReasoningOutput.Full },
        }
    });
}

Let’s look closely at that constructor, as it sets up the stateless behavior and configures how the model handles complex reasoning:

  • Disabling Server Storage: We chain the .AsIChatClientWithStoredOutputDisabled() method directly onto the responses client. This tells the system that we want to use the new API pipeline, but we explicitly forbid the cloud service from saving our chat history.
  • Configuring Reasoning Effort: Inside ChatOptions, we configure the ReasoningOptions by setting the Effort to Medium and Output to Full. Effort controls how much thinking time the model uses for difficult tasks before it gives us the final text answer, while Output controls how much of the reasoning output is returned to the client.
  • Enabling Encrypted Content: We pass the _includeReasoningEncryptedContent flag into the extension method. This ensures that even though the server discards the history, it still packages and sends the encrypted Chain of Thought tokens down to our client application inside the response payload.
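
For orientation, here is a sketch of the request body I would expect this configuration to produce on the wire. I have not inspected the framework’s actual output, so treat it as an assumption. The fields store, include (with the value reasoning.encrypted_content), reasoning.effort, and input come from the OpenAI Responses API; the StatelessRequestSketch class and the sample values are hypothetical.

```csharp
using System;
using System.Text.Json;

string json = StatelessRequestSketch.Build(
    model: "my-deployment",
    userText: "Recommend things to do in Krakow",
    includeEncryptedReasoning: true);

Console.WriteLine(json);

public static class StatelessRequestSketch
{
    // Builds a Responses API style request body for the stateless mode.
    public static string Build(string model, string userText, bool includeEncryptedReasoning)
    {
        var request = new
        {
            model,
            // Ask the service not to persist this response.
            store = false,
            // Ask for the encrypted reasoning so the client can send it back on the next turn.
            include = includeEncryptedReasoning
                ? new[] { "reasoning.encrypted_content" }
                : Array.Empty<string>(),
            reasoning = new { effort = "medium" },
            input = new[] { new { role = "user", content = userText } },
        };
        return JsonSerializer.Serialize(request, new JsonSerializerOptions { WriteIndented = true });
    }
}
```

Notice that there is no previous_response_id here. In stateless mode the client carries the context itself, which means that the earlier messages (and the encrypted reasoning items, if requested) have to be included in the input of every following request.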

Demo: Analyzing the Stateless Output

Let’s look at the JSON structure returned during this stateless demo.

[Screenshot: a JSON snippet showing the data structure of a stateless agent session. At the top, an arrow points to the conversationId property, which is set to null. A box highlights the InMemoryChatHistoryProvider inside the stateBag object. Further down, another box highlights a content item with the type set to reasoning, showing readable text about travel recommendations for Krakow. At the bottom, a final arrow points to the protectedData property, which contains a long, encrypted text string.]

When we analyze this output, three critical architectural details stand out:

  • Null Conversation ID: As we can see at the top of the image, the conversationId is completely null. Because we explicitly told the OpenAI service to disable stored outputs, the platform does not create or track a server-side conversation.
  • Fallback to Local Memory: Because the cloud service is not saving the chat history and we did not configure a custom production database, the framework automatically falls back to using the default InMemoryChatHistoryProvider.
  • The Reasoning Text: Inside the contents array, we can see an item with "$type": "reasoning". This text outlines the logical steps the model takes. If we run our agent in streaming mode, we can stream this specific text directly to the UI. This is a great user experience pattern because it shows the user the progress of the agent “thinking” in real time.

The Mystery: How is the Chain of Thought Encrypted?

Looking closely at the image, you might ask a very logical question:

Wait a second… if we enabled includeReasoningEncryptedContent because the Chain of Thought is supposed to be encrypted, why can we clearly read the text inside the reasoning block?

Here is the explanation: The text you can read under "$type": "reasoning" is not the raw, original Chain of Thought. Instead, it is a summary produced by a reasoning summarizer component, and it is safe to display.

The true, raw, original Chain of Thought tokens are highly sensitive. They are completely protected and encrypted by the platform. As pointed out by the bottom arrow, that raw data lives securely inside the protectedData property as an unreadable string.

Summary

I hope this information helps you implement service-managed chat history in your Microsoft Agent Framework projects. Whether you choose project-level conversations, stateful response trees, or a completely stateless mode, you now have the tools to choose the right architecture for your needs.

Thanks for reading and see you in the next post!
