
Introduction

To get the most out of this article, I encourage you to read the previous posts in this series first.

We have already discussed the basics of Azure AI Search, followed by how to leverage data sources and indexers for automated data ingestion using the pull approach. In another post, we explored the push approach, where the responsibility for indexing the data lies entirely on our side. With that foundational knowledge of the indexing strategies, we are now ready to compare these two approaches and address the key question that often arises in the early stages of working with Azure AI Search: when should you use the pull approach, and when the push approach?

When to use the Pull approach

Being able to index your data and make it searchable with little effort, thanks to data sources and indexers, is very appealing. In addition, you can attach skillsets that call Azure AI services such as Azure AI Vision, Azure AI Document Intelligence, and Azure AI Translator to enrich the data during indexing. Let’s now look at the use cases where this approach fits best.

Pull indexing is best suited for:

  • Rapid prototyping or proof‑of‑concept projects.
  • Ingesting data from Azure‑native sources with minimal effort.
  • Teams that prefer automation over custom development.
  • Scenarios where scheduled refreshes are sufficient.
    • Important: Indexers can run on a schedule, but the minimum interval is 5 minutes. If that is not acceptable for your project, push indexing or a hybrid approach is the only option!
  • Solutions that require AI enrichment (skillsets) or integrated vectorization at indexing time. In that case, you must use the pull model: skillsets are bound to indexers and cannot run on their own, so they are not applied to documents you push yourself.
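To make the 5‑minute limit concrete, here is a minimal Python sketch that builds the JSON body of a scheduled indexer (the kind of payload sent to the Create or Update Indexer REST operation) and rejects schedules shorter than five minutes. The indexer, data source, and index names are hypothetical, and the interval parser only handles the simple ISO 8601 forms used in this article (such as PT5M, PT2H, or P1D).

```python
import re
from datetime import timedelta

# The service rejects indexer schedules shorter than five minutes.
MIN_INTERVAL = timedelta(minutes=5)


def parse_interval(iso: str) -> timedelta:
    """Parse simple ISO 8601 durations such as PT5M, PT2H or P1D (no seconds, months or years)."""
    match = re.fullmatch(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?", iso)
    if not match or not any(match.groups()) or iso.endswith("T"):
        raise ValueError(f"Unsupported interval: {iso}")
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


def build_indexer(name: str, data_source: str, index: str, interval: str = "PT5M") -> dict:
    """Return the JSON body of a scheduled indexer (all names are supplied by the caller)."""
    if parse_interval(interval) < MIN_INTERVAL:
        raise ValueError("Indexers cannot be scheduled more often than every 5 minutes")
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "schedule": {"interval": interval},
    }


# Usage (hypothetical names): the most frequent schedule the service allows.
indexer = build_indexer("celestial-indexer", "celestial-cosmos-ds", "celestial-objects")
```

If you need fresher data than this, no schedule value will help, which is exactly where the push approach comes in.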

When to use the Push approach

What I like about the push approach is the control it gives us (and as developers, we do like having control, don’t we?). Instead of relying on data sources and indexers to handle ingestion automatically, we take full responsibility for deciding what gets indexed and when. This lets us tailor the process to complex scenarios, custom pipelines, or data that doesn’t fit predefined data sources. It also allows us to integrate indexing directly into our applications, ensuring updates happen exactly as we intend. With that in mind, let’s look at the situations where the push approach makes the most sense.

Push indexing works best in these scenarios:

  • The minimum 5‑minute refresh interval is not acceptable for your project.
  • Data originates outside Azure (e.g., CRM, ERP, third‑party APIs) and there is no built-in data source you could use.
  • You need per‑document control with actions like Upload, Merge, MergeOrUpload, and Delete.
  • Complex transformations or enrichment are required before indexing.
  • Integration with CI/CD pipelines or event‑driven workflows is needed.
  • Important: Please remember that if you decide to rely fully on the push approach, you need to create a process that allows you to sync all data to the index yourself. In most cases, you will be updating individual index documents, but having the ability to index all data on demand is essential for enterprise solutions.
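The per‑document actions and the "sync everything" process can be sketched in a few lines of Python. This is a minimal sketch, assuming you send each yielded body to the Index Documents REST operation (POST /indexes/{index-name}/docs/index) yourself; the HTTP call, authentication, and retries are omitted. A single request may carry at most 1,000 documents, so the full sync is split into batches.

```python
from typing import Iterable, Iterator

# A single indexing request may contain at most 1,000 documents.
MAX_BATCH_SIZE = 1000
VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def to_action(doc: dict, action: str) -> dict:
    """Wrap a document in one of the indexing actions accepted by the service."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"Unknown action: {action}")
    return {"@search.action": action, **doc}


def full_sync_batches(docs: Iterable[dict], size: int = MAX_BATCH_SIZE) -> Iterator[dict]:
    """Yield request bodies that re-upload every source document, `size` documents at a time."""
    batch = []
    for doc in docs:
        # upload replaces the whole document, so the index mirrors the source of truth.
        batch.append(to_action(doc, "upload"))
        if len(batch) == size:
            yield {"value": batch}
            batch = []
    if batch:
        yield {"value": batch}
```

For day‑to‑day updates you would typically send a single merge, mergeOrUpload, or delete action instead. Note that re-uploading everything does not remove documents that were deleted at the source; detecting those needs separate handling.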

Hybrid strategies

I can imagine readers feeling uncertain at this point: there are good reasons to use the pull approach, but the minimum 5‑minute refresh interval (to name one example) can quickly become a limitation. Dear reader, all is not lost. We can take the best of both approaches and combine them!

Imagine we’re storing celestial objects in Azure Cosmos DB with fields like name, type, description and distance_light_years.

{
    "id": "1",
    "name": "Sirius",
    "type": "Star",
    "description": "Sirius is the brightest star in the night sky, located in the constellation Canis Major.",
    "distance_light_years": 8.6,
    "_rid": "9oRCANrQQLABAAAAAAAAAA==",
    "_self": "dbs/9oRCAA==/colls/9oRCANrQQLA=/docs/9oRCANrQQLABAAAAAAAAAA==/",
    "_etag": "\"00005800-0000-5600-0000-693d14210000\"",
    "_attachments": "attachments/",
    "_ts": 1765610529
}

For the sake of this example, let’s assume we need to fulfill two business requirements:

  1. The description field must be translated into a few languages, but immediate updates are not required. Refreshing this data once every 24 hours is sufficient.
  2. The business also requires that changes to the priority_level property be reflected almost immediately, defined as within 30 seconds. An event is published to the Azure Service Bus queue priority_level_changed whenever its value changes, and the field can take one of two values:
    • normal – default state for most objects.
    • critical_observation – urgent cases requiring immediate attention.
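Because both pipelines write into the same index, the index needs fields for the data coming from Cosmos DB, for the translations produced by the indexer, and for priority_level maintained by the push path. The Python sketch below builds such an index definition as JSON; the index name, the target languages, and the description_&lt;lang&gt; field names are hypothetical choices for illustration.

```python
# Hypothetical target languages for the translated copies of the description.
TARGET_LANGUAGES = ["de", "fr", "es"]


def field(name: str, type_: str = "Edm.String", **attributes) -> dict:
    """Build one index field definition; attributes are flags such as searchable or filterable."""
    return {"name": name, "type": type_, **attributes}


def build_index(name: str = "celestial-objects") -> dict:
    fields = [
        field("id", key=True, filterable=True),
        field("name", searchable=True, sortable=True),
        field("type", filterable=True, facetable=True),
        field("description", searchable=True),
        field("distance_light_years", "Edm.Double", filterable=True, sortable=True),
        # Written by the push path (Function App); filterable so urgent objects can be queried.
        field("priority_level", filterable=True, facetable=True),
    ]
    # Written by the pull path: one translated description per target language.
    fields += [field(f"description_{lang}", searchable=True) for lang in TARGET_LANGUAGES]
    return {"name": name, "fields": fields}
```

Keeping priority_level filterable lets a client query, for example, only objects in the critical_observation state.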

Below is an example of an architecture that fulfills both of these requirements:

Azure architecture diagram illustrating a hybrid approach to data indexing in Azure AI Search, using both the pull and push methods simultaneously.

Let’s first focus on the upper part of the diagram. Here we see an Azure Function App that consumes messages from the priority_level_changed Azure Service Bus queue (using the built‑in Service Bus trigger binding, for example) and then invokes the Merge operation to update a single field, priority_level. Keep in mind that Merge assumes the document already exists in the index: if the indexer has not yet picked up a newly created object, the merge for that document fails. Such edge cases should be handled in the final solution.
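The core of that function can be sketched independently of the Functions runtime. This minimal Python sketch assumes a hypothetical message body such as {"id": "1", "priority_level": "critical_observation"}; the trigger binding and the HTTP call to the index are left out. It builds the merge request body and, given the per‑document results returned by the service, picks out keys that failed because the document was not found (reported, to my understanding, with status code 404 for that item).

```python
import json

ALLOWED_LEVELS = {"normal", "critical_observation"}


def build_priority_merge(message_body: str) -> dict:
    """Turn a priority_level_changed message (hypothetical format) into a merge request body."""
    event = json.loads(message_body)
    level = event["priority_level"]
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"Unexpected priority_level: {level}")
    # Only the key and the changed field are sent; merge leaves the other fields untouched.
    return {"value": [{"@search.action": "merge", "id": str(event["id"]), "priority_level": level}]}


def keys_not_found(index_response: dict) -> list:
    """Return keys whose merge failed because the document does not exist in the index yet."""
    return [r["key"] for r in index_response.get("value", []) if r.get("statusCode") == 404]
```

What to do with those keys is a design decision: you could retry after the next indexer run, or switch to mergeOrUpload and accept a partial document in the index until the indexer fills in the remaining fields.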

In the lower part of the diagram, we see an indexer that uses Azure Cosmos DB as its data source and points to our container with celestial objects. The indexer is linked to a skillset containing the Microsoft.Skills.Text.TranslationSkill, which translates the description field using the Azure AI Translator service behind the scenes (each skill instance targets a single language, so translating into several languages means adding one skill per language). The schedule interval configured in the indexer definition (an ISO 8601 duration such as P1D, not a CRON expression) ensures the indexer runs every 24 hours, as required by the business.
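Here is a minimal Python sketch of the JSON bodies for that lower pipeline: a skillset with one translation skill per language and an indexer that runs once a day and maps the enriched outputs to index fields. The resource names, target languages, and description_&lt;lang&gt; field names are hypothetical, and the data source definition as well as the Azure AI services resource that a production skillset normally needs attached are omitted.

```python
# Hypothetical target languages; each one gets its own TranslationSkill instance.
TARGET_LANGUAGES = ["de", "fr", "es"]


def translation_skill(lang: str) -> dict:
    """One skill translating /document/description into a single target language."""
    return {
        "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
        "name": f"translate-description-{lang}",
        "context": "/document",
        "defaultToLanguageCode": lang,
        "inputs": [{"name": "text", "source": "/document/description"}],
        "outputs": [{"name": "translatedText", "targetName": f"description_{lang}"}],
    }


def build_skillset(name: str = "celestial-translation") -> dict:
    return {"name": name, "skills": [translation_skill(lang) for lang in TARGET_LANGUAGES]}


def build_translation_indexer(
    name: str = "celestial-cosmos-indexer",
    data_source: str = "celestial-cosmos-ds",
    index: str = "celestial-objects",
    skillset: str = "celestial-translation",
) -> dict:
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
        # ISO 8601 duration: run once every 24 hours.
        "schedule": {"interval": "P1D"},
        # Enriched nodes are not written to the index unless they are mapped explicitly.
        "outputFieldMappings": [
            {"sourceFieldName": f"/document/description_{lang}", "targetFieldName": f"description_{lang}"}
            for lang in TARGET_LANGUAGES
        ],
    }
```

Since this indexer never writes priority_level, it does not compete with the Function App for that field.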

ℹ️ Please remember that you can define additional indexers and data sources that point to the same index.

Summary

This article explained the pull, push, and hybrid approaches in Azure AI Search. The pull model works well when scheduled updates and automated AI enrichment are enough, while the push model is better when you need full control and immediate changes. Each approach has clear strengths, but in many projects the most effective solution is to combine them. Pull indexers can handle stable data and AI enrichment tasks, while push merges ensure that urgent updates, such as changes to the priority_level field, are applied right away.

In the end, hybrid indexing gives you both reliability and speed: scheduled updates for predictable data and near real‑time changes when fast reactions are required.
