Indexing Deep Dive — Cosmos DB

TL;DR

Cosmos indexes every field of every document by default. That's why your queries "just work" — and also why your write RUs are higher than they need to be. Tuning the index policy to match your actual queries is the single biggest write-cost optimization available.

Key takeaways

▸Default policy — include `/*` (everything). Convenient but pays for indexing fields nobody queries.
▸Three index types — range (equality, ORDER BY, comparisons), spatial (geo), composite (multi-field ORDER BY / WHERE).
▸Excluding paths via `excludedPaths` can cut write RUs by 20–60% on workloads with large documents.
▸Composite indexes are opt-in and required for any multi-field ORDER BY.
▸You can change index policy on a live container — Cosmos rebuilds in the background, no downtime.

Cosmos’s default, indexing every property path of every document, is a deliberate choice. It means your queries work the moment you write them, with no upfront DBA work. The cost is that every write has to update the index for every path it touches, and most apps have many fields that nobody ever queries. Tuning the index policy is where you get write RUs back.

What the default policy actually looks like

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [{ "path": "/*" }],
  "excludedPaths": [{ "path": "/_etag/?" }]
}

Translation — “index everything except the system etag.” Every JSON path on every doc is in the index, supporting equality, ordering, and range queries.

The cost you’re paying

Writing a 1 KB document under the default policy costs roughly 5 RU, and about half of that is index maintenance. Strip out the fields you never query and the cost drops to about 3 RU: a 40% saving on every write.

For an event log container ingesting 10K writes/sec, saving 2 RU per write frees 20K RU/s, which translates directly into lower provisioned throughput cost.
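The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the 5 RU and 3 RU figures are the rough per-write costs quoted in this section, not measurements.

```python
def ru_saved_per_second(writes_per_sec: float, ru_before: float, ru_after: float) -> float:
    """RU/s freed when each write becomes cheaper by (ru_before - ru_after)."""
    return writes_per_sec * (ru_before - ru_after)


def percent_saving(ru_before: float, ru_after: float) -> float:
    """Relative saving per write, as a percentage of the original cost."""
    return 100.0 * (ru_before - ru_after) / ru_before


# Figures from the text: 10K writes/sec, ~5 RU before tuning, ~3 RU after.
print(ru_saved_per_second(10_000, 5.0, 3.0))  # 20000.0
print(percent_saving(5.0, 3.0))               # 40.0
```

Plugging in your own measured RU charges (from the request charge your SDK reports per write) gives a more honest estimate than the round numbers used here.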

Excluding paths

{
  "indexingMode": "consistent",
  "includedPaths": [{ "path": "/*" }],
  "excludedPaths": [
    { "path": "/payload/*" },
    { "path": "/diagnostics/?" },
    { "path": "/_etag/?" }
  ]
}

/payload/* — exclude an entire subtree. Useful when you’re storing large blobs of data that you only fetch by id.

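/diagnostics/? excludes one specific path. The ? matches only the scalar value at that path, not anything nested beneath it.

The full reference defines the wildcards: /* matches everything below a node, /? matches a scalar leaf, and /[] addresses array elements. When an included path and an excluded path conflict, the more precise path wins.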
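To make the precedence rule concrete, here is a simplified model of how an included or excluded path might be resolved for a given property. It is a sketch for building intuition, not Cosmos's actual matcher: the helper names are hypothetical, array notation and quoted segments are ignored, and "more precise" is approximated as "longer prefix, with /? beating /* at the same depth".

```python
from __future__ import annotations


def _segments(path: str) -> list[str]:
    # "/payload/*" -> ["payload", "*"]
    return [s for s in path.split("/") if s]


def _specificity(pattern: str, prop_path: str) -> tuple[int, int] | None:
    """Return a score if `pattern` covers `prop_path`, else None."""
    pat = _segments(pattern)
    if not pat:
        return None
    *prefix, wildcard = pat
    prop = _segments(prop_path)
    if prop[:len(prefix)] != prefix:
        return None
    if wildcard == "*":  # the node and everything below it
        return (len(prefix), 0)
    if wildcard == "?":  # only the value at exactly this path
        return (len(prefix), 1) if len(prop) == len(prefix) else None
    return None


def is_indexed(policy: dict, prop_path: str) -> bool:
    """True if the most specific matching rule is an included path."""
    best = None  # (score, included?)
    for key, included in (("includedPaths", True), ("excludedPaths", False)):
        for entry in policy.get(key, []):
            score = _specificity(entry["path"], prop_path)
            if score is not None and (best is None or score > best[0]):
                best = (score, included)
    return bool(best and best[1])


POLICY = {
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/payload/*"}, {"path": "/diagnostics/?"}],
}

print(is_indexed(POLICY, "/tenantId"))          # True
print(is_indexed(POLICY, "/payload/raw/body"))  # False
print(is_indexed(POLICY, "/diagnostics"))       # False
```

In this model, a nested property such as /diagnostics/trace is still indexed, because /diagnostics/? covers only the value at /diagnostics itself.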

Index types

Range: the default. Supports =, <, >, <=, >=, BETWEEN, ORDER BY. Works on numbers, strings, booleans.

Spatial: for ST_DISTANCE, ST_WITHIN, ST_INTERSECTS, and polygon containment. Applies to GeoJSON Point, LineString, Polygon, and MultiPolygon values. Opt-in.

Composite: required for multi-field ORDER BY, and optionally useful for queries that filter on several fields. Declared as an ordered list:

{
  "compositeIndexes": [
    [
      { "path": "/tenantId", "order": "ascending" },
      { "path": "/createdAt", "order": "descending" }
    ]
  ]
}

This index serves ORDER BY c.tenantId ASC, c.createdAt DESC and its exact inverse, ORDER BY c.tenantId DESC, c.createdAt ASC. Any other combination of paths or directions requires an additional composite index.
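The matching rule for ORDER BY can be sketched as a small check: the clause must list the same paths in the same sequence, with directions either all as declared or all flipped. The function below is a hypothetical helper for reasoning about your own policies, not part of any Cosmos SDK, and it ignores the interaction with filters.

```python
def _flip(order: str) -> str:
    return "descending" if order == "ascending" else "ascending"


def serves_order_by(composite: list, order_by: list) -> bool:
    """composite: [{"path": ..., "order": ...}], order_by: [(path, order), ...]."""
    declared = [(c["path"], c["order"]) for c in composite]
    inverse = [(path, _flip(order)) for path, order in declared]
    return list(order_by) in (declared, inverse)


INDEX = [
    {"path": "/tenantId", "order": "ascending"},
    {"path": "/createdAt", "order": "descending"},
]

# Declared order and its full inverse are both served.
print(serves_order_by(INDEX, [("/tenantId", "ascending"), ("/createdAt", "descending")]))  # True
print(serves_order_by(INDEX, [("/tenantId", "descending"), ("/createdAt", "ascending")]))  # True
# Mixed directions or a different path sequence are not.
print(serves_order_by(INDEX, [("/tenantId", "ascending"), ("/createdAt", "ascending")]))   # False
print(serves_order_by(INDEX, [("/createdAt", "descending"), ("/tenantId", "ascending")]))  # False
```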

Indexing modes

Consistent (default) — index updates synchronously with writes. Reads always see the latest. Highest write cost.
None — no indexing. Only point reads work; queries fail. Use for write-only logs you’ll consume by Change Feed and never query directly.

There’s no “lazy” mode to choose anymore: it is deprecated and new containers cannot select it. The practical choice today is consistent or none.

Tuning workflow

1. Run your app for a week with the default policy.
2. Open the portal → container → “Indexing Metrics.” It tells you per-path utilization.
3. Look for paths with “0 reads, N writes”; those are pure cost.
4. Either exclude them with a glob or, if you have a tightly scoped app, flip to an includedPaths allow-list.
5. Update the policy. Cosmos rebuilds in the background. Watch RU/s and 429s during the rebuild.
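The two options in the workflow above (exclude unused paths, or switch to an allow-list) amount to two shapes of policy document. The sketch below builds both as plain dictionaries. The function names and example paths are hypothetical, and applying the result to a container is left to whichever SDK or tool you use to replace the container's indexing policy.

```python
import copy
import json


def with_excluded(policy: dict, unused_paths: list) -> dict:
    """Return a copy of `policy` with extra excluded paths (no duplicates)."""
    new_policy = copy.deepcopy(policy)
    excluded = new_policy.setdefault("excludedPaths", [])
    seen = {entry["path"] for entry in excluded}
    for path in unused_paths:
        if path not in seen:
            excluded.append({"path": path})
            seen.add(path)
    return new_policy


def allow_list(queried_paths: list) -> dict:
    """Index only the listed paths; exclude everything else via '/*'."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": p} for p in queried_paths],
        "excludedPaths": [{"path": "/*"}],
    }


current = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/_etag/?"}],
}

# Option A: keep "index everything" and carve out paths nobody queries.
print(json.dumps(with_excluded(current, ["/payload/*", "/diagnostics/?"]), indent=2))

# Option B: tightly scoped app, index only what queries actually touch.
print(json.dumps(allow_list(["/tenantId/?", "/createdAt/?"]), indent=2))
```

Option A is the safer default because a new query on an unlisted field still finds an index; option B gives the lowest write cost but every new query pattern needs a policy change first.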

For most teams, two iterations of this loop drop write RUs by 30–50%. Over a year, that’s the salary of an SDE.

Vector indexes

Lesson V14 covers these in depth, but they are worth a mention here: Cosmos now has native vector indexes for similarity search. They’re declared in the index policy with their own type (flat, quantizedFlat, or diskANN), while the embedding’s dimensions are declared in the container’s vector embedding policy. If you’re doing RAG inside Cosmos, you’ll write this policy too.
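As a rough picture of what that looks like, the sketch below builds the two documents involved as Python dictionaries. The field names follow my understanding of the Cosmos vector search documentation and should be checked against the current reference. The /embedding path, the 1536 dimensions, and the helper name are purely illustrative assumptions.

```python
import json

VECTOR_INDEX_TYPES = {"flat", "quantizedFlat", "diskANN"}


def vector_policies(path: str, dimensions: int, index_type: str = "quantizedFlat"):
    """Build (vector embedding policy, indexing policy) for one vector path."""
    if index_type not in VECTOR_INDEX_TYPES:
        raise ValueError(f"unknown vector index type: {index_type}")
    embedding_policy = {
        "vectorEmbeddings": [
            {
                "path": path,
                "dataType": "float32",
                "distanceFunction": "cosine",
                "dimensions": dimensions,  # declared here, not in the index policy
            }
        ]
    }
    indexing_policy = {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        # Keep the raw vector out of the regular range index.
        "excludedPaths": [{"path": path + "/*"}],
        "vectorIndexes": [{"path": path, "type": index_type}],
    }
    return embedding_policy, indexing_policy


embedding, indexing = vector_policies("/embedding", 1536, "diskANN")
print(json.dumps(embedding, indent=2))
print(json.dumps(indexing, indent=2))
```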

🎯 Common questions

Q1. How do I know which paths are worth excluding?

Look at your queries. Any field that's never in a WHERE, ORDER BY, or JOIN is a candidate for exclusion. Big arrays of strings (logs, history) are usually the highest-value cuts. The portal's "indexing metrics" report shows you exactly which paths are read by recent queries.

Q2. What happens to existing data when I change the policy?

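Cosmos reindexes in the background. Queries only use a newly added index once the rebuild finishes; until then they may fall back to a scan, which still works but costs more RUs. Progress is reported as a percentage on container reads, through the `x-ms-documentdb-collection-index-transformation-progress` response header (some SDKs surface it as `IndexTransformationProgress`). No downtime, no manual rebuild step.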

Q3. Do I need composite indexes for multi-condition WHERE clauses?

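Not strictly. Separate range indexes on each field can serve a multi-condition WHERE. Composite indexes are required only for `ORDER BY` across multiple fields, but they can also lower the RU charge of queries that filter on several fields at once (for example, an equality filter on one field plus a range filter on another) when the query engine is able to use them.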