Request Units & Cost — Cosmos DB

TL;DR

A Request Unit (RU) is Cosmos DB's unit of throughput — one normalized blend of CPU, memory, and IOPS. Every operation has a deterministic cost in RUs, so you can predict your bill from your workload's shape, not from raw IOPS guesses. Get the cost model in your head and you stop overprovisioning.

Key takeaways

▸1 point read of a 1 KB doc = 1 RU. That's the anchor every other cost is calibrated against.
▸Writes are 5–7× more expensive than reads, mostly because of indexing. Cutting unused indexes is the single biggest cost lever.
▸Query cost depends on document scan, not result size. A query that scans 100 docs and returns 10 still pays for the 100.
▸Cross-partition queries cost 10–30× a single-partition query. Make sure your hot path includes the partition key.
▸Three pricing modes: Provisioned (fixed RU/s, cheapest when utilization is high), Autoscale (scales between 10 % of your max and the max, billed at 1.5× the provisioned rate), Serverless (pay per request, best for spiky low-traffic workloads).

Most engineers come from a relational world where database cost is a fuzzy “size of the box × storage volume”. Cosmos breaks that mental model. Every operation has a price tag in RUs, and your monthly bill is just the sum of those price tags. Once you internalize the math, your bill stops being mysterious.

This lesson is the math.

What a Request Unit actually is

An RU is a normalized blend of CPU + memory + IOPS for one operation. Microsoft calibrated the unit so that reading a single 1 KB document by id + partitionKey costs exactly 1 RU. Everything else is scaled relative to that anchor.

Why a blend? Because a database operation isn’t just disk I/O. A query parses, plans, scans, deserializes, applies filters, projects fields — each of those touches CPU and memory. If Cosmos charged you in raw IOPS, write-heavy workloads would underpay (most of their cost is index updates, not disk seeks); query-heavy workloads would overpay. The RU normalizes all of it into one number.

The price tag for every operation

Here’s the cheat sheet most engineers eventually memorize:

Operation	Approximate RU cost
Point read (`id + partitionKey`), 1 KB	1 RU
Point read, 10 KB	~3 RUs
Point read, 100 KB	~10 RUs
Insert, 1 KB, default indexing	5–7 RUs
Replace (full-doc update)	~5–7 RUs
Patch (partial update, 5 fields)	~5 RUs
Delete	~5 RUs
Single-partition query, 10 results	~5–10 RUs
Cross-partition query, 100 docs scanned	30–100+ RUs
Stored procedure (one call)	varies — counts each internal op

A few patterns worth internalizing:

Reads are cheap; writes cost 5–7× as much. At a 10:1 read-to-write ratio, reads still make up most of the bill. At 1:1, writes dominate.
Doc size raises the cost, but less than linearly in the table above: a point read of a doc 100× larger costs about 10× the RUs.
Query cost is about scan, not result size. If your filter eliminates 90 % of docs at the index, the query is cheap. If it filters in code after scan, you pay for everything you scanned.
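To make the read/write pattern concrete, here is a small sketch that splits a workload's RUs between reads and writes. It assumes 1 RU per point read and 6 RUs per write (the midpoint of the 5–7 range in the table); `ru_share` is a name invented for this example.

```python
# Approximate per-operation costs taken from the cheat sheet above.
READ_RU = 1.0   # point read, 1 KB
WRITE_RU = 6.0  # insert, 1 KB, default indexing (assumed midpoint of 5-7)


def ru_share(reads: int, writes: int) -> tuple[float, float]:
    """Return the fraction of total RUs spent on reads and on writes."""
    read_rus = reads * READ_RU
    write_rus = writes * WRITE_RU
    total = read_rus + write_rus
    if total == 0:
        return 0.0, 0.0
    return read_rus / total, write_rus / total


for reads, writes in [(10, 1), (1, 1)]:
    read_frac, write_frac = ru_share(reads, writes)
    print(f"{reads}:{writes} read:write -> reads {read_frac:.1%}, writes {write_frac:.1%}")
```

With these assumed costs, a 10:1 mix spends five-eighths of its RUs on reads, while a 1:1 mix spends six-sevenths on writes.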

The three pricing modes

Cosmos sells RUs three ways. The right one depends on your workload’s shape.

Provisioned throughput

You set a fixed RU/s. Cosmos guarantees that capacity 24/7. Cheapest at steady utilization (60 %+).

(1,000 RU/s ÷ 100) × $0.008 per 100 RU/s per hour × 730 hours ≈ $58/month

Use for production workloads with predictable traffic.
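The same arithmetic as a reusable function. This is a sketch that assumes the $0.008 figure is the price per 100 RU/s per hour for a single region; check the current price list before trusting the output. The function name is illustrative.

```python
HOURS_PER_MONTH = 730
USD_PER_100_RU_HOUR = 0.008  # assumed single-region rate used in this lesson


def provisioned_monthly_cost(ru_per_second: float) -> float:
    """Monthly throughput cost in USD for a fixed RU/s setting."""
    return ru_per_second / 100 * USD_PER_100_RU_HOUR * HOURS_PER_MONTH


for ru in (400, 1_000, 10_000):
    print(f"{ru:>6} RU/s -> ${provisioned_monthly_cost(ru):,.2f}/month")
```

Storage and multi-region replication are billed separately and are deliberately left out of this estimate.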

Autoscale

You set a max; Cosmos scales between max/10 and max based on actual traffic. The rate per RU/s is 1.5× the provisioned rate, but you stop paying for peak capacity around the clock.

10,000 RU/s autoscale max → fluctuates 1,000–10,000
Each hour is billed at the highest RU/s reached in that hour, never less than max / 10

The math — autoscale wins when peak/average > 1.5×. For business-hours traffic, end-of-month cron jobs, or any workload with a 5–10× spike pattern, it’s a clear win.
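A rough way to check that claim for your own traffic is to bill a month hour by hour. The sketch below assumes each hour is charged at the highest RU/s reached in that hour (never below 10 % of the max), at 1.5× an assumed provisioned rate of $0.008 per 100 RU/s per hour. The traffic profile is invented for illustration.

```python
PROVISIONED_RATE = 0.008                 # USD per 100 RU/s per hour (assumed)
AUTOSCALE_RATE = PROVISIONED_RATE * 1.5  # autoscale premium from the text


def autoscale_cost(hourly_peaks: list[float], max_ru_s: float) -> float:
    """Sum the hourly charges for an autoscale container."""
    floor = max_ru_s / 10
    total = 0.0
    for peak in hourly_peaks:
        # Billed RU/s is clamped between the 10 % floor and the configured max.
        billed = min(max(peak, floor), max_ru_s)
        total += billed / 100 * AUTOSCALE_RATE
    return total


def provisioned_cost(ru_s: float, hours: int) -> float:
    """Cost of holding a fixed RU/s for the given number of hours."""
    return ru_s / 100 * PROVISIONED_RATE * hours


# Hypothetical business-hours profile: 8 busy hours, 16 quiet hours, 30 days.
month = ([10_000] * 8 + [500] * 16) * 30

print(f"autoscale:   ${autoscale_cost(month, 10_000):,.2f}")
print(f"provisioned: ${provisioned_cost(10_000, len(month)):,.2f}")
```

In this made-up month only a third of the hours run hot, so autoscale comes out well below fixed provisioning at the peak. Flatten the profile and the 1.5× premium starts to hurt.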

Serverless

No provisioning at all. Pay $0.25 per million RUs consumed. Hard cap of 5,000 RU/s per container.

Use for:

Dev/test environments
Prototypes and side projects
Apps with low traffic that spikes occasionally
Workloads under ~1M RUs/month (about $0.25 of request charges at the rate above)
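To see where serverless stops being the cheap option, compare its per-RU price with a small provisioned container. This sketch assumes $0.25 per million RUs and takes the monthly provisioned cost as an input; both function names are made up for the example.

```python
USD_PER_MILLION_RU = 0.25  # serverless rate quoted in this lesson


def serverless_monthly_cost(total_rus: float) -> float:
    """Request charges in USD for a month of serverless usage."""
    return total_rus / 1_000_000 * USD_PER_MILLION_RU


def break_even_rus(provisioned_monthly_usd: float) -> float:
    """RUs per month at which serverless costs as much as a provisioned plan."""
    return provisioned_monthly_usd / USD_PER_MILLION_RU * 1_000_000


print(f"1M RUs/month   -> ${serverless_monthly_cost(1_000_000):.2f}")
print(f"504M RUs/month -> ${serverless_monthly_cost(504_000_000):.2f}")
# 23.36 is the assumed monthly cost of a 400 RU/s provisioned container.
print(f"break-even vs $23.36/month -> {break_even_rus(23.36) / 1e6:.1f}M RUs")
```

Below the break-even volume serverless is cheaper; above it, a provisioned container wins as long as it is reasonably utilized.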

Modeling your bill before you ship

You can predict your monthly RU cost from a napkin:

RUs/month = (reads × 1) + (writes × 6) + (queries × 8) + (cross-partition × 30)

For a SaaS app with:

10M reads/day → 300M reads/month × 1 RU = 300M RUs
1M writes/day → 30M writes/month × 6 RU = 180M RUs
100K queries/day → 3M queries/month × 8 RU = 24M RUs

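That’s ~504M RUs/month, or about 194 RU/s averaged over the month (504M ÷ 2,592,000 seconds). If the peak is 10× the average, plan for roughly 2,000 RU/s. On autoscale with a 2,000 RU/s max (floor 200), billed at 1.5× the $0.008 per 100 RU/s per hour rate, the throughput bill lands between about $18 and $175/month, depending on how many hours run near the peak.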

Now do the same exercise for one bad cross-partition query running 100 times/day at 30 RUs each: that’s 90,000 RUs/month. Trivial. Now make it 100,000 times/day: that’s 90M RUs/month, half your write traffic and almost four times your regular query traffic. This is how a cost regression sneaks in.
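The napkin formula fits in a few lines of code, which makes it easy to rerun whenever the workload changes. The per-operation RU weights below are the rough averages from the formula above, not measured values, and the helper name is invented for this sketch.

```python
# Rough per-operation weights from the napkin formula (assumed averages).
RU_PER_OP = {"read": 1, "write": 6, "query": 8, "cross_partition": 30}
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def monthly_rus(daily_ops: dict[str, int], days: int = 30) -> dict[str, int]:
    """Convert daily operation counts into monthly RUs per operation type."""
    return {op: count * days * RU_PER_OP[op] for op, count in daily_ops.items()}


baseline = monthly_rus({"read": 10_000_000, "write": 1_000_000, "query": 100_000})
total = sum(baseline.values())
print(f"baseline: {total / 1e6:.0f}M RUs/month, "
      f"average {total / SECONDS_PER_MONTH:.0f} RU/s")

# The cost regression: one cross-partition query called 100,000 times a day.
regression = monthly_rus({"cross_partition": 100_000})
print(f"regression adds {regression['cross_partition'] / 1e6:.0f}M RUs/month")
```

Swap the weights for the request charges you actually observe once the app is running; the structure of the estimate stays the same.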

The five biggest cost levers

In rough order of impact:

Right-size indexing. Default policy indexes everything. If you only query 5 paths but write 50, every write pays for 45 index entries that no query ever uses. Lesson V07 has the playbook.
Pin queries to a single partition. A WHERE partitionKey = '...' AND ... query is 10–30× cheaper than the same query without the key.
Don’t hydrate full docs when you only need a few fields. A query with SELECT c.id, c.name scans the same as SELECT * but is cheaper to deserialize.
Batch reads when you can. A single point-read of one doc is 1 RU. A batch operation reading 10 docs is also ~10 RUs but one network round-trip — the savings is in latency, not RUs, but it matters.
Use the right pricing mode. Switching from provisioned to autoscale can save 30 %+ when the peak is several times the average; the two break even around peak/avg = 1.5.

Where this lesson connects

V02 Partitioning — partition design is RU design. Hot partitions throttle even when the container as a whole still has provisioned headroom.
V06 Querying — write queries that index-scan instead of doc-scan.
V07 Indexing — the single biggest write-cost lever.
V10 Monitoring — read the x-ms-request-charge header religiously; alert on RU consumption percentile by operation type.
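The hot-partition point is easier to see with numbers. This sketch assumes provisioned RU/s is divided evenly across physical partitions, so one busy partition can hit its share while the container total looks comfortable. The demand figures and the function name are made up.

```python
def throttled_partitions(provisioned_ru_s: float, demand_ru_s: list[float]) -> list[int]:
    """Return indexes of partitions whose demand exceeds their even share."""
    per_partition = provisioned_ru_s / len(demand_ru_s)
    return [i for i, demand in enumerate(demand_ru_s) if demand > per_partition]


# 10,000 RU/s spread over 5 partitions gives each one a 2,000 RU/s share.
demand = [2_500, 300, 300, 300, 300]
hot = throttled_partitions(10_000, demand)
print(f"total demand {sum(demand)} of 10000 RU/s, throttled partitions: {hot}")
```

Total demand here is well under half of what is provisioned, yet partition 0 is over its share. Raising RU/s fixes it only by brute force; a better partition key fixes it for free.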

If you remember three things from this lesson —

Read the request-charge header during development. Every PR should know what the RUs are.
Index only what you query.
The partition key is also a cost decision, not just a data-modeling one.

— you’ll spend a fraction of what most teams do.

🎯 Common questions

Q1. How do I see the actual RU cost of an operation?

Every Cosmos response includes an `x-ms-request-charge` header. The SDKs surface it on the response object: `RequestCharge` in .NET, `getRequestCharge()` in Java, `requestCharge` in Node. In the Data Explorer, run a query and look at the **Query Stats** tab. Build a habit of glancing at it during development: it's the difference between learning costs and discovering them in the bill.
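If you work at the header level, or want a running tally during a test run, a small helper is enough. The sketch below parses `x-ms-request-charge` from a plain header mapping and sums charges per operation. `RuLedger` and the sample header values are invented for illustration; how you obtain the headers depends on your SDK and version.

```python
from collections import defaultdict
from typing import Mapping

CHARGE_HEADER = "x-ms-request-charge"


def request_charge(headers: Mapping[str, str]) -> float:
    """Extract the RU charge from response headers (0.0 if absent)."""
    for name, value in headers.items():
        if name.lower() == CHARGE_HEADER:  # header names are case-insensitive
            return float(value)
    return 0.0


class RuLedger:
    """Hypothetical helper that accumulates RU charges per operation name."""

    def __init__(self) -> None:
        self.totals: dict[str, float] = defaultdict(float)

    def record(self, operation: str, headers: Mapping[str, str]) -> float:
        charge = request_charge(headers)
        self.totals[operation] += charge
        return charge


ledger = RuLedger()
ledger.record("point_read", {"x-ms-request-charge": "1.0"})  # sample values
ledger.record("insert", {"X-MS-Request-Charge": "6.29"})
print(dict(ledger.totals))
```

Printing the ledger at the end of an integration test is a cheap way to make sure every PR knows what its RUs are.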

Q2. When should I use Provisioned vs Autoscale vs Serverless?

Provisioned: steady-state production workloads with predictable traffic. Cheapest per RU at high utilization.
Autoscale: workloads that spike (e.g. business hours, end of month). You set a max; Cosmos scales between max/10 and max as needed. The rate is 1.5× provisioned, but a 10× spike no longer ends in throttling.
Serverless: dev/test, spiky low-traffic apps, prototypes. No floor charge; you only pay for the RUs you actually consume, so a low-usage month costs close to nothing.

Q3. What's the difference between RU/s at the database level vs container level?

Database-level RU/s: a shared pool that all containers in the database draw from. Cheaper when you have many low-traffic containers, for example one shared 1,000 RU/s pool instead of ten containers each provisioned at the 400 RU/s minimum (4,000 RU/s in total).
Container-level RU/s: dedicated allocation per container. Better isolation, predictable per-tenant costs.
The standard pattern for SaaS is database-level for shared/system containers, container-level for per-customer or hot containers.

Q4. I'm getting HTTP 429 errors. What do they actually mean?

429 means a partition exceeded its RU/s allocation in the current second. Cosmos rejects the request and returns an `x-ms-retry-after-ms` header telling you how long to wait (typically 50–500 ms). The SDKs auto-retry with backoff up to a configured limit. Sustained 429s mean one of three things: (1) you're under-provisioned, (2) a hot partition is skewing the load, or (3) a query plan is scanning more docs than expected. Lesson V10 covers how to diagnose which.
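The SDKs handle this for you, but the retry logic is worth seeing once. This is a minimal sketch of the pattern, not the SDK's implementation: `Throttled` is a made-up exception standing in for a 429 response, and its `retry_after_ms` field stands in for the server's retry hint.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Throttled(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""

    def __init__(self, retry_after_ms: int) -> None:
        super().__init__(f"throttled, retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def with_retries(
    operation: Callable[[], T],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, waiting for the server's hint after each throttle."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Throttled as exc:
            if attempt == max_retries:
                raise  # give up and surface the throttle to the caller
            sleep(exc.retry_after_ms / 1000)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so tests can record the waits instead of pausing. If retries are routinely exhausted, the fix is in provisioning, partitioning, or the query, not in a bigger retry budget.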

Q5. How do indexed paths affect write cost?

Cosmos's default policy indexes every JSON path, which is convenient but expensive on writes: every indexed path generates an index entry, and each entry costs RUs. As an illustration, a document with 50 fields might cost 7 RUs to write, while the same doc with only 5 indexed paths might cost 4. For write-heavy containers (telemetry, event logs, audit trails), write a custom index policy that only indexes the paths you query. Lesson V07 walks through it.
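A custom policy of that kind usually excludes everything and then opts specific paths back in. The sketch below builds such a policy as a plain dictionary in the JSON shape Cosmos indexing policies use. The property names `/customerId` and `/createdAt` are hypothetical, and you should confirm the policy against the indexing documentation before applying it.

```python
import json


def selective_index_policy(queried_paths: list[str]) -> dict:
    """Build an indexing policy that indexes only the given property paths."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/?" indexes the scalar value at that exact path.
        "includedPaths": [{"path": f"{path}/?"} for path in queried_paths],
        # "/*" excludes every path that is not explicitly included.
        "excludedPaths": [{"path": "/*"}],
    }


policy = selective_index_policy(["/customerId", "/createdAt"])
print(json.dumps(policy, indent=2))
```

After applying a policy like this, compare the request charge of a typical write before and after. That measured difference, not the illustrative numbers above, is what you are actually saving.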

Q6. What's the smallest RU/s I can provision?

400 RU/s for a single-region, fixed-throughput container. That's about $24/month at the standard rate. Autoscale starts at a 100 RU/s floor (1,000 max). Serverless has no floor — pay per request, up to a hard cap of 5,000 RU/s per container.