Cosmos's metrics live in three places: the portal's Insights blade for a quick overview, Diagnostics Logs for per-request detail (you must turn this on), and the SDK's per-call Diagnostics for in-app instrumentation. Three signals matter most: **429 throttling rate**, **P99 latency**, and **normalized RU consumption**. Get those on a dashboard before anything else.
- ▸Turn on Diagnostic settings → send to Log Analytics. Without this, you can't query historical request data.
- ▸429s aren't failures in themselves: they are rate-limit responses that the SDK retries automatically. The signal is the *rate* of 429s relative to total requests, and whether those retries are exhausting the SDK's retry budget.
- ▸Normalized RU consumption is the peak % of your provisioned RU/s used by the busiest partition, reported per 1-min window. Above 70% sustained is a scale-up signal; brief spikes to 100% are usually fine.
- ▸The "Top Queries" view in Insights shows you which queries spend the most RUs. That's where indexing tuning starts.
- ▸Per-request Diagnostics from the SDK is the gold-standard troubleshooting tool — partition-by-partition cost, retry timeline, network breakdown.
You can run a Cosmos workload for months without monitoring. You’ll never know when it starts to break — only that customers are complaining. Set up monitoring before you ship, and the operational story shifts from forensic to proactive.
The three places metrics live
1. Portal → Insights (free, instant, shallow)
A built-in dashboard covering throughput, latency, request count, top queries, and top operations. Good for “is anything obviously wrong right now.” Not queryable, and no history beyond 30 days.
2. Diagnostics Logs (per-request, queryable, historical)
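You must turn this on: Settings → Diagnostic settings → Add diagnostic setting, then send DataPlaneRequests, QueryRuntimeStatistics, and PartitionKeyStatistics to Log Analytics. After that, you can query everything in KQL. Depending on the destination table mode you choose, rows land either in the shared AzureDiagnostics table or in resource-specific tables such as CDBDataPlaneRequests and CDBQueryRuntimeStatistics.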
3. SDK Diagnostics (per-call, in-app)
Every Cosmos response includes a Diagnostics object — RU breakdown per partition, retry timeline, sub-millisecond timings. Best for live troubleshooting and unit-level insight.
The three signals that matter
429 rate
A 429 means a request exceeded your provisioned RU/s and the SDK is retrying it. The metric to alarm on isn't the 429 count. It's the ratio of 429s to total requests, and whether the SDK's retry budget is being consumed.
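```kusto
// Per-minute 429 percentage over the last hour (AzureDiagnostics table mode)
AzureDiagnostics
| where TimeGenerated > ago(1h)
| where Category == "DataPlaneRequests"
| summarize total = count(), throttled = countif(statusCode_s == "429") by bin(TimeGenerated, 1m)
| extend pct = 100.0 * throttled / total
| order by TimeGenerated asc
```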
Above 5% sustained means you're under-provisioned. Below 1% is fine; the SDK absorbs it transparently.
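Here is the same per-minute ratio as a plain-Python sketch, which makes the arithmetic and the thresholds explicit. The `Request` record and its fields are hypothetical, and so is the `"watch"` label for the 1 to 5% band, which the text leaves unnamed. In practice the rows would come from the KQL query above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    minute: int       # hypothetical: minute bucket, like bin(TimeGenerated, 1m)
    status_code: int  # HTTP status code of the data-plane request


def throttle_pct_by_minute(requests):
    """Return {minute: percentage of requests in that minute that got a 429}."""
    totals = defaultdict(int)
    throttled = defaultdict(int)
    for r in requests:
        totals[r.minute] += 1
        if r.status_code == 429:
            throttled[r.minute] += 1
    return {m: 100.0 * throttled[m] / totals[m] for m in sorted(totals)}


def classify_window(pcts):
    """Apply the rule of thumb to a run of per-minute percentages."""
    if pcts and all(p > 5.0 for p in pcts):
        return "under-provisioned"  # sustained above 5%
    if all(p < 1.0 for p in pcts):
        return "fine"  # the SDK absorbs this transparently
    return "watch"  # hypothetical label for anything in between


reqs = [Request(0, 200)] * 90 + [Request(0, 429)] * 10 + [Request(1, 200)] * 100
print(throttle_pct_by_minute(reqs))  # {0: 10.0, 1: 0.0}
```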
P99 latency
P50 hides the tail, and the tail is where users feel the pain. For point reads, expect < 10 ms P99 in-region; for single-partition queries, < 50 ms. If P99 drifts away from P50, something specific is slow: a hot partition, a missing index, or a cross-partition fan-out.
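This small Python sketch shows how P50 can look healthy while P99 does not. It uses the nearest-rank percentile definition, which is one of several in common use; Azure Monitor's own aggregation may compute percentiles differently. The latency samples are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical point-read latencies in ms: 98 fast reads, 2 that hit a hot partition.
latencies_ms = [4.0] * 98 + [180.0] * 2

print(percentile(latencies_ms, 50))  # 4.0
print(percentile(latencies_ms, 99))  # 180.0
```

Two slow requests out of 100 are invisible at P50, yet they set P99. That gap is why the tail is the number to chart.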
Normalized RU consumption
This is the capacity signal: 0–100% of your provisioned RU/s, reported per minute. Importantly, it's the max across physical partitions, not an average. One hot partition at 100% pushes the metric to 100% even while every other partition sits at 5%.
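```kusto
// Peak normalized RU consumption per minute; assumes platform metrics are exported to this workspace
AzureMetrics
| where MetricName == "NormalizedRUConsumption"
| summarize peakPct = max(Maximum) by bin(TimeGenerated, 1m)
```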
Sustained > 70% → consider autoscale, more RU/s, or partition-key surgery.
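The max-versus-average point is easier to see with numbers. This Python sketch takes one minute of per-partition utilization and also checks the sustained-above-70% condition. The partition IDs are hypothetical, and so is the 10-minute window for "sustained", since the text does not fix a duration.

```python
def normalized_ru_consumption(partition_pct):
    """The reported value: utilization of the hottest partition."""
    return max(partition_pct.values())


def mean_utilization(partition_pct):
    """What the metric is NOT: the average across partitions."""
    values = list(partition_pct.values())
    return sum(values) / len(values)


def sustained_above(series, threshold=70.0, window=10):
    """True if each of the last `window` per-minute values exceeds `threshold`."""
    recent = series[-window:]
    return len(recent) == window and all(v > threshold for v in recent)


# One minute, four physical partitions (hypothetical IDs), one of them hot.
minute = {"pkr-0": 100.0, "pkr-1": 5.0, "pkr-2": 5.0, "pkr-3": 5.0}
print(normalized_ru_consumption(minute))  # 100.0
print(mean_utilization(minute))           # 28.75
```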
Reading SDK Diagnostics
In .NET, `response.Diagnostics.ToString()` returns a large JSON string. The fields that matter:
- TotalRequestCharge — RUs consumed.
- ContactedReplicas — list of partition replicas touched.
- RetryContext — every retry attempt, with reason and delay.
- ClientSideRequestStatistics → StoreResponseStatistics — per-replica latency and status code.
If a request is slow, paste the Diagnostics JSON into a JSON viewer and expand the timeline. You'll see exactly where the time went: DNS, TLS, queueing, backend processing, or retries.
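Once the timeline is expanded, finding the culprit comes down to summing time per phase. The sketch below assumes you have already flattened the Diagnostics JSON into `(phase, milliseconds)` pairs. The real JSON layout varies by SDK version, and the phase names here are hypothetical.

```python
from collections import Counter


def time_by_phase(timeline):
    """Sum durations per phase; return (phase, ms) pairs, largest first."""
    totals = Counter()
    for phase, ms in timeline:
        totals[phase] += ms
    return totals.most_common()


# Hypothetical flattened timeline for one slow request.
timeline = [
    ("dns", 1.5),
    ("tls", 3.0),
    ("backend", 2.0),
    ("retry-wait", 40.0),  # e.g. backing off after a 429
    ("backend", 2.5),
]
print(time_by_phase(timeline))
# [('retry-wait', 40.0), ('backend', 4.5), ('tls', 3.0), ('dns', 1.5)]
```

In this made-up case the request spent most of its time waiting to retry. That points at throttling rather than the network.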
Top queries view
Insights → “Top Queries by Average RU.” Sort by RU cost, highest first. The top 10 are typically:
- Queries missing the partition key (cross-partition fan-outs)
- ORDER BY without a range/composite index
- COUNT(*) over many docs
- SELECT * on large documents
Each is fixable with the techniques in lessons V02, V06, and V07. Monitoring is what tells you which one to fix first.
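The ranking behind a "Top Queries by Average RU" view can be sketched in Python: group executions by query text, average the request charge, and sort descending. The `(query_text, request_charge)` input shape is an assumption, and the sample queries and charges are invented.

```python
from collections import defaultdict


def top_queries_by_avg_ru(executions, n=10):
    """executions: iterable of (query_text, request_charge).
    Returns [(query_text, avg_ru, executions)] with the highest average first."""
    total_ru = defaultdict(float)
    count = defaultdict(int)
    for text, ru in executions:
        total_ru[text] += ru
        count[text] += 1
    ranked = [(q, total_ru[q] / count[q], count[q]) for q in total_ru]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked[:n]


executions = [
    ("SELECT * FROM c", 50.0),
    ("SELECT * FROM c", 30.0),
    ("SELECT c.id FROM c WHERE c.pk = @pk", 2.5),
    ("SELECT VALUE COUNT(1) FROM c", 120.0),
]
for row in top_queries_by_avg_ru(executions):
    print(row)
```

Switching the sort key to total RU would surface something different: cheap queries that run often enough to dominate spend.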
What to alert on
Three alerts, no more:
- 429 rate > 5% for 5 minutes (warning), > 15% (page).
- P99 latency > 200 ms for 10 minutes on read operations.
- Container availability < 99.9% (well below Cosmos's published 99.99% availability SLA for single-region accounts, so it fires only on real incidents).
Skip the rest. Alert fatigue is real, and Cosmos’s SLAs cover most of what you’d otherwise watch for.
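Here are the three alert rules as a Python sketch over per-minute series. The function names are hypothetical. One detail is also an assumption: the text gives no duration for the 15% page threshold, so this sketch reuses the 5-minute window.

```python
def breached_for(series, threshold, minutes):
    """True if each of the last `minutes` one-minute values exceeds `threshold`."""
    recent = series[-minutes:]
    return len(recent) == minutes and all(v > threshold for v in recent)


def throttle_alert(pct_429):
    """429 rate: > 15% pages, > 5% warns (both over 5 minutes in this sketch)."""
    if breached_for(pct_429, 15.0, 5):
        return "page"
    if breached_for(pct_429, 5.0, 5):
        return "warning"
    return None


def latency_alert(read_p99_ms):
    """Read P99 above 200 ms for 10 minutes."""
    return "alert" if breached_for(read_p99_ms, 200.0, 10) else None


def availability_alert(availability_pct):
    """Availability below 99.9%."""
    return availability_pct < 99.9


print(throttle_alert([6.0, 7.0, 8.0, 6.5, 9.0]))  # warning
print(throttle_alert([20.0] * 5))                 # page
```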
Q1. My P99 latency spiked but my P50 is fine. What's the most likely cause?
Cross-partition queries on a small subset of requests, or hot-partition contention. Pull the Diagnostics for the slow requests — if they show many partitions touched, it's a query problem (lesson V06). If they show one partition with high `BackendLatency`, it's a hot key (lesson V02).
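The triage in that answer can be written as a small Python decision function. Everything here is hypothetical. The `SlowRequest` fields stand in for values you would read out of the Diagnostics, and both thresholds are illustrative defaults, not Cosmos guidance.

```python
from dataclasses import dataclass


@dataclass
class SlowRequest:
    partitions_touched: int  # distinct physical partitions contacted
    max_backend_ms: float    # highest backend latency among those partitions
    total_ms: float          # end-to-end latency seen by the client


def triage(req, fanout_threshold=4, backend_share=0.5):
    """Map one slow request to the likely cause described in Q1."""
    if req.partitions_touched >= fanout_threshold:
        return "cross-partition query (lesson V06)"
    if req.partitions_touched == 1 and req.max_backend_ms >= backend_share * req.total_ms:
        return "hot partition key (lesson V02)"
    return "inconclusive: read the full timeline"


print(triage(SlowRequest(partitions_touched=12, max_backend_ms=20.0, total_ms=300.0)))
print(triage(SlowRequest(partitions_touched=1, max_backend_ms=250.0, total_ms=300.0)))
```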
Q2. 100% normalized RU consumption: should I be alarmed?
A spike, no: that just means you used what you provisioned. Sustained 100% with a growing 429 rate, yes: you need autoscale, more RU/s, or to find what's misbehaving. The metric to watch is "PhysicalPartitionThroughputInfo", because sometimes one partition is hot while the rest are idle.
Q3. How do I find the actual slow query?
In Log Analytics, query CDBQueryRuntimeStatistics for the time window. Join it to CDBDataPlaneRequests on `ActivityId` to pick up each query's `RequestCharge`, then sort by charge descending. The top 10 queries usually reveal the culprits: a forgotten cross-partition COUNT, a missing composite index, a SELECT * on a 200 KB document.
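Here is the join-and-rank step as a Python sketch. It assumes query text and request charge arrive as two row sets that share an activity ID, mirroring a join on `ActivityId` between the runtime-statistics and data-plane tables; treat the row shapes as assumptions. Charges are summed per activity ID in case one query produced several data-plane rows.

```python
from collections import defaultdict


def top_queries_by_charge(runtime_rows, request_rows, n=10):
    """runtime_rows: (activity_id, query_text); request_rows: (activity_id, request_charge).
    Returns [(query_text, total_charge)] with the most expensive execution first."""
    text_by_activity = dict(runtime_rows)
    charge_by_activity = defaultdict(float)
    for activity_id, charge in request_rows:
        charge_by_activity[activity_id] += charge
    joined = [
        (text, charge_by_activity[aid])
        for aid, text in text_by_activity.items()
        if aid in charge_by_activity  # inner join: drop rows with no match
    ]
    joined.sort(key=lambda row: row[1], reverse=True)
    return joined[:n]


runtime = [
    ("a1", "SELECT * FROM c"),
    ("a2", "SELECT VALUE COUNT(1) FROM c"),
    ("a3", "SELECT c.id FROM c WHERE c.pk = 'x'"),
]
requests = [("a1", 40.0), ("a2", 60.0), ("a2", 65.0), ("a3", 2.5), ("a9", 99.0)]
print(top_queries_by_charge(runtime, requests))
```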