The default `new CosmosClient(...)` works in a demo and bites you in production. Three things almost always need attention: make the client a singleton (it pools connections), pick **direct** mode over gateway for latency, and configure retry policies for throttling. Bulk mode is a separate, opt-in flag that can raise ingestion throughput by up to 10×.
- Singleton CosmosClient. New client = new TCP/SSL handshake fan-out per partition. Reuse one instance per process.
- Direct mode (TCP) beats Gateway mode (HTTPS) for ~30% lower P99 latency. Gateway is fine for short-lived workloads behind a proxy.
- Retry policy: keep the SDK defaults but cap `MaxRetryWaitTimeOnRateLimitedRequests` so retries don't pile up beyond your timeout budget.
- Bulk mode batches writes per partition transparently: up to 10× throughput for ingestion jobs, at the cost of higher tail latency on individual ops.
- Always pass the partition key explicitly. Point reads require it; on writes the SDK can extract it from the doc, but passing it explicitly skips that extra work.
The Cosmos SDK has good defaults, but “good enough for the README” and “good enough for production” are different bars. Three SDK choices — singleton, transport mode, retry policy — separate apps that scale cleanly from apps that mysteriously throttle every Friday at 3pm.
Make it a singleton
// Wrong: new client per request
public async Task<Order> Get(string id, string userId) {
    using var client = new CosmosClient(connStr); // ← bad
    ...
}
// Right: one per process
public class OrderRepo {
    private readonly CosmosClient _client;
    public OrderRepo(CosmosClient client) { _client = client; } // ← inject the singleton
    ...
}
A CosmosClient is a heavy object — connection pools, partition routing cache, address resolver, telemetry pipeline. Construct it once at app startup, share it everywhere. In ASP.NET Core, register it as a singleton in DI. In a long-running worker, hold a static reference. In a serverless function, instantiate at module load (outside the handler).
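For the ASP.NET Core case, the registration might look like the sketch below. It assumes a minimal-hosting `Program.cs` with implicit usings enabled and the `Microsoft.Azure.Cosmos` NuGet package; the configuration key `Cosmos:ConnectionString` is a made-up name, and `OrderRepo` is reduced to its constructor.

```csharp
// Program.cs (sketch, not a complete application)
using Microsoft.Azure.Cosmos;

var builder = WebApplication.CreateBuilder(args);

// One CosmosClient for the whole process. The container disposes it on shutdown.
// "Cosmos:ConnectionString" is a hypothetical configuration key.
builder.Services.AddSingleton(_ =>
    new CosmosClient(builder.Configuration["Cosmos:ConnectionString"]));

// The repository receives the shared client through its constructor.
builder.Services.AddSingleton<OrderRepo>();

var app = builder.Build();
app.Run();

public class OrderRepo
{
    private readonly CosmosClient _client;

    public OrderRepo(CosmosClient client) => _client = client;
}
```

Because the client is registered as a singleton, anything that depends on it can also be a singleton or scoped without creating extra clients.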
Direct vs gateway mode
Two ways to talk to Cosmos:
Gateway mode (HTTPS port 443) — the SDK sends every request to a gateway, which routes to the right partition. One TLS connection, simple to firewall. Adds one extra hop per request — ~5–10 ms.
Direct mode (TCP, dynamic ports) — the SDK fetches a routing table at startup and connects directly to each physical partition. ~30% lower P99 latency. Default in modern SDKs.
Pick gateway when — you’re in a constrained network (corporate proxy, locked-down K8s, restrictive firewall), or you’re at very low QPS (the per-connection overhead doesn’t amortize).
Pick direct everywhere else.
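The choice is a single property on `CosmosClientOptions`. A minimal sketch, assuming the .NET v3 SDK (`Microsoft.Azure.Cosmos`); the `restrictedNetwork` flag and the factory class are illustrative, not part of the SDK.

```csharp
using Microsoft.Azure.Cosmos;

public static class CosmosClientFactory
{
    // restrictedNetwork is a hypothetical flag you would set per environment,
    // for example when outbound TCP beyond port 443 is blocked.
    public static CosmosClient Create(string connStr, bool restrictedNetwork)
    {
        var options = new CosmosClientOptions
        {
            ConnectionMode = restrictedNetwork
                ? ConnectionMode.Gateway   // HTTPS on 443 through the gateway
                : ConnectionMode.Direct    // TCP straight to the partitions
        };
        return new CosmosClient(connStr, options);
    }
}
```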
The retry policy
Cosmos returns 429 Too Many Requests when you exceed provisioned RU/s. The SDK retries automatically, waiting before each attempt for the interval the service suggests in its retry-after response header. Two settings matter:
new CosmosClientOptions {
    MaxRetryAttemptsOnRateLimitedRequests = 9, // default
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30), // default
}
The danger — if your endpoint timeout is 5 seconds and the SDK is cheerfully retrying for 30, you get cascading timeouts that look like an outage but are really just over-provisioned retries.
Cap MaxRetryWaitTimeOnRateLimitedRequests to about 80% of your endpoint timeout. Past that, fail fast and let the upstream retry policy decide.
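One way to keep the two numbers in sync is to derive the retry cap from the endpoint timeout instead of hard-coding both. The helper below is a sketch: the class and method names are invented for illustration, and the 80% factor is the rule of thumb from the text, not an SDK recommendation.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class CosmosRetryBudget
{
    // Builds client options whose throttling-retry window is 80% of the
    // caller's endpoint timeout, so the SDK gives up before the caller does.
    public static CosmosClientOptions ForEndpointTimeout(TimeSpan endpointTimeout)
    {
        var retryWindow = TimeSpan.FromTicks((long)(endpointTimeout.Ticks * 0.8));

        return new CosmosClientOptions
        {
            MaxRetryAttemptsOnRateLimitedRequests = 9, // SDK default, kept as-is
            MaxRetryWaitTimeOnRateLimitedRequests = retryWindow
        };
    }
}
```

With a 5-second endpoint timeout, `CosmosRetryBudget.ForEndpointTimeout(TimeSpan.FromSeconds(5))` yields a 4-second retry window. The 429 then surfaces to your code with a second to spare.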
Bulk mode for ingestion
Default mode optimizes for low latency on individual operations. Bulk mode optimizes for throughput on many concurrent operations.
var client = new CosmosClient(connStr, new CosmosClientOptions { AllowBulkExecution = true });
Under the hood — the SDK groups ops by partition into ~10 MB batches and ships them in one round trip. Up to 10× the throughput of unbatched ops, especially across many partition keys.
Trade-offs — individual op latency increases (the SDK waits up to ~100ms to fill a batch) and you can’t see per-op error detail until the batch resolves. Use bulk for ETL, migrations, ingestion. Don’t use it for an interactive write path.
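In practice, bulk mode means starting many operations without awaiting each one, then awaiting them together. The sketch below assumes the `Container` comes from a client created with `AllowBulkExecution = true`, that the container is partitioned on `/userId`, and that `Order` has the hypothetical shape shown. Each task still completes or fails individually, so per-op errors are handled in the wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical document type for illustration.
public class Order
{
    public string id { get; set; }
    public string userId { get; set; }
    public decimal total { get; set; }
}

public static class BulkIngest
{
    // Upserts all orders concurrently and returns the number of failures.
    public static async Task<int> UpsertAllAsync(Container container, IReadOnlyCollection<Order> orders)
    {
        // Start every operation up front; the SDK groups pending ops by partition.
        List<Task<bool>> tasks = orders.Select(o => TryUpsertAsync(container, o)).ToList();
        bool[] results = await Task.WhenAll(tasks);
        return results.Count(ok => !ok);
    }

    private static async Task<bool> TryUpsertAsync(Container container, Order order)
    {
        try
        {
            await container.UpsertItemAsync(order, new PartitionKey(order.userId));
            return true;
        }
        catch (CosmosException ex)
        {
            // Per-operation failure detail becomes available here, once the batch resolves.
            Console.Error.WriteLine($"Upsert of {order.id} failed with status {(int)ex.StatusCode}");
            return false;
        }
    }
}
```

For very large inputs, feeding the orders through in chunks keeps the number of in-flight tasks, and the memory they hold, bounded.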
Explicit partition keys
Compare:
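// Wrong: PartitionKey.None does not mean "look the key up for me"
await container.ReadItemAsync<Order>(id, PartitionKey.None); // targets items stored without a partition key value
// Explicit: single-partition routed read
await container.ReadItemAsync<Order>(id, new PartitionKey(userId));
The explicit form is required for a real point read (1 RU for a small item). PartitionKey.None does not make the SDK work the key out for you: it addresses items that were stored without a partition key value, so for a normally partitioned item the read comes back 404 Not Found. If you only have the id, the fallback is a cross-partition query, which fans out to every partition and costs more. On writes the SDK can extract the key from the document body, but pass it explicitly anyway: it’s free and makes failures easier to debug.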
Connection-string hygiene
The primary key in the connection string is full admin. For production:
- Use AAD RBAC (lesson V13). The SDK supports DefaultAzureCredential.
- If you must use keys, use the read-only secondary key for read services.
- Store keys in Key Vault, not appsettings.json. Rotate quarterly.
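A keyless client is a small change at construction time. The sketch below assumes the `Azure.Identity` and `Microsoft.Azure.Cosmos` packages, and that the identity the app runs as has already been granted a Cosmos DB data-plane role. The account endpoint is a placeholder.

```csharp
using Azure.Identity;
using Microsoft.Azure.Cosmos;

public static class KeylessCosmos
{
    // accountEndpoint looks like "https://<your-account>.documents.azure.com:443/".
    public static CosmosClient Create(string accountEndpoint)
    {
        // DefaultAzureCredential tries managed identity, environment variables,
        // developer tool logins, and so on, in order. No key is stored in config.
        return new CosmosClient(accountEndpoint, new DefaultAzureCredential());
    }
}
```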
Diagnostics and logging
Every operation returns a Diagnostics field. In production:
- Log Diagnostics for any op > 500 ms or any 429.
- Don’t log it on every call — it’s verbose.
- Pipe it to App Insights (lesson V10) so you can correlate with traces.
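The first two rules can be wrapped around any call. The sketch below uses `ILogger` from `Microsoft.Extensions.Logging`. The helper name and the 500 ms threshold constant are illustrative. Note that a 429 only reaches your code after the SDK has exhausted its own retries.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.Logging;

public static class CosmosDiagnosticsLogging
{
    private static readonly TimeSpan SlowThreshold = TimeSpan.FromMilliseconds(500);

    // Point read that logs diagnostics only for slow or throttled operations.
    public static async Task<T> ReadWithDiagnosticsAsync<T>(
        Container container, string id, PartitionKey partitionKey, ILogger logger)
    {
        try
        {
            ItemResponse<T> response = await container.ReadItemAsync<T>(id, partitionKey);

            if (response.Diagnostics.GetClientElapsedTime() > SlowThreshold)
            {
                logger.LogWarning("Slow Cosmos read {Id}: {Diagnostics}",
                    id, response.Diagnostics.ToString());
            }
            return response.Resource;
        }
        catch (CosmosException ex) when ((int)ex.StatusCode == 429)
        {
            logger.LogWarning("Throttled Cosmos read {Id}: {Diagnostics}",
                id, ex.Diagnostics.ToString());
            throw;
        }
    }
}
```

The diagnostics string is only built on the slow or throttled paths, so fast calls pay nothing for it.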
Q1. Why singleton — what's the cost of multiple CosmosClients?
Each CosmosClient establishes its own connection pool to every physical partition (in direct mode) or to the gateway (in gateway mode). For a container with 10 physical partitions, that's 10+ TCP/TLS handshakes per client at startup. Multiplied across requests in a serverless function, you get ~200ms of cold-start overhead and connection-reset 429s.
Q2. Direct vs gateway — when does gateway win?
When you can't open arbitrary outbound TCP — corporate proxies, some Kubernetes setups, the Azure Functions consumption plan in some configurations. Also for very low QPS workloads where the connection pool overhead of direct mode isn't amortized. Otherwise, direct.
Q3. How do I size the connection pool?
The .NET / Java SDKs auto-size based on observed load. Override only if you see "ConnectionPoolExhausted" errors — typically at very high concurrency on a single client. Increase `MaxRequestsPerTcpConnection` first, then `MaxTcpConnectionsPerEndpoint`.
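If you do need to override, both knobs live on `CosmosClientOptions` in the .NET SDK and apply only in direct mode. The values below are placeholders to show the shape, not tuning recommendations. Measure before and after changing them.

```csharp
using Microsoft.Azure.Cosmos;

public static class TunedCosmosOptions
{
    public static CosmosClientOptions Create()
    {
        return new CosmosClientOptions
        {
            ConnectionMode = ConnectionMode.Direct,
            // Illustrative values only: raise the per-connection request limit first,
            // and touch the per-endpoint connection cap only if that is not enough.
            MaxRequestsPerTcpConnection = 60,
            MaxTcpConnectionsPerEndpoint = 32
        };
    }
}
```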