Cosmos has no `BEGIN TRANSACTION`. What it has instead is **transactional batch**: up to 100 operations applied atomically, all within the same partition key. For multi-document workflows that span partitions, you fall back to compensating transactions (sagas) or an outbox, because stored procedures are single-partition too. The good news: a point read of a 1 KB document costs 1 RU and typically returns in under 10 ms.
- A point read (id + partition key) is 1 RU and < 10 ms: the cheapest, fastest op in Cosmos.
- Use upsert when you don't care if the doc exists; use create + replace when you do.
- Transactional batch: up to 100 ops, must share a partition key, all-or-nothing, < 2 MB total.
- Stored procedures run server-side in JS, also single-partition, useful for read-then-write atomicity.
- Cross-partition transactions don't exist. If you need them, your model is wrong (revisit lesson V03).
After modeling and partitioning, the day-to-day stuff — read a doc, write a doc, update a doc. Cosmos’s CRUD surface looks similar to a SQL database at first, but the rules underneath are different. The big surprise — there are no cross-partition transactions, and that constraint shapes everything else.
The basic operations
| Op | RU cost | Notes |
|---|---|---|
| Point read | 1 RU | id + partition key, fastest path |
| Query | 2.5+ RUs | Indexed, can fan out across partitions |
| Create | ~5–7 RUs (1 KB doc) | Fails if id already exists |
| Replace | ~5–7 RUs | Fails if doc doesn’t exist |
| Upsert | ~5–7 RUs | Create-or-replace, idempotent |
| Patch | ~10 RUs | Partial update — change one field, don’t rewrite the doc |
| Delete | ~5 RUs | id + partition key; fails if doc doesn’t exist |
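Patch is underrated. Take a 1 MB doc with a small views counter: incrementing via replace means reading the doc, changing it, and sending all 1 MB back; via Patch you send only the counter change. The write costs about the same RUs, but you skip the read and there is no read-modify-write race.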
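A minimal sketch of that counter increment with the Python SDK (`azure-cosmos`). It assumes `container` is a `ContainerProxy` and that the document has a numeric `views` field; the function name and the field are illustrative, not taken from the lesson.

```python
def increment_views(container, doc_id, partition_key, by=1):
    """Increment /views server-side without reading the document first.

    `container` is assumed to be an azure.cosmos ContainerProxy
    (or any object exposing the same `patch_item` method).
    """
    operations = [
        # "incr" adds `value` to the number stored at `path`.
        {"op": "incr", "path": "/views", "value": by},
    ]
    return container.patch_item(
        item=doc_id,
        partition_key=partition_key,
        patch_operations=operations,
    )
```

Only the small operations list crosses the network, regardless of how large the document is.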
Transactional batch
Cosmos’s “transaction” primitive. Up to 100 operations, all within the same logical partition (same partition-key value), all-or-nothing.
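    # Each operation is (operation_name, positional_args, keyword_args).
    container.execute_item_batch(
        batch_operations=[
            ("create", (order,), {}),
            ("patch", (cart_id, [{"op": "set", "path": "/state", "value": "submitted"}]), {}),
            ("create", (audit_event,), {}),
        ],
        # One partition key for the whole batch; it is not passed per operation.
        partition_key=user_id,
    )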
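If any op fails (validation error, etag conflict, throttle), all are rolled back. In the Python SDK, a failed batch raises `CosmosBatchOperationError`, which identifies the operation that failed; a successful batch returns a list with one result per op.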
Constraints:
- Same partition key for every op
- Total payload < 2 MB
- 100 ops max
- No queries inside the batch — just CRUD ops
This is the primitive for “place order, decrement inventory, log audit” within a single user/tenant scope.
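The constraints above are cheap to check on the client before sending a batch. The sketch below is a hypothetical pre-flight helper, not part of the SDK. It assumes each operation is a `(name, args, kwargs)` tuple, the shape the Python SDK's `execute_item_batch` accepts, and that documents carry their partition key in a `userId` field. The size figure is a rough JSON estimate, not the exact wire size.

```python
import json

MAX_OPS = 100
MAX_PAYLOAD_BYTES = 2 * 1024 * 1024  # 2 MB


def check_batch(operations, partition_key, pk_field="userId"):
    """Return a list of problems; an empty list means the batch looks OK.

    `pk_field` is a hypothetical partition-key property name.
    """
    problems = []
    if len(operations) > MAX_OPS:
        problems.append(f"{len(operations)} ops exceeds {MAX_OPS}")
    size = 0
    for index, (op_name, args, _kwargs) in enumerate(operations):
        # Rough estimate: JSON-encode the positional arguments.
        size += len(json.dumps(args).encode("utf-8"))
        for arg in args:
            # Document bodies are dicts; ids and patch lists are not.
            if isinstance(arg, dict) and arg.get(pk_field) != partition_key:
                problems.append(f"op {index} ({op_name}) targets another partition")
    if size >= MAX_PAYLOAD_BYTES:
        problems.append(f"payload of about {size} bytes is not under 2 MB")
    return problems
```

Calling `check_batch(ops, user_id)` before `execute_item_batch` turns a server-side rejection into a local error message.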
Stored procedures (use sparingly)
Server-side JavaScript. Runs inside the partition replica, sees up-to-the-microsecond data, no network hops. Good for read-then-write atomicity that batch can’t express (e.g. “find the doc with the lowest score, increment it”).
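    function incrementScore(id) {
        const collection = getContext().getCollection();
        // Parameterized query: never splice `id` into the SQL text.
        const query = {
            query: "SELECT * FROM c WHERE c.id = @id",
            parameters: [{ name: "@id", value: id }]
        };
        const accepted = collection.queryDocuments(collection.getSelfLink(), query,
            (err, docs) => {
                if (err) throw err;
                if (!docs.length) throw new Error('not found');
                docs[0].score++;
                // A throw here aborts the sproc and rolls back its writes.
                collection.replaceDocument(docs[0]._self, docs[0], (replaceErr) => {
                    if (replaceErr) throw replaceErr;
                });
            });
        if (!accepted) throw new Error('query was not accepted');
    }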
The catches — JavaScript-only, hard to debug, single-partition only, and the runtime kills your sproc at 5 seconds. Most teams find batch + Patch cover everything stored procs would.
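Calling the stored procedure from Python might look like the sketch below. It assumes `container` is an `azure-cosmos` `ContainerProxy` and that the procedure above has been registered under the name `incrementScore`; the wrapper function name is illustrative.

```python
def increment_score(container, doc_id, partition_key):
    """Run the `incrementScore` stored procedure inside one logical partition."""
    return container.scripts.execute_stored_procedure(
        sproc="incrementScore",
        # A sproc executes against a single partition key value.
        partition_key=partition_key,
        # Positional parameters passed to the JavaScript function.
        params=[doc_id],
    )
```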
Cross-partition writes — the hard case
Sometimes you genuinely need to write across partitions atomically. Two patterns:
Saga — break the workflow into independently idempotent steps with compensating actions. Service A creates the order; if Service B’s payment fails, Service A’s compensator cancels the order. Frameworks — Temporal, Dapr, your own state machine.
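A minimal in-process sketch of the saga idea, assuming each step is a pair of zero-argument callables (an action and its compensator). The `run_saga` name is hypothetical, and a real orchestrator such as Temporal would also persist progress and retry steps, which this sketch does not.

```python
def run_saga(steps):
    """Run (action, compensate) pairs in order; undo completed steps on failure.

    Actions and compensators should be idempotent, because a real
    orchestrator may retry either one.
    """
    done = []  # compensators for the steps that have completed
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo in reverse order, newest completed step first.
            for undo in reversed(done):
                undo()
            raise
        done.append(compensate)
```

If the payment step raises, the compensator for the already-created order runs and the original exception is re-raised to the caller.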
Outbox — write the cross-partition intent as an event in the same partition as the primary write. A separate consumer (Change Feed, lesson V11) reads the outbox and applies the secondary write with retries. Eventual, but reliably eventual.
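The outbox write can be sketched as a two-operation batch. This is an illustration under assumptions: the `userId` partition-key field, the `type`, `targetPartition`, and `processed` properties, and both function names are hypothetical, and the operations use the `(name, args, kwargs)` tuple shape that the Python SDK's `execute_item_batch` accepts.

```python
import uuid


def order_with_outbox(order, target_partition, payload):
    """Build batch operations that write an order and its outbox event together.

    Both documents share the same `userId` value, so a transactional
    batch can commit them atomically.
    """
    event = {
        # Deterministic id: rebuilding the event for the same order gives the same id.
        "id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"outbox:{order['id']}")),
        "userId": order["userId"],
        "type": "outbox",
        "targetPartition": target_partition,
        "payload": payload,
        "processed": False,
    }
    return [("create", (order,), {}), ("create", (event,), {})]


def apply_outbox_event(event, apply_secondary_write):
    """Consumer side: perform the secondary write, which must be idempotent."""
    apply_secondary_write(event["targetPartition"], event["id"], event["payload"])
```

The consumer passes the event id along so the secondary write can use it as its own document id and stay safe under retries.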
If you’re doing this often, your model is probably wrong — re-read lesson V03 and look for a partition key that keeps related data together.
Idempotency, the unsung hero
The SDK retries on its own: network blips, throttling, transient failures. Throttled requests alone are retried up to 9 times by default. If a write reached the server but its response was lost, the retry sends the same write again, so a write that isn’t idempotent can produce duplicates.
Two cheap habits — always use upsert instead of create when business logic permits, and always pass an explicit id (don’t let the SDK generate one). Then a retry of the same operation produces the same result.
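Both habits fit in a few lines. The sketch below assumes `container` is an `azure-cosmos` `ContainerProxy`; the payment document, the `userId` partition-key field, and the function names are hypothetical.

```python
import hashlib


def deterministic_id(*business_keys):
    """Derive a stable document id from the fields that identify the operation."""
    joined = "|".join(str(key) for key in business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:32]


def record_payment(container, user_id, order_id, amount):
    """Upsert a payment document; a retry overwrites it instead of duplicating it."""
    doc = {
        "id": deterministic_id("payment", order_id),
        "userId": user_id,  # hypothetical partition-key field
        "orderId": order_id,
        "amount": amount,
    }
    return container.upsert_item(body=doc)
```

Because the id depends only on the order, running `record_payment` twice for the same order leaves exactly one document.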
Q1. What's the difference between upsert and replace?
Replace requires the document to exist — fails with 404 otherwise. Upsert inserts if missing, replaces if present. Use replace when "this should already exist" is part of your business logic; use upsert when you're idempotent by design (event handlers, syncs).
Q2. How do I implement an atomic counter?
Three options: (1) a stored procedure that does read-modify-write server-side, (2) a Patch `incr` op, on its own or inside a transactional batch, (3) for high-write counters, multiple bucket documents that you sum at read time (avoids the single-partition write hot-spot).
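The bucket approach for high-write counters can be sketched as follows. It assumes `container` is an `azure-cosmos` `ContainerProxy` partitioned on `/id`, so each bucket document lands in its own logical partition; the function names, the `count` field, and the default of 8 buckets are illustrative.

```python
import random

INCREMENT = [{"op": "incr", "path": "/count", "value": 1}]


def bucket_ids(name, buckets):
    """Ids of the bucket documents that together hold one counter."""
    return [f"{name}-{i}" for i in range(buckets)]


def create_buckets(container, name, buckets=8):
    """Create the bucket documents once, each starting at zero."""
    for doc_id in bucket_ids(name, buckets):
        container.create_item(body={"id": doc_id, "count": 0})


def increment(container, name, buckets=8):
    """Add 1 to a random bucket so writes spread across partitions."""
    doc_id = random.choice(bucket_ids(name, buckets))
    container.patch_item(item=doc_id, partition_key=doc_id, patch_operations=INCREMENT)


def total(container, name, buckets=8):
    """Sum every bucket with point reads."""
    return sum(
        container.read_item(item=doc_id, partition_key=doc_id)["count"]
        for doc_id in bucket_ids(name, buckets)
    )
```

Reads cost one point read per bucket, so the bucket count trades write throughput against read cost.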
Q3. Can I get a SQL-style transaction across two partition keys?
No. The recommended pattern is the **saga** — break the workflow into steps, each idempotent and reversible. If step 3 fails, run the compensating actions for steps 1 and 2. More work than `BEGIN/COMMIT`, but it's the price of horizontal scale.