A modern payment is a tightly orchestrated dance across a Gateway, an idempotency key in Redis, a PSP like Stripe, a state machine driven by webhooks, a double-entry Ledger, a database write, a Kafka event, and a nightly reconciliation job — all sub-second, all exactly once.
Use this architecture any time you're moving money or any "must-not-double-execute" operation — refunds, transfers, subscription billing, in-app purchases, even point-redemption systems. The same primitives (idempotency keys, webhooks, ledgers, reconciliation) apply whenever **at-most-once** semantics matter more than throughput.
The full story
You tap Pay on your phone. Within a second your screen shows a green tick. But behind that single tap, eight independent systems just coordinated to move money exactly once, audit it permanently, and keep your phone, the merchant, the bank, and a dozen downstream services in sync.
Here’s what actually happens.
The full architecture at a glance
flowchart TD
U[Buyer's app] -->|POST /payments<br/>+ idempotency key| GW[API Gateway<br/><i>auth · rate-limit</i>]
GW --> PS[Payment Service]
PS <-->|"check / SET<br/>idempotency key"| Redis[(Redis<br/>idempotency cache<br/>24h TTL)]
PS -->|"charge"| PSP[PSP<br/>Stripe / Adyen / Razorpay]
PSP -.async.-> WH[Webhook receiver]
WH --> PS
PS --> DB[(Database<br/>payments table +<br/>ledger entries +<br/>outbox)]
DB -.poll.-> OBP[Outbox publisher]
OBP -->|payment.completed| K{{Kafka}}
K --> N[Notifications]
K --> F[Fraud]
K --> A[Accounting]
K --> An[Analytics]
K --> L[Loyalty]
K --> M[Merchant dash]
Sweep[Stuck-state sweeper] -.poll PSP.-> PS
Recon[Nightly reconciliation] -.compare.-> DB
Recon -.compare.-> PSP
style U fill:#1c2333,stroke:#475569,color:#e7eaf1
style GW fill:#1e3a8a,stroke:#3b82f6,color:#fff
style PS fill:#0e7490,stroke:#06b6d4,color:#fff
style Redis fill:#7e1d1d,stroke:#ef4444,color:#fff
style PSP fill:#581c87,stroke:#a855f7,color:#fff
style WH fill:#581c87,stroke:#a855f7,color:#fff
style DB fill:#0f1320,stroke:#475569,color:#cdd3df
style OBP fill:#9a3412,stroke:#f97316,color:#fff
style K fill:#9a3412,stroke:#f97316,color:#fff
style N fill:#365314,stroke:#84cc16,color:#fff
style F fill:#365314,stroke:#84cc16,color:#fff
style A fill:#365314,stroke:#84cc16,color:#fff
style An fill:#365314,stroke:#84cc16,color:#fff
style L fill:#365314,stroke:#84cc16,color:#fff
style M fill:#365314,stroke:#84cc16,color:#fff
style Sweep fill:#1e3a8a,stroke:#3b82f6,color:#fff
style Recon fill:#1e3a8a,stroke:#3b82f6,color:#fff
1. The request crosses the front door
The buyer’s app sends POST /payments to your API Gateway. Three things in that request matter more than the payment amount:
- A bearer token identifying the buyer.
- An idempotency key — a UUID generated on the client device.
- The PSP-tokenized card (a reference, never the raw PAN).
The gateway authenticates, rate-limits the buyer, and forwards to the Payment Service. So far this looks like any HTTP request. The first interesting thing is the idempotency key.
2. The idempotency check — Redis as the bouncer
Before doing anything expensive, the Payment Service asks Redis a single question: has it seen this idempotency key before?
Three outcomes:
| Redis says | Meaning | Action |
|---|---|---|
| nil | Fresh request | SET key "PROCESSING" with a 24h TTL, proceed |
| "PROCESSING" | Same request currently in flight | Wait briefly, then return 409 or poll |
| Result blob | Already completed | Return the cached response immediately |
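The three outcomes in the table can be sketched as follows. This is a minimal illustration, not the article's implementation: `FakeRedis` is an in-memory stand-in so the example runs without a server, the `idem:` key prefix and the `handle_payment` function are hypothetical names, and the claim step uses the semantics of Redis `SET key value NX EX seconds` so that two concurrent requests cannot both see nil.

```python
import json
import time
from typing import Callable, Optional

TTL_SECONDS = 24 * 60 * 60  # the 24h TTL from the table
PROCESSING = "PROCESSING"


class FakeRedis:
    """In-memory stand-in for the two Redis commands this sketch needs."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.monotonic():
            return None  # missing or expired
        return entry[0]

    def set(self, key: str, value: str, ex: int, nx: bool = False) -> bool:
        # nx=True mirrors "SET key value NX EX seconds": write only if absent.
        if nx and self.get(key) is not None:
            return False
        self._data[key] = (value, time.monotonic() + ex)
        return True


def handle_payment(
    store: FakeRedis, idem_key: str, charge: Callable[[], dict]
) -> tuple[int, dict]:
    key = f"idem:{idem_key}"  # hypothetical key naming
    # Fresh request: atomically claim the key, then do the expensive work.
    if store.set(key, PROCESSING, ex=TTL_SECONDS, nx=True):
        result = charge()
        store.set(key, json.dumps(result), ex=TTL_SECONDS)
        return 200, result
    cached = store.get(key)
    # Same request still in flight (or the claim expired in between).
    if cached is None or cached == PROCESSING:
        return 409, {"error": "request in flight"}
    # Already completed: replay the stored response without charging again.
    return 200, json.loads(cached)
```

Calling `handle_payment` twice with the same key invokes `charge` only once; the second call returns the cached result. The sketch does not handle a `charge` that raises, which would leave the key in PROCESSING until the TTL expires.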
3. The actual money movement — calling the PSP
Now the request goes to the Payment Service Provider — Stripe, Razorpay, Adyen, Braintree. This is the only step that touches real money. Everything else is bookkeeping.
The PSP call is the slowest and most failure-prone step in the whole flow:
- It spans the public internet
- It might involve 3D Secure (an extra browser redirect)
- It can be queued behind issuer bank APIs
- It can return after 60+ seconds
- It can succeed at the bank but fail to deliver the response to you
4. The state machine and the webhook
Every payment moves through a strict state machine:
stateDiagram-v2
[*] --> INITIATED
INITIATED --> PROCESSING: PSP accepts
PROCESSING --> AUTHORIZED: 3DS / issuer OK
AUTHORIZED --> CAPTURED: capture call (sync or auto)
CAPTURED --> SETTLED: end-of-day settlement
INITIATED --> FAILED: PSP rejects
PROCESSING --> FAILED: timeout / decline
AUTHORIZED --> VOIDED: cancel before capture
CAPTURED --> REFUNDED: refund call
FAILED --> [*]
VOIDED --> [*]
REFUNDED --> [*]
SETTLED --> [*]
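The diagram above can be encoded as a plain transition table. The sketch below is one possible encoding, assuming states are stored as strings; the `ALLOWED` map, `advance` function, and `IllegalTransition` exception are hypothetical names. A repeat of the current state is treated as a no-op, and anything the diagram does not allow is rejected.

```python
# Allowed next states, copied edge by edge from the state diagram.
ALLOWED: dict[str, set[str]] = {
    "INITIATED": {"PROCESSING", "FAILED"},
    "PROCESSING": {"AUTHORIZED", "FAILED"},
    "AUTHORIZED": {"CAPTURED", "VOIDED"},
    "CAPTURED": {"SETTLED", "REFUNDED"},
    # Terminal states have no outgoing edges.
    "FAILED": set(),
    "VOIDED": set(),
    "REFUNDED": set(),
    "SETTLED": set(),
}


class IllegalTransition(Exception):
    """Raised when a requested move is not an edge in the diagram."""


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target == current:
        return current  # duplicate delivery of the same outcome: no-op
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current} -> {target}")
    return target
```

What to do with a rejected transition (log it, ignore it, or alert) is a policy choice this sketch leaves to the caller.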
The PSP doesn’t always tell you the final state in the synchronous response. Many cards (especially with 3DS) only resolve asynchronously via a webhook — POST /webhooks/psp with the final outcome.
When the webhook arrives, the Payment Service:
- Verifies the signature (HMAC of the body using the PSP’s secret).
- Idempotently advances the state, so replaying a webhook never moves the state machine a second time.
- Writes the new state to the DB and emits an event.
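The signature check in the first step might look like the sketch below. It assumes the PSP sends a hex-encoded HMAC-SHA256 of the raw request body; real providers each define their own header format (some also sign a timestamp), so treat `sign` and `verify_webhook` as illustrative names and follow your PSP's documentation for the exact scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, see lead-in)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True only if the header matches the HMAC of the exact bytes received."""
    expected = sign(secret, body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a forged signature were correct.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

The check has to run on the raw bytes of the request, before any JSON parsing and re-serialization, because even a whitespace change alters the HMAC.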
5. The double-entry ledger
While the synchronous flow is happening, the Ledger is doing something accountants would recognize from 700 years ago. Every successful payment produces two equal-and-opposite postings:
| Account | Debit | Credit |
|---|---|---|
| Buyer wallet | $20.00 | |
| Merchant wallet | | $20.00 |
This isn’t redundant with the DB write: it’s the source of truth for money. A balance row in your DB can drift, get corrupted, or be silently wrong after a bug, whereas a balance derived from append-only ledger entries can always be recomputed and audited.
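A small sketch of the idea, assuming amounts are held as `Decimal` and following the table's convention that the buyer's wallet is debited and the merchant's wallet is credited. The `Entry` type and the helper names are hypothetical. Balances are never stored; they are derived by summing entries.

```python
from dataclasses import dataclass
from decimal import Decimal

ZERO = Decimal("0")


@dataclass(frozen=True)
class Entry:
    """One immutable posting; exactly one of debit/credit is non-zero."""
    payment_id: str
    account: str
    debit: Decimal = ZERO
    credit: Decimal = ZERO


def post_payment(payment_id: str, buyer: str, merchant: str, amount: Decimal) -> list[Entry]:
    # Two equal-and-opposite postings for one payment.
    return [
        Entry(payment_id, buyer, debit=amount),
        Entry(payment_id, merchant, credit=amount),
    ]


def is_balanced(entries: list[Entry]) -> bool:
    # The ledger invariant: total debits always equal total credits.
    return sum((e.debit for e in entries), ZERO) == sum((e.credit for e in entries), ZERO)


def balance(entries: list[Entry], account: str) -> Decimal:
    # Wallet balance under this sketch's convention: credits minus debits.
    return sum((e.credit - e.debit for e in entries if e.account == account), ZERO)
```

If `is_balanced` ever returns False for the full set of entries, money was created or destroyed by a bug, which is exactly the kind of error a single mutable balance column cannot reveal.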
6. The DB write and the Kafka event
The Payment Service now does a transactional write:
BEGIN;
INSERT INTO payments (id, status, amount, ...);
INSERT INTO ledger_entries (..., debit, ...);
INSERT INTO ledger_entries (..., credit, ...);
INSERT INTO outbox (event_type, payload);
COMMIT;
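That last outbox insert is the transactional outbox pattern. Because the event row commits in the same transaction as the payment and ledger rows, an event exists if and only if the payment was recorded. A separate poller reads outbox rows and publishes them to Kafka.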
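The pattern can be tried end to end with SQLite standing in for the production database. This is a sketch under assumptions: the column names, the `published` flag, and the `publish` callback (which would wrap a Kafka producer) are all hypothetical, and the ledger inserts are left out for brevity.

```python
import json
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT, amount_cents INTEGER);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT, payload TEXT, published INTEGER DEFAULT 0);
    """)
    return conn


def record_payment(conn: sqlite3.Connection, payment_id: str, amount_cents: int) -> None:
    # "with conn" commits if the block succeeds and rolls back on any exception,
    # so the payment row and its event row are written together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO payments VALUES (?, 'CAPTURED', ?)", (payment_id, amount_cents)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("payment.completed", json.dumps({"payment_id": payment_id})),
        )


def publish_pending(conn: sqlite3.Connection, publish: Callable[[str, str], None]) -> int:
    """One poller pass: publish unpublished rows in order, then mark them."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, payload)
        # A crash between publish and this update causes a re-publish on the
        # next pass, so delivery is at-least-once and consumers must tolerate duplicates.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

The design choice is that the service never talks to Kafka inside the request path: it only writes rows, and the poller turns rows into events afterwards.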
7. The fan-out via Kafka
Once payment.completed lands in Kafka, every downstream service reacts in parallel:
- Notifications — SMS / email / push to buyer and merchant
- Fraud — async review of the transaction in context with others
- Accounting — adds to the daily settlement file
- Analytics — feeds dashboards and ML training pipelines
- Loyalty / rewards — credits points
- Merchant dashboard — pushes a real-time row update
8. The reconciliation job
Every night, a batch job pulls the previous day’s transactions from the PSP’s reporting API and compares them line-by-line against the local DB. Three buckets fall out:
- Matched → archive, done
- In PSP, not in our DB → we lost a webhook and the sweeper missed it (rare but real). Replay.
- In our DB, not in PSP → we wrote a CAPTURED that the PSP doesn’t have. This is the scary one. Likely a bug. Page on-call.
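The three buckets reduce to set operations on transaction IDs. The sketch below assumes both sides have already been loaded into dictionaries keyed by a shared transaction ID with amounts in cents; the `reconcile` function and bucket names are hypothetical, and it matches on ID only, so a production job would also compare amounts and statuses for the matched rows.

```python
def reconcile(psp: dict[str, int], local: dict[str, int]) -> dict[str, list[str]]:
    """Split transaction IDs into the three reconciliation buckets."""
    psp_ids, local_ids = set(psp), set(local)
    return {
        # Present on both sides: archive.
        "matched": sorted(psp_ids & local_ids),
        # In PSP, not in our DB: a lost webhook the sweeper missed, replay.
        "missing_locally": sorted(psp_ids - local_ids),
        # In our DB, not in PSP: the scary bucket, page on-call.
        "missing_at_psp": sorted(local_ids - psp_ids),
    }
```

For example, `reconcile({"t1": 2000, "t2": 500}, {"t1": 2000, "t3": 900})` puts `t1` in matched, `t2` in missing_locally, and `t3` in missing_at_psp.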
Why every piece is non-negotiable
A common pushback when designing this for the first time is: “Do I really need all eight services? Can’t I just call Stripe and write a row?”
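You can. And you’ll be fine until something goes wrong: a client retries a request, a PSP response gets lost after the bank has already charged, a webhook never arrives, or a stored balance quietly drifts.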
The architecture isn’t there because it’s pretty. It’s there because each piece neutralizes a specific class of failure that will happen at scale.
What changes at higher scale
The pattern stays the same. What scales:
- Redis becomes a sharded cluster, with hot keys handled per shard
- The Payment Service becomes regionally partitioned by buyer
- The Ledger is sharded by account ID, with a global daily settlement
- Kafka becomes multi-cluster with MirrorMaker for cross-region failover
- Reconciliation runs hourly instead of nightly, with continuous streaming reconciliation on top