80/20 Of The Week: Idempotency and Resilience

80/20 Of The Week 🗣️ … Idempotency and Resilience in systems (how to use “retry” properly)

I swear these are “simple” to understand the basics of but just sound tough xD

Let’s say an issue occurs in the system and a timeout happens. Client retries. And then… you charge the card twice or you order the same thing from Temu basket twice 😭

That’s why idempotency exists:

Idempotency = same request 100x → same result 1x… that is basically it…

How it works in real systems (simple)

Client sends an Idempotency-Key (usually a UUID) with a “dangerous” request like: POST /payments, POST /orders, POST /transfers… basically FINAL steps of operations like orders or similar.

Server does this:

Look up the key in dedup storage
If it exists → return the same stored response (status + body)
If it doesn’t → process once, then store the result

And yes… dedup storage can literally be just a table. Example: idempotency_keys table (Postgres / MySQL / Oracle)

key (Idempotency-Key)
scope (userId + endpoint)
request_hash (optional, for safety)
response_status, response_body
created_at, expires_at (TTL)

Where do you store it?

DB table 👍 (boring, durable, easiest to reason about)
Redis 👍 (fast + TTL), but you must accept the tradeoffs (if Redis is flushed/restarted and you didn’t persist → duplicates can come back… you can use DB as a fallback tho)

Most teams do:

DB as “source of truth”
Redis as a cache (optional)

Retries are good… until they aren’t.

If you retry:

without timeouts → threads hang and you waste resources
without backoff + jitter → thundering herd / stampede… example, you put timeout as fixed 2 seconds and now you have like 14 threads retrying EXACTLY after 2 seconds so they cause locking (race conditions) or other issues
without max attempts → infinite pain
without circuit breaker → you DDOS your own dependency/service lol

Usual “minimal but solid” combo:

Timeouts (client + server)
Retry with exponential backoff + jitter
Circuit breaker (stop the bleeding)
Bulkhead (isolate thread pools)
Rate limiting (protect the system)

Quick consistency note

The moment you have retries + queues + async workflows, you’re living in “duplicates can happen” land. So your safety net becomes: idempotency + good retry policy + clear consistency expectation (strong vs eventual)

If you build big systems, assume the worst and design for it 😄

ALSO… “idempotent” means: the ability of a system to produce the same outcome, even if the same file, event or message is received more than once… shout out to Google!

Idempotency in Systems