Idempotency: the pattern that saves you from duplicate messages

Backend Architecture Notes: assume retries, key by the business operation


In event-driven systems, one of the most important assumptions is this:

The same message can be processed more than once.

That sounds like a small technical detail.

It is not.

If a system processes a payment event twice, a customer may be charged twice.

If a system processes a bonus event twice, a user may receive the same bonus twice.

If a system processes an inventory reservation twice, stock may be reserved twice for the same order.

These are not theoretical problems. They are the kind of bugs that appear when distributed systems meet retries, timeouts, crashes, and real production traffic.

That is why idempotency matters.

What is idempotency?

An operation is idempotent when running it multiple times has the same effect as running it once.

For example, this operation is usually idempotent:

Set order status to Confirmed

If the order is already confirmed, setting it to confirmed again does not change the result.

This operation is usually not idempotent:

Add 50 EUR to account balance

If you run it twice, the account gets 100 EUR instead of 50 EUR.

That difference is critical.

In event-driven systems, consumers should often be designed so that processing the same event multiple times is safe.
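
The difference between the two operations above can be sketched in a few lines of Python. The function names and the dictionary shapes are made up for illustration; they are not taken from any real system.

```python
def confirm_order(order: dict) -> dict:
    """Idempotent: the result is the same no matter how often it runs."""
    return {**order, "status": "Confirmed"}


def add_to_balance(account: dict, amount: int) -> dict:
    """Not idempotent: every call changes the result again."""
    return {**account, "balance": account["balance"] + amount}


order = {"id": "order_456", "status": "PendingPayment"}
once = confirm_order(order)
twice = confirm_order(once)  # same result as `once`

account = {"id": "acc_1", "balance": 0}
credited_once = add_to_balance(account, 50)
credited_twice = add_to_balance(credited_once, 50)  # balance keeps growing
```

Confirming twice leaves the order unchanged. Crediting twice does not leave the balance unchanged.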

Why duplicate messages happen

A common misconception is that a message broker will make sure every message is processed exactly once.

In practice, I usually design as if messages can be duplicated.

Duplicates can happen for many reasons:

A consumer processes a message but crashes before acknowledging it
A network timeout makes the producer retry publishing
A broker redelivers a message after a failed acknowledgement
A consumer times out while the database transaction actually succeeded
A manual replay sends old events through the system again
An external API returns an unclear result and the request is retried

Most of these situations are normal in production.

They are not signs that the architecture is broken.

They are signs that the architecture needs to handle failure correctly.

At-least-once delivery

Many messaging systems are designed around at-least-once delivery.

That means the broker keeps delivering a message until a consumer acknowledges it.

But the trade-off is that it may be delivered more than once.

This is often better than losing messages, but it shifts responsibility to the consumer.

The consumer must be able to handle duplicates.

For example:

Consumer receives PaymentSucceeded
Consumer updates the order as paid
Consumer crashes before acknowledging the message
Broker redelivers PaymentSucceeded
Consumer receives the same event again

If the handler simply applies the same effect again, the system may become incorrect.

If the handler is idempotent, the second delivery is harmless.
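
The redelivery sequence above can be simulated with a toy broker. This is only a sketch: `deliver_at_least_once`, `CrashBeforeAck`, and the handler are hypothetical names, and a real broker tracks acknowledgements very differently.

```python
class CrashBeforeAck(Exception):
    """Stands in for a consumer process dying after doing its work."""


def deliver_at_least_once(message: dict, handler, max_attempts: int = 5) -> int:
    """Toy broker: redeliver until the handler returns, which counts as the ack."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message, attempt)
            return attempt  # acknowledged
        except CrashBeforeAck:
            continue  # no ack received, so the broker redelivers
    raise RuntimeError("message was never acknowledged")


paid_orders: set[str] = set()
times_handled = 0


def handle_payment_succeeded(message: dict, attempt: int) -> None:
    global times_handled
    times_handled += 1
    # Adding to a set is idempotent, so a redelivery does no harm here.
    paid_orders.add(message["orderId"])
    if attempt == 1:
        raise CrashBeforeAck()  # the work is done, but the ack is lost


attempts = deliver_at_least_once(
    {"eventType": "PaymentSucceeded", "orderId": "order_456"},
    handle_payment_succeeded,
)
```

The handler runs twice, but the order is marked as paid only once because the effect itself is idempotent.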

The dangerous version

Imagine this event:

{
  "eventType": "PaymentSucceeded",
  "paymentId": "pay_123",
  "orderId": "order_456",
  "amount": 50,
  "currency": "EUR"
}

A naive consumer might do this:

When PaymentSucceeded is received:
    add 50 EUR to merchant balance
    mark order as paid

The problem is the balance update.

If the same event is processed twice, the merchant balance may be credited twice.

This is a serious bug.

The safer version is not:

Add 50 EUR

The safer version is:

Apply payment pay_123 once

That small change matters.

The operation now has a business identity.

The system can check whether pay_123 has already been applied.

Use business idempotency keys

A good idempotency key represents the business operation, not just the technical message.

Examples:

paymentId
orderId
reservationId
refundId
invoiceId
bonusGrantId
transactionId

For a payment, the key might be:

paymentId = pay_123

For an inventory reservation, it might be:

reservationId = reservation_456

For a bonus grant, it might be:

bonusGrantId = bonus_789

The consumer can store this key when the operation is applied.

Then, if the same event or command arrives again, the consumer can detect that the operation already happened.

If paymentId was already processed:
    skip

Otherwise:
    apply payment
    store paymentId as processed

This is the core idea.
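
Here is a minimal sketch of that check in Python. `MerchantLedger` and its fields are hypothetical, and the keys live in memory only to keep the example short. A real consumer would keep them in durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class MerchantLedger:
    balance: int = 0
    applied_payment_ids: set = field(default_factory=set)

    def apply_payment(self, payment_id: str, amount: int) -> bool:
        """Apply the payment once. Returns False for a duplicate."""
        if payment_id in self.applied_payment_ids:
            return False  # already applied: skip
        self.balance += amount
        self.applied_payment_ids.add(payment_id)
        return True


ledger = MerchantLedger()
first = ledger.apply_payment("pay_123", 50)
second = ledger.apply_payment("pay_123", 50)  # duplicate delivery
```

The second call is skipped, so the balance stays at 50.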

Event ID vs business ID

Many events include an eventId.

For example:

{
  "eventId": "evt_001",
  "eventType": "PaymentSucceeded",
  "paymentId": "pay_123"
}

The eventId is useful.

It helps with tracing, logging, deduplication, and debugging.

But sometimes the business ID is more important than the event ID.

Why?

Because the same business fact may be published more than once with different event IDs.

For example:

evt_001: PaymentSucceeded for pay_123
evt_002: PaymentSucceeded for pay_123

If you only deduplicate by event ID, the system sees two different events.

But from a business perspective, it is the same payment.

For financial or state-changing operations, I prefer idempotency around the business operation.

The question is not only:

Have I seen this event before?

The better question is often:

Have I already applied this business operation?
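
A small sketch makes the difference concrete. It uses the two events from the example above. The helper `count_applied` is a hypothetical name, and it only counts how many events would get through a deduplication check on a given field.

```python
events = [
    {"eventId": "evt_001", "eventType": "PaymentSucceeded", "paymentId": "pay_123"},
    {"eventId": "evt_002", "eventType": "PaymentSucceeded", "paymentId": "pay_123"},
]


def count_applied(events: list, key_field: str) -> int:
    """Count how many events would be applied when deduplicating on key_field."""
    seen = set()
    applied = 0
    for event in events:
        key = event[key_field]
        if key in seen:
            continue  # duplicate according to this key
        seen.add(key)
        applied += 1
    return applied


applied_by_event_id = count_applied(events, "eventId")
applied_by_payment_id = count_applied(events, "paymentId")
```

Deduplicating on `eventId` lets both events through. Deduplicating on `paymentId` applies the payment once.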

Store processed operations

A common implementation is to store processed operation IDs in the database.

For example, a consumer might have a table like this:

processed_messages
- id
- event_id
- operation_id
- processed_at

Or a domain-specific table:

applied_payments
- payment_id
- order_id
- amount
- applied_at

The important part is that the database should enforce uniqueness.

For example:

Unique constraint on payment_id

Then even if two consumers race to process the same payment, only one insert succeeds.

This is important because idempotency should not only exist in application memory.

Memory can be lost.

Multiple consumer instances may run at the same time.

A process can crash and restart.

The database is usually the right place to enforce the guarantee.
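
As a sketch, here is the applied_payments table with a uniqueness guarantee, using SQLite through Python's standard `sqlite3` module. The column types and the `record_payment` helper are my assumptions. In another database the constraint syntax and the exception type will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE applied_payments (
        payment_id TEXT PRIMARY KEY,  -- primary key implies uniqueness
        order_id   TEXT NOT NULL,
        amount     INTEGER NOT NULL,
        applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def record_payment(payment_id: str, order_id: str, amount: int) -> bool:
    """Return True if this call recorded the payment, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO applied_payments (payment_id, order_id, amount) VALUES (?, ?, ?)",
                (payment_id, order_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the unique key rejected the second insert


first_insert = record_payment("pay_123", "order_456", 50)
second_insert = record_payment("pay_123", "order_456", 50)
row_count = conn.execute("SELECT COUNT(*) FROM applied_payments").fetchone()[0]
```

The second insert is rejected by the database itself, not by application logic that could be skipped or raced.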

Make the check and write atomic

A common mistake is to do this:

Check if payment was processed
If not processed:
    apply payment
    mark payment as processed

That looks fine, but it can fail under concurrency.

Two consumers may process the same message at the same time:

Consumer A checks: not processed
Consumer B checks: not processed
Consumer A applies payment
Consumer B applies payment
Consumer A marks processed
Consumer B marks processed

The payment was applied twice.

To avoid this, the check and the write need to be atomic.

Usually that means using a database transaction and a unique constraint.

For example:

Begin transaction
Insert payment_id into applied_payments
Apply payment effect
Commit transaction

If the insert fails because payment_id already exists, the operation was already applied.

The duplicate can be skipped safely.
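
The transaction above might look like this with SQLite and Python's `sqlite3` module. The table layout, the `merchant_balance` table, and the function names are assumptions for the sake of a runnable sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE applied_payments (payment_id TEXT PRIMARY KEY);
    CREATE TABLE merchant_balance (merchant_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
    INSERT INTO merchant_balance VALUES ('merchant_1', 0);
    """
)


def apply_payment_once(payment_id: str, merchant_id: str, amount: int) -> bool:
    """Insert the key and apply the effect in one transaction."""
    try:
        with conn:  # one transaction: both statements commit or neither does
            conn.execute("INSERT INTO applied_payments (payment_id) VALUES (?)", (payment_id,))
            conn.execute(
                "UPDATE merchant_balance SET balance = balance + ? WHERE merchant_id = ?",
                (amount, merchant_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # key already exists: the payment was applied before


def balance_of(merchant_id: str) -> int:
    row = conn.execute(
        "SELECT balance FROM merchant_balance WHERE merchant_id = ?", (merchant_id,)
    ).fetchone()
    return row[0]


applied_first = apply_payment_once("pay_123", "merchant_1", 50)
applied_again = apply_payment_once("pay_123", "merchant_1", 50)
```

The insert comes first on purpose. If the key already exists, the transaction fails before the balance is touched.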

Idempotency in commands

Idempotency is not only useful for events.

It is also important for commands.

For example:

AuthorizePayment
ReserveInventory
CreateInvoice
GrantBonus
CancelSubscription

Commands are often retried when there is a timeout.

But a timeout does not always mean the command failed.

It may mean the command succeeded, but the response was lost.

For example:

Client sends AuthorizePayment
Payment Service authorizes payment
Response times out before client receives it
Client retries AuthorizePayment

Without idempotency, the payment may be authorized twice, and the customer may end up charged twice.

With an idempotency key, the Payment Service can recognize the retry and return the original result.

idempotencyKey = order_456_authorize_payment

If the same command arrives again with the same key, the service should not perform the operation again. It should return the existing result.
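
A rough sketch of a service that returns the original result on a retry. `PaymentService` here is a toy class with hypothetical fields. It keeps results in memory, which a real service would not do.

```python
import itertools


class PaymentService:
    """Toy service. A real one would persist results, not keep them in memory."""

    def __init__(self) -> None:
        self._results_by_key: dict = {}
        self._authorization_ids = itertools.count(1)
        self.authorizations_performed = 0

    def authorize_payment(self, idempotency_key: str, order_id: str, amount: int) -> dict:
        existing = self._results_by_key.get(idempotency_key)
        if existing is not None:
            return existing  # retry: return the original result, do nothing else

        self.authorizations_performed += 1
        result = {
            "authorizationId": f"auth_{next(self._authorization_ids)}",
            "orderId": order_id,
            "amount": amount,
            "status": "Authorized",
        }
        self._results_by_key[idempotency_key] = result
        return result


service = PaymentService()
key = "order_456_authorize_payment"
first_result = service.authorize_payment(key, "order_456", 50)
retry_result = service.authorize_payment(key, "order_456", 50)  # response was lost, client retries
```

The retry gets the same authorization back, and only one authorization was performed.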

Idempotency in APIs

The same idea applies to HTTP APIs.

For example, creating a payment or placing an order may be protected with an idempotency key.

The client sends:

Idempotency-Key: order_456_payment_attempt_1

The server stores the key with the result of the operation.

If the client retries the same request, the server returns the same result instead of creating a second payment.

This is especially useful for operations where retrying is normal but duplicate execution is dangerous.

Examples:

Create payment
Create order
Create invoice
Grant credit
Start subscription
Issue refund

A retry should not accidentally create a second business operation.

Idempotency and state transitions

Another way to make operations safer is to model state transitions clearly.

For example, an order might move through these states:

PendingPayment
PaymentAuthorized
Confirmed
Cancelled

Then a handler can check whether the transition is valid.

For example:

When PaymentAuthorized is received:
    If order is PendingPayment:
        set order to PaymentAuthorized
    If order is already PaymentAuthorized:
        ignore as duplicate
    If order is Cancelled:
        handle as unexpected or compensation case

This is safer than blindly applying changes.

State machines help because they make duplicate and invalid transitions visible.

They also help with out-of-order events.

If an event arrives that does not make sense for the current state, the system can decide whether to ignore it, retry it later, or send it to manual review.
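
The handler rules above can be sketched as a small transition function. The `Outcome` enum is my addition, and so is the choice to flag every state other than PendingPayment and PaymentAuthorized as unexpected. The rules above only name Cancelled.

```python
from enum import Enum


class OrderState(Enum):
    PENDING_PAYMENT = "PendingPayment"
    PAYMENT_AUTHORIZED = "PaymentAuthorized"
    CONFIRMED = "Confirmed"
    CANCELLED = "Cancelled"


class Outcome(Enum):
    APPLIED = "applied"
    DUPLICATE = "duplicate"
    UNEXPECTED = "unexpected"


def on_payment_authorized(state: OrderState) -> tuple:
    """Return the next state and what the handler decided."""
    if state is OrderState.PENDING_PAYMENT:
        return OrderState.PAYMENT_AUTHORIZED, Outcome.APPLIED
    if state is OrderState.PAYMENT_AUTHORIZED:
        return state, Outcome.DUPLICATE
    # Cancelled (or any other state): leave the order alone and flag it.
    return state, Outcome.UNEXPECTED


state, outcome = on_payment_authorized(OrderState.PENDING_PAYMENT)
state_after_duplicate, duplicate_outcome = on_payment_authorized(state)
```

Returning an explicit outcome makes duplicates and unexpected transitions easy to log or route to manual review.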

Idempotency is not always easy

Some operations are naturally idempotent.

For example:

Set status to Cancelled
Create or update projection row by ID
Mark notification as sent
Store latest user profile by user ID

Other operations are harder:

Increment balance
Send email
Call external payment provider
Reserve limited stock
Generate sequential invoice number

These operations need more care.

For example, sending the same email twice may not corrupt data, but it can still be a bad user experience.

Calling an external provider twice may create real financial effects.

Generating invoice numbers may have legal or business constraints.

Idempotency is not only a technical property. It depends on the business impact of doing something twice.

Side effects need attention

Some side effects cannot easily be undone.

For example:

Sending an email
Sending an SMS
Calling a payment provider
Triggering a shipment
Granting external access

For these operations, it is important to record intent and result carefully.

For example:

notification_id = order_456_confirmation_email

Before sending the email, the system can check whether that notification was already sent.

After sending, it records the result.

But even this can be tricky.

What if the email is sent successfully, but the system crashes before recording that it was sent?

The message may be retried and the email may be sent again.

For some side effects, exactly-once behavior is extremely hard.

In those cases, the goal is to make duplicates unlikely, detectable, and acceptable where possible.

For critical effects, you may need provider-level idempotency keys, transactional outbox patterns, or reconciliation jobs.
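
The sketch below shows how a provider-level idempotency key can cover the crash window described above. `FakeEmailProvider` is a stand-in I made up. Whether a real provider deduplicates on a key, and for how long, is something to check in its documentation.

```python
class FakeEmailProvider:
    """Stand-in for a provider that deduplicates on an idempotency key (assumed)."""

    def __init__(self) -> None:
        self.delivered: dict = {}

    def send(self, idempotency_key: str, to: str, subject: str) -> str:
        # Same key again: return the first delivery instead of sending twice.
        if idempotency_key not in self.delivered:
            self.delivered[idempotency_key] = f"msg_{len(self.delivered) + 1}"
        return self.delivered[idempotency_key]


notifications: dict = {}  # notification_id -> {"status": ..., "provider_message_id": ...}


def send_confirmation(provider, notification_id: str, to: str,
                      crash_before_record: bool = False) -> None:
    # Record intent first, then send, then record the result.
    record = notifications.setdefault(
        notification_id, {"status": "pending", "provider_message_id": None}
    )
    if record["status"] == "sent":
        return  # result already recorded: nothing to do

    message_id = provider.send(notification_id, to, "Your order is confirmed")
    if crash_before_record:
        raise RuntimeError("crashed after sending, before recording the result")
    record.update(status="sent", provider_message_id=message_id)


provider = FakeEmailProvider()
notification_id = "order_456_confirmation_email"
try:
    send_confirmation(provider, notification_id, "customer@example.com", crash_before_record=True)
except RuntimeError:
    pass  # the broker would redeliver the message here
send_confirmation(provider, notification_id, "customer@example.com")
```

The retry calls the provider again, but the provider recognizes the key, so only one email goes out.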

Idempotency does not remove the need for observability

Even with idempotency, you still need visibility.

You want to know:

How many duplicate messages are being received?
How many events are skipped because they were already processed?
How many idempotency conflicts happen?
Are retries increasing?
Are consumers crashing after processing messages?
Are external APIs timing out?

Duplicate detection should not be completely silent.

A few duplicates may be normal.

A sudden increase may point to a consumer crash loop, broker issue, timeout problem, or producer retry bug.

Idempotency protects correctness.

Observability helps you understand system health.
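
Counting duplicates can be as simple as the sketch below. The metric names are made up. In a real service they would go to whatever metrics library is already in use, not to a `Counter`.

```python
from collections import Counter

metrics = Counter()
applied_payment_ids = set()


def handle(event: dict) -> None:
    metrics["messages_received"] += 1
    payment_id = event["paymentId"]
    if payment_id in applied_payment_ids:
        metrics["duplicates_skipped"] += 1  # count it, do not hide it
        return
    applied_payment_ids.add(payment_id)
    metrics["payments_applied"] += 1


for event in [{"paymentId": "pay_123"}, {"paymentId": "pay_123"}, {"paymentId": "pay_124"}]:
    handle(event)

duplicate_ratio = metrics["duplicates_skipped"] / metrics["messages_received"]
```

A ratio like this is easy to alert on when it jumps.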

Replays depend on idempotency

One of the benefits of event-driven systems is the ability to replay events.

For example, you may want to rebuild a projection, recover from a bug, or backfill a new read model.

But replaying events is only safe if consumers are designed carefully.

Some consumers are naturally replayable.

For example:

Build reporting table from events
Rebuild search index
Update analytics projection

Other consumers should not be replayed blindly.

For example:

Send customer email
Charge payment
Create shipment
Grant bonus

Before replaying events, you need to know which consumers are safe to replay and which ones produce external side effects.

Idempotency makes replay safer, but it does not remove the need to think.
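
One way to make that decision explicit is to have each consumer declare whether it is safe to replay. This is only a sketch: the `Consumer` type, the `replay_safe` flag, and the two example consumers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Consumer:
    name: str
    handle: Callable[[dict], None]
    replay_safe: bool  # False for consumers with external side effects


def replay(events: list, consumers: list) -> list:
    """Replay events into replay-safe consumers only. Returns the skipped names."""
    skipped = []
    for consumer in consumers:
        if not consumer.replay_safe:
            skipped.append(consumer.name)
            continue
        for event in events:
            consumer.handle(event)
    return skipped


report_rows: dict = {}
emails_sent: list = []

consumers = [
    Consumer("reporting_projection", lambda e: report_rows.update({e["orderId"]: e}), True),
    Consumer("customer_email", lambda e: emails_sent.append(e["orderId"]), False),
]
history = [{"orderId": "order_456", "status": "Confirmed"}]
skipped = replay(history, consumers)
skipped_again = replay(history, consumers)  # a second replay leaves the projection unchanged
```

The projection is rebuilt by ID, so replaying it twice is harmless. The email consumer is never touched.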

Design rule: assume retries

A useful design rule is:

Every important operation should be safe to retry.

That does not mean every operation is easy to make idempotent.

But it means retries should not surprise the system.

For each handler, ask:

What happens if this message is processed twice?
What happens if two consumers process it at the same time?
What happens if the database write succeeds but acknowledgement fails?
What happens if the external API succeeds but the response times out?
What key identifies this business operation?
Where is that key stored?
Is there a unique constraint?

These questions are where the real design happens.
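
The first question can be turned into a test. The helper below is a sketch with hypothetical names: it runs a handler once and twice from the same starting state and compares the results.

```python
import copy


def assert_idempotent(handler, message: dict, initial_state: dict) -> dict:
    """Run handler once and twice from the same start and compare the results."""
    once = copy.deepcopy(initial_state)
    handler(once, message)

    twice = copy.deepcopy(initial_state)
    handler(twice, message)
    handler(twice, message)

    if once != twice:
        raise AssertionError(f"not idempotent: {once!r} != {twice!r}")
    return once


def naive_handler(state: dict, message: dict) -> None:
    state["balance"] += message["amount"]


def keyed_handler(state: dict, message: dict) -> None:
    if message["paymentId"] in state["applied"]:
        return  # already applied: skip
    state["applied"].append(message["paymentId"])
    state["balance"] += message["amount"]


message = {"paymentId": "pay_123", "amount": 50}
start = {"balance": 0, "applied": []}
final_state = assert_idempotent(keyed_handler, message, start)

try:
    assert_idempotent(naive_handler, message, start)
    naive_passed = True
except AssertionError:
    naive_passed = False
```

The keyed handler passes the check and the naive one fails it. The check only covers sequential duplicates. Concurrent delivery still needs the unique constraint.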

The interview version

If I had to explain idempotency in an interview, I would say:

Idempotency means that processing the same operation multiple times has the same effect as processing it once. It is important in event-driven systems because messages are often delivered at least once, which means consumers can receive duplicates.

I would not assume exactly-once processing across the whole system. A consumer might process a message, update the database, crash before acknowledging it, and then receive the same message again. So handlers need to be safe against duplicates.

A common approach is to use an idempotency key, preferably a business key like paymentId, orderId, reservationId, or bonusGrantId. The consumer stores that key when applying the operation, ideally with a unique constraint and inside the same transaction as the business update.

The important part is to make the operation itself idempotent. Instead of “add 50 EUR”, design it as “apply payment pay_123 once”. That way retries and duplicate messages do not create duplicate business effects.

Final thought

Idempotency is one of the patterns that separates a demo from a production system.

In a demo, every message arrives once.

Every API call returns cleanly.

Every consumer acknowledges at the right moment.

Every external provider responds clearly.

Production is different.

Messages are retried. Consumers crash. Networks time out. APIs return unknown results. People replay events. Multiple workers process messages in parallel.

Idempotency is how you make those situations safe.

It is not just a technical detail. It is a way of protecting the business from duplicate effects.

The question is not whether a message should be processed twice.

The question is what happens when it is.

This post is part of my Backend Architecture Notes series. In the next post, I will look at at-least-once delivery, exactly-once processing, and why “exactly once” is often more complicated than it sounds.