The Outbox pattern: reliable event publishing

Backend Architecture Notes: commit the event as data, publish it later


In event-driven systems, one of the most common problems is not consuming events.

It is publishing them reliably.

At first, publishing an event looks simple.

A service updates its database and then publishes an event.

For example:

Create order in database
Publish OrderCreated event

That looks fine.

But there is a hidden problem.

What happens if the database update succeeds, but publishing the event fails?

Now the order exists, but the rest of the system never hears about it.

That is the problem the Outbox pattern solves.

The basic problem

Imagine an Order Service.

When a customer places an order, the service needs to do two things:

1. Save the order
2. Publish OrderCreated

A naive implementation might look like this:

save order to database
publish OrderCreated to message broker

But there is a failure window between those two steps.

The database write can succeed.

Then the service can crash before publishing the event.

Or the broker can be unavailable.

Or the network can fail.

The result is bad:

The order exists
No OrderCreated event was published
Payment does not start
Inventory is not reserved
Confirmation email is not sent

The system is now inconsistent.

The source service knows something happened, but the rest of the system does not.
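
The failure window can be reproduced in a few lines. This is only a sketch: the `publish` function is a hypothetical stand-in for a broker client and is hard-coded to fail, and SQLite stands in for the service database.

```python
import sqlite3


class BrokerDown(Exception):
    """Stands in for any publish failure: crash, timeout, broker outage."""


def publish(event: dict) -> None:
    # Hypothetical broker client; it always fails here to expose the gap.
    raise BrokerDown("broker unavailable")


def create_order_naive(conn: sqlite3.Connection, order_id: str) -> None:
    # Step 1: the database write commits on its own.
    with conn:
        conn.execute(
            "INSERT INTO orders (order_id, status) VALUES (?, ?)",
            (order_id, "PendingPayment"),
        )
    # Step 2: the publish is a separate operation outside any transaction.
    publish({"eventType": "OrderCreated", "orderId": order_id})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")

try:
    create_order_naive(conn, "order_123")
except BrokerDown:
    pass

# The order is committed, but no event ever left the service.
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the failed publish, `order_count` is 1: the order is there, and nothing downstream will ever be told.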

Reversing the order does not fix it

You might think:

Publish the event first
Then save the order

But that creates the opposite problem.

publish OrderCreated
save order to database

What happens if the event is published successfully, but the database write fails?

Now consumers may react to an order that does not exist.

For example:

Payment Service starts payment
Inventory Service reserves stock
Email Service sends confirmation

But the Order Service never actually saved the order.

That is also bad.

Changing the order of the two operations does not remove the problem.

The real issue is that the database and the message broker are two different systems.

There is no single transaction covering both.

The dual-write problem

This is often called the dual-write problem.

A service needs to write to two places:

Database
Message broker

If both writes succeed, everything is fine.

But if one succeeds and the other fails, the system can end up in an inconsistent state.

For example:

Database write succeeds
Broker publish fails

or:

Broker publish succeeds
Database write fails

Distributed systems are full of these small failure windows.

The Outbox pattern is a way to close one of the most common ones.

The idea of the Outbox pattern

The Outbox pattern changes the flow.

Instead of writing to the database and then publishing directly to the broker, the service writes the business change and the event record in the same database transaction.

For example:

Begin transaction
Insert order
Insert OrderCreated event into outbox table
Commit transaction

Now the order and the event record are stored atomically.

Either both are saved, or neither is saved.

The service is not trying to update the database and broker in one fragile step.

It only writes to its own database.

A separate publisher process later reads from the outbox table and publishes the event to the message broker.

The flow becomes:

1. Save business data and outgoing event in one database transaction
2. Outbox publisher reads unpublished events
3. Publisher sends events to broker
4. Publisher marks events as published

That is the core pattern.

A simple example

Imagine the Order Service receives a request to create an order.

Inside one database transaction, it does this:

Begin transaction

Insert into orders:
    order_id = order_123
    status = PendingPayment

Insert into outbox:
    event_id = evt_789
    event_type = OrderCreated
    aggregate_id = order_123
    payload = { orderId: "order_123" }
    published_at = null

Commit transaction

After the transaction commits, the order exists and the event is safely stored.

If the service crashes immediately after the commit, the event is not lost.

The outbox publisher can still find it later.

Select unpublished events from outbox
Publish OrderCreated to broker
Mark event as published

This makes event publishing much more reliable.
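
The transactional write above could be sketched like this. It assumes SQLite as the database and a reduced `outbox` table; the `create_order` helper and the column choices are illustrative, not a prescribed schema.

```python
import json
import sqlite3


def create_order(conn: sqlite3.Connection, order_id: str, event_id: str) -> None:
    """Store the order and its OrderCreated event in one transaction."""
    payload = json.dumps({"orderId": order_id})
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so both rows are stored or neither is.
    with conn:
        conn.execute(
            "INSERT INTO orders (order_id, status) VALUES (?, ?)",
            (order_id, "PendingPayment"),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, aggregate_id, payload)"
            " VALUES (?, ?, ?, ?)",
            (event_id, "OrderCreated", order_id, payload),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox ("
    " event_id TEXT PRIMARY KEY, event_type TEXT NOT NULL,"
    " aggregate_id TEXT NOT NULL, payload TEXT NOT NULL,"
    " published_at TEXT)"  # NULL means not yet published
)

create_order(conn, "order_123", "evt_789")

# Force the outbox insert to fail by reusing an event ID. The order insert
# that ran just before it is rolled back together with it.
try:
    create_order(conn, "order_456", "evt_789")
except sqlite3.IntegrityError:
    pass

order_ids = [row[0] for row in conn.execute("SELECT order_id FROM orders")]
pending = conn.execute(
    "SELECT COUNT(*) FROM outbox WHERE published_at IS NULL"
).fetchone()[0]
```

Only `order_123` exists afterwards, with exactly one pending outbox row. The second order never appears without its event, which is the atomicity the pattern relies on.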

Why this works

The Outbox pattern works because it uses the database transaction you already trust.

The service owns its database.

It can atomically save:

The state change
The intent to publish an event

The event is not published directly inside the same transaction.

Instead, the event is recorded durably.

That durable record becomes the source for publishing.

This removes the dangerous gap where the business change succeeds but the event disappears.

The outbox table

An outbox table can be simple.

For example:

outbox_events
- event_id
- aggregate_type
- aggregate_id
- event_type
- event_version
- payload
- created_at
- published_at
- publish_attempts
- last_error

The exact fields depend on the system.

Useful fields include:

event_id
event_type
payload
created_at
published_at
retry_count
last_error
correlation_id
causation_id

The important part is that unpublished events are easy to find.

For example:

published_at is null

or:

status = Pending

Then a background worker can publish them.
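
As one possible shape, the table above could be declared like this. The sketch assumes SQLite; the column types, the partial index, and the `fetch_unpublished` helper are my own illustration rather than a required layout.

```python
import sqlite3

SCHEMA = """
CREATE TABLE outbox_events (
    event_id         TEXT PRIMARY KEY,
    aggregate_type   TEXT NOT NULL,
    aggregate_id     TEXT NOT NULL,
    event_type       TEXT NOT NULL,
    event_version    INTEGER NOT NULL DEFAULT 1,
    payload          TEXT NOT NULL,
    created_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    published_at     TEXT,
    publish_attempts INTEGER NOT NULL DEFAULT 0,
    last_error       TEXT
);
-- Partial index: only unpublished rows are indexed, so the lookup stays
-- cheap even when the table holds many published events.
CREATE INDEX idx_outbox_unpublished
    ON outbox_events (created_at)
    WHERE published_at IS NULL;
"""


def fetch_unpublished(conn: sqlite3.Connection, limit: int = 100) -> list:
    """Return unpublished events, oldest first."""
    return conn.execute(
        "SELECT event_id, event_type, payload FROM outbox_events"
        " WHERE published_at IS NULL ORDER BY created_at, rowid LIMIT ?",
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.executemany(
        "INSERT INTO outbox_events"
        " (event_id, aggregate_type, aggregate_id, event_type, payload, created_at)"
        " VALUES (?, 'Order', ?, ?, '{}', ?)",
        [
            ("evt_1", "order_123", "OrderCreated", "2026-06-23 10:00:00"),
            ("evt_2", "order_123", "OrderPaid", "2026-06-23 10:00:05"),
        ],
    )
    conn.execute(
        "UPDATE outbox_events SET published_at = CURRENT_TIMESTAMP"
        " WHERE event_id = 'evt_1'"
    )

pending_ids = [row[0] for row in fetch_unpublished(conn)]
```

Here `pending_ids` contains only `evt_2`, because `evt_1` already has a `published_at` value.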

The publisher process

The outbox publisher is responsible for sending events to the broker.

It usually runs in a loop:

Find unpublished outbox events
Publish event to broker
Mark event as published

In production, this needs more care.

For example:

How many events are fetched at once?
How are events locked while publishing?
What happens if publishing fails?
How are retries handled?
How are poison events detected?
How is ordering handled?
How do we avoid two publishers sending the same event at the same time?

The pattern is simple.

The implementation still needs to be designed properly.
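
A minimal polling publisher might look like the sketch below. It assumes a single publisher instance (so no row locking), SQLite as the database, and a `publish` callable supplied by the caller; `flaky_publish` is a fake broker used only to show what happens on failure.

```python
import sqlite3
from typing import Callable


def publish_pending(
    conn: sqlite3.Connection,
    publish: Callable[[str, str], None],
    batch_size: int = 50,
) -> int:
    """Publish one batch of pending events, oldest first. Returns the count sent."""
    rows = conn.execute(
        "SELECT event_id, event_type, payload FROM outbox"
        " WHERE published_at IS NULL ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for event_id, event_type, payload in rows:
        try:
            publish(event_type, payload)
        except Exception as exc:
            # Record the failure and stop, so later events are not sent
            # ahead of the one that failed.
            with conn:
                conn.execute(
                    "UPDATE outbox SET publish_attempts = publish_attempts + 1,"
                    " last_error = ? WHERE event_id = ?",
                    (str(exc), event_id),
                )
            break
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = CURRENT_TIMESTAMP"
                " WHERE event_id = ?",
                (event_id,),
            )
        sent += 1
    return sent


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT, event_id TEXT NOT NULL UNIQUE,"
    " event_type TEXT NOT NULL, payload TEXT NOT NULL, published_at TEXT,"
    " publish_attempts INTEGER NOT NULL DEFAULT 0, last_error TEXT)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
        [("evt_1", "OrderCreated", "p1"), ("evt_2", "OrderPaid", "p2"),
         ("evt_3", "OrderConfirmed", "p3")],
    )

delivered = []
failures = {"remaining": 1}


def flaky_publish(event_type: str, payload: str) -> None:
    # Fake broker: rejects the second event once, then accepts everything.
    if payload == "p2" and failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise ConnectionError("broker unavailable")
    delivered.append(payload)


first_pass = publish_pending(conn, flaky_publish)
second_pass = publish_pending(conn, flaky_publish)
attempts = conn.execute(
    "SELECT publish_attempts FROM outbox WHERE event_id = 'evt_2'"
).fetchone()[0]
```

The first pass sends one event and stops at the failure. The second pass picks up where it left off, so the broker still sees the events in their original order. Running several publishers in parallel would need a locking or claiming step that this sketch leaves out.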

The publisher can still publish duplicates

The Outbox pattern does not guarantee that an event is published exactly once.

That is important.

Imagine this flow:

Publisher reads outbox event
Publisher sends event to broker successfully
Publisher crashes before marking event as published

When the publisher restarts, it sees the event as unpublished.

So it publishes the event again.

This means consumers may receive duplicates.

That is why the Outbox pattern does not remove the need for idempotent consumers.

It solves reliable publishing.

It does not solve duplicate processing everywhere.

The correct mindset is:

Outbox prevents lost events after database commits.
Idempotency protects consumers from duplicate events.

You usually need both.
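
The crash scenario and its remedy can be simulated in memory. Everything here is a toy: the `outbox` and `broker` lists, the `crash_before_mark` flag, and the `consume` function are stand-ins I made up to show the mechanics.

```python
outbox = [{"event_id": "evt_789", "type": "OrderCreated", "published": False}]
broker = []              # every delivery the broker receives
processed_ids = set()    # consumer-side record of handled event IDs
payments_started = []    # the side effect we want to happen exactly once


def run_publisher(crash_before_mark: bool) -> None:
    for event in outbox:
        if event["published"]:
            continue
        broker.append(event["event_id"])   # the publish itself succeeds
        if crash_before_mark:
            return                         # simulated crash: row is never marked
        event["published"] = True


def consume(event_id: str) -> None:
    if event_id in processed_ids:
        return                             # duplicate: already handled
    processed_ids.add(event_id)
    payments_started.append(event_id)


run_publisher(crash_before_mark=True)      # publishes, then "crashes"
run_publisher(crash_before_mark=False)     # restart: publishes the same event again

for event_id in broker:
    consume(event_id)
```

The broker receives `evt_789` twice, but `payments_started` ends up with a single entry because the consumer checks the event ID first.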

Marking events as published

After publishing an event, the publisher marks it as published.

For example:

published_at = now()

or:

status = Published

But this is another small failure window.

The broker publish can succeed, and then updating the outbox row can fail.

That is how duplicate publishing can happen.

This is acceptable if consumers are idempotent.

Trying to completely eliminate that duplicate window is often more complicated than accepting duplicates and handling them safely.

Polling vs transaction log tailing

There are different ways to publish outbox events.

The simple approach is polling.

A background worker regularly queries the outbox table:

Select events where published_at is null

This is easy to understand and often good enough.

The downside is that it adds query load and can delay each event by up to one polling interval.

Another approach is transaction log tailing.

Instead of polling the table, a tool reads database changes from the transaction log and publishes them.

This is often used in Change Data Capture setups.

For example, a connector may read changes from the database log and stream outbox rows to a broker.

This can be powerful, especially at scale, but it adds infrastructure complexity.

The basic design idea stays the same:

The business change and event record are committed together.
Something reliable publishes the event afterwards.

Why not publish inside the transaction?

A common mistake is to publish to the broker inside the database transaction.

For example:

Begin transaction
Insert order
Publish OrderCreated
Commit transaction

This may look safe, but it still has problems.

What if the broker publish succeeds, but the database transaction later rolls back?

Then consumers may react to an event for data that was never committed.

Also, long-running transactions that include external calls can cause performance and locking problems.

A database transaction should generally protect database state.

Calling external systems inside that transaction often creates new failure modes.

The outbox avoids this by storing the event as data first.

The external publish happens after the transaction commits.
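
To make the rollback case concrete, here is a small sketch. The `publish` function is a hypothetical broker client that simply records what was sent, and the failing `order_lines` insert stands in for any statement that aborts the transaction after the publish.

```python
import sqlite3

sent_to_broker = []


def publish(event: dict) -> None:
    # Hypothetical broker client. Once this returns, the event is out
    # and a database rollback cannot take it back.
    sent_to_broker.append(event)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE order_lines ("
    " order_id TEXT NOT NULL, quantity INTEGER NOT NULL CHECK (quantity > 0))"
)


def create_order_publish_inside_tx(order_id: str, quantity: int) -> None:
    with conn:
        conn.execute(
            "INSERT INTO orders (order_id, status) VALUES (?, ?)",
            (order_id, "PendingPayment"),
        )
        publish({"eventType": "OrderCreated", "orderId": order_id})
        # A later statement in the same transaction fails the CHECK constraint.
        conn.execute(
            "INSERT INTO order_lines (order_id, quantity) VALUES (?, ?)",
            (order_id, quantity),
        )


try:
    create_order_publish_inside_tx("order_123", quantity=0)
except sqlite3.IntegrityError:
    pass

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The transaction rolls back, so `order_count` is 0, yet `sent_to_broker` already holds an `OrderCreated` event for an order that does not exist.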

Outbox and event schemas

The outbox stores the event payload.

That means the payload should be treated like a contract.

For example:

{
  "eventId": "evt_789",
  "eventType": "OrderCreated",
  "eventVersion": 1,
  "occurredAt": "2026-06-23T10:00:00Z",
  "correlationId": "corr_456",
  "orderId": "order_123",
  "customerId": "customer_456",
  "totalAmount": 49.99,
  "currency": "EUR"
}

It is useful to include metadata such as:

eventId
eventType
eventVersion
occurredAt
correlationId
causationId
producer

This makes the event easier to trace, version, and debug.

The outbox is not just a transport detail.

It becomes part of the event publishing contract.
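
One way to keep the metadata consistent is to build every payload through a single helper. The `build_event` function below is a hypothetical sketch; the flat layout follows the example above, and the `producer` value is an assumption.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_event(
    event_type: str,
    event_version: int,
    data: dict,
    *,
    correlation_id: str,
    producer: str,
    causation_id: Optional[str] = None,
) -> dict:
    """Wrap business data in the metadata every event should carry."""
    return {
        **data,
        # Metadata comes last so business fields cannot overwrite it.
        "eventId": f"evt_{uuid.uuid4().hex}",
        "eventType": event_type,
        "eventVersion": event_version,
        "occurredAt": datetime.now(timezone.utc).isoformat(),
        "correlationId": correlation_id,
        "causationId": causation_id,
        "producer": producer,
    }


event = build_event(
    "OrderCreated",
    1,
    {"orderId": "order_123", "customerId": "customer_456", "currency": "EUR"},
    correlation_id="corr_456",
    producer="order-service",
)

# This JSON string is what would be stored in the outbox payload column.
payload = json.dumps(event)
decoded = json.loads(payload)
```

Because every event passes through the same function, a consumer can rely on `eventId`, `eventType`, and `eventVersion` being present regardless of which code path produced the event.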

Ordering considerations

The Outbox pattern can help with ordering, but it does not magically solve all ordering problems.

If events for the same aggregate are written to the outbox in the same database, they will usually have a clear creation order.

For example:

OrderCreated
OrderPaid
OrderConfirmed

The publisher can publish them in that order.

But once events go through a broker, ordering depends on the broker, topic, partitioning, consumer groups, retries, and processing behavior.

If ordering matters, you still need to design for it.

Common techniques include:

Use aggregate ID as partition key
Include version numbers
Include sequence numbers
Validate state transitions in consumers
Make consumers tolerate late or duplicate events

The outbox gives you a reliable source of events.

It does not remove all distributed ordering issues.
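
As an illustration of the sequence-number technique, a consumer can buffer events that arrive early and drop ones it has already applied. The `SequencedConsumer` class is a sketch that assumes each event carries a per-aggregate sequence number starting at 1.

```python
from collections import defaultdict


class SequencedConsumer:
    """Applies events per aggregate in sequence order, tolerating duplicates and gaps."""

    def __init__(self) -> None:
        self.last_seq = defaultdict(int)   # aggregate_id -> last applied sequence
        self.waiting = defaultdict(dict)   # aggregate_id -> {sequence: event_type}
        self.applied = []                  # (aggregate_id, event_type) in apply order

    def handle(self, aggregate_id: str, sequence: int, event_type: str) -> None:
        if sequence <= self.last_seq[aggregate_id]:
            return  # duplicate or late copy of an event that was already applied
        buffered = self.waiting[aggregate_id]
        buffered[sequence] = event_type
        # Apply every buffered event that is now next in line.
        while self.last_seq[aggregate_id] + 1 in buffered:
            next_seq = self.last_seq[aggregate_id] + 1
            self.applied.append((aggregate_id, buffered.pop(next_seq)))
            self.last_seq[aggregate_id] = next_seq


consumer = SequencedConsumer()
consumer.handle("order_123", 2, "OrderPaid")        # arrives early, is buffered
consumer.handle("order_123", 1, "OrderCreated")     # unblocks sequence 2
consumer.handle("order_123", 1, "OrderCreated")     # duplicate, ignored
consumer.handle("order_123", 3, "OrderConfirmed")

applied_types = [event_type for _, event_type in consumer.applied]
```

Even though the events arrive out of order and one arrives twice, `applied_types` comes out as `OrderCreated`, `OrderPaid`, `OrderConfirmed`. A real consumer would also need to persist `last_seq` and decide how long to wait for a missing event.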

Outbox and sagas

The Outbox pattern is especially useful in sagas.

A saga step often does two things:

Update local state
Publish an event that moves the process forward

For example, the Inventory Service receives a command:

ReserveInventory

It reserves stock and needs to publish:

InventoryReserved

If the reservation is stored but the event is lost, the saga gets stuck.

The outbox helps prevent that.

Begin transaction
Store inventory reservation
Insert InventoryReserved into outbox
Commit transaction

Now the saga can continue once the outbox publisher publishes the event.

Again, the consumer still needs idempotency because the event may be published more than once.

Outbox and the Inbox pattern

The Outbox pattern is often paired with the Inbox pattern.

The Outbox pattern helps producers publish events reliably.

The Inbox pattern helps consumers process messages reliably and idempotently.

A consumer may store incoming message IDs or business operation IDs before applying changes.

For example:

Begin transaction
Insert messageId into inbox table
Apply business update
Commit transaction

If the same message arrives again, the insert fails because its messageId is already in the inbox table. This relies on a unique constraint on messageId.

So the duplicate can be skipped.

Together, the patterns look like this:

Producer:
    database change + outbox event in one transaction

Consumer:
    inbox record + business change in one transaction

This is a common approach when correctness matters.
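
The consumer side could be sketched as follows, using a balance increment because it is clearly not safe to apply twice. SQLite, the `balances` table, and the `handle_bonus_granted` function are assumptions made for the example.

```python
import sqlite3


def handle_bonus_granted(
    conn: sqlite3.Connection, message_id: str, player_id: str, amount: int
) -> bool:
    """Apply the bonus once per message. Returns False for a duplicate."""
    try:
        with conn:
            # The primary key on message_id rejects a second delivery, and
            # the rollback guarantees the balance is not touched either.
            conn.execute("INSERT INTO inbox (message_id) VALUES (?)", (message_id,))
            conn.execute(
                "UPDATE balances SET balance = balance + ? WHERE player_id = ?",
                (amount, player_id),
            )
    except sqlite3.IntegrityError:
        # In this sketch the only constraint that can fail is the inbox key.
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbox (message_id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE balances (player_id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO balances (player_id, balance) VALUES ('player_1', 0)")
conn.commit()

first = handle_bonus_granted(conn, "msg_1", "player_1", 10)
second = handle_bonus_granted(conn, "msg_1", "player_1", 10)  # redelivery

balance = conn.execute(
    "SELECT balance FROM balances WHERE player_id = 'player_1'"
).fetchone()[0]
```

The first delivery returns `True` and the redelivery returns `False`, leaving the balance at 10 instead of 20.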

Failure handling in the publisher

The publisher needs its own failure handling.

For temporary failures, retry with backoff.

For repeated failures, store the error and alert.

For invalid events, move them into a failed state instead of retrying forever.

Useful fields:

status
retry_count
next_retry_at
last_error
locked_until
published_at

This allows the publisher to avoid retry storms.

It also gives operators visibility into what is stuck.

Bad outbox implementations can silently accumulate unpublished events.

That is dangerous.

The outbox should be monitored like any other critical queue.
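
A rough sketch of the retry bookkeeping, using the fields listed above. The limits (`MAX_RETRIES`, a 2 second base, a 300 second cap) are arbitrary values chosen for the example, and jitter is left out to keep the numbers predictable.

```python
from datetime import datetime, timedelta, timezone

MAX_RETRIES = 5  # assumed limit; after this the event is parked as Failed


def backoff_delay(
    retry_count: int, base_seconds: float = 2.0, cap_seconds: float = 300.0
) -> float:
    """Exponential backoff: the delay doubles per retry, up to a cap."""
    return min(cap_seconds, base_seconds * 2 ** retry_count)


def record_failure(row: dict, error: str, now: datetime) -> dict:
    """Return the updated outbox row after a failed publish attempt."""
    retry_count = row["retry_count"] + 1
    if retry_count >= MAX_RETRIES:
        # Stop retrying and leave the row visible for operators.
        return {**row, "status": "Failed", "retry_count": retry_count,
                "last_error": error, "next_retry_at": None}
    delay = timedelta(seconds=backoff_delay(retry_count))
    return {**row, "status": "Pending", "retry_count": retry_count,
            "last_error": error, "next_retry_at": now + delay}


now = datetime(2026, 6, 23, 10, 0, tzinfo=timezone.utc)
fresh = {"event_id": "evt_789", "status": "Pending", "retry_count": 0,
         "last_error": None, "next_retry_at": None}

retried = record_failure(fresh, "broker unavailable", now)
exhausted = record_failure({**fresh, "retry_count": 4}, "broker unavailable", now)
```

The publisher would then only pick up rows whose `status` is `Pending` and whose `next_retry_at` is empty or in the past, which keeps a struggling broker from being hammered.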

Observability

The Outbox pattern needs observability.

Useful metrics include:

Number of unpublished events
Oldest unpublished event age
Publish success rate
Publish failure rate
Retry count
Publishing delay
Events stuck in failed state
Duplicate publish count

The most important metric is often age.

If there are 10,000 unpublished events but they are only a few seconds old, the system may simply be under load.

If there is one unpublished event that is three hours old, that may be a serious problem.

You want to know whether the outbox is draining.

Reliable publishing is only useful if you can see when it stops working.
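
The two headline numbers, backlog size and oldest age, can come from a single query. This sketch assumes `created_at` is stored as a Unix timestamp in seconds; the `outbox_backlog` helper is illustrative and not tied to any metrics library.

```python
import sqlite3
from typing import Tuple


def outbox_backlog(conn: sqlite3.Connection, now: float) -> Tuple[int, float]:
    """Return (unpublished event count, age of the oldest one in seconds)."""
    count, oldest_created_at = conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM outbox WHERE published_at IS NULL"
    ).fetchone()
    # MIN() returns NULL when nothing is pending, so report an age of zero.
    oldest_age = 0.0 if oldest_created_at is None else now - oldest_created_at
    return count, oldest_age


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    " event_id TEXT PRIMARY KEY, created_at REAL NOT NULL, published_at REAL)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox (event_id, created_at, published_at) VALUES (?, ?, ?)",
        [
            ("evt_1", 1000.0, 1001.0),  # already published
            ("evt_2", 1900.0, None),    # pending for 100 seconds at now=2000
            ("evt_3", 1990.0, None),    # pending for 10 seconds
        ],
    )

count, oldest_age = outbox_backlog(conn, now=2000.0)
```

With the sample rows, the backlog is 2 events and the oldest has been waiting 100 seconds. Alerting on the age rather than the count matches the point above: a large but young backlog is usually load, while a small but old one is usually a stuck publisher.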

When should you use the Outbox pattern?

I would consider the Outbox pattern when a service needs to publish events based on committed database changes, especially when the event is important for downstream business processes.

Good examples:

OrderCreated
PaymentSucceeded
InventoryReserved
SubscriptionCancelled
RefundIssued
InvoiceGenerated
BonusGranted
PlayerBalanceUpdated

If losing the event would break a business workflow, the outbox is worth considering.

For low-value telemetry or non-critical notifications, a simpler approach may be acceptable.

Architecture should match the importance of the data.

What the Outbox pattern does not solve

The Outbox pattern is useful, but it is not magic.

It does not guarantee:

Exactly-once end-to-end processing
No duplicate messages
Perfect ordering across the whole system
Automatic schema compatibility
Correct business compensation
Consumer idempotency

It solves a specific problem:

How do we reliably publish an event after a local database change?

That is already a big problem.

But it is only one part of a reliable event-driven system.

The interview version

If I had to explain the Outbox pattern in an interview, I would say:

The Outbox pattern solves the problem of updating a database and publishing an event reliably. If a service saves data and then publishes an event, the database write can succeed while the publish fails. Then the rest of the system never learns about the change.

With the Outbox pattern, the service writes both the business change and the outgoing event to an outbox table in the same database transaction. A separate publisher process reads unpublished events from the outbox and sends them to the message broker.

This prevents events from being lost after the database transaction commits. However, the publisher may still publish the same event more than once, for example if it crashes after publishing but before marking the event as published. So consumers still need to be idempotent.

In production, I would monitor unpublished event count, oldest unpublished event age, retry counts, and publish failures. The outbox gives reliable publishing, but it still needs operational visibility.

Final thought

The Outbox pattern is one of those patterns that looks boring until you have seen the bug it prevents.

Without it, there is a small but dangerous gap between saving data and publishing an event.

Most of the time, that gap is invisible.

Then one day a service crashes at exactly the wrong moment.

An order exists, but payment never starts.

Inventory is reserved, but the saga never continues.

A subscription is cancelled, but billing never hears about it.

The Outbox pattern closes that gap by turning the event into durable data first.

Save the business change.

Save the event that describes it.

Commit both together.

Publish later.

That small shift makes event-driven systems much more reliable.

This post is part of my Backend Architecture Notes series. In the next post, I will look at retries, backoff, and dead-letter queues, because reliable publishing and consuming also need a clear strategy for failure.