rponte/avoid-distributed-transactions.md

Last active March 24, 2026 14:31


THEORY: Distributed Transactions and why you should avoid them (2 Phase Commit, Saga Pattern, TCC, Idempotency, etc.)

Distributed Transactions and why you should avoid them

Many modern technologies (RabbitMQ, Kafka, etc.) do not support them;
They are a form of synchronous inter-process communication, which reduces availability;
All participants of a distributed transaction need to be available for a distributed commit, which again reduces availability.

Implementing business transactions that span multiple services is not straightforward. Distributed transactions are best avoided because of the CAP theorem. Moreover, many modern (NoSQL) databases don’t support them. The best solution is to use the Saga Pattern.

[...]

One of the most well-known patterns for distributed transactions is called Saga. The first paper about it was published back in 1987, and it has been a popular solution since then.

There are a couple of different ways to implement a saga transaction, but the two most popular are:

Events/Choreography: when there is no central coordination, each service produces events, listens to other services’ events, and decides whether an action should be taken;
Command/Orchestration: when a coordinator service is responsible for centralizing the saga’s decision making and sequencing the business logic;
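The orchestration variant can be sketched in a few lines. This is a minimal in-process illustration, not a production saga engine: `SagaStep` and `run_saga` are hypothetical names, and a real orchestrator would persist its progress and call remote services instead of local functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    """One local transaction plus the action that undoes it (hypothetical type)."""
    name: str
    action: Callable[[], None]
    compensation: Callable[[], None]


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            # The failed step itself is not compensated: it never completed.
            for done in reversed(completed):
                done.compensation()
            return False
    return True
```

The coordinator only rolls back what actually finished, which is why each step carries its own compensation instead of relying on a global rollback.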

rponte commented Dec 31, 2024

How Complex Systems Fail

5. Complex systems run in degraded mode.
A corollary to the preceding point is that complex systems run as broken systems. The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws. [...]

16. Safety is a characteristic of systems and not of their components
Safety is an emergent property of systems; it does not reside in a person, device or department of an organization or system. Safety cannot be purchased or manufactured; it is not a feature that is separate from the other components of the system.

rponte commented Jan 3, 2025

Outbox Pattern - by Unico

The interesting part is they use Protobuf as a content type when sending events to the broker. Still, for some reason that's unclear in the article, they serialize this Protobuf data into JSON format before persisting it in the outbox table. I guess they do so because they use Debezium under the hood.
They also use the CloudEvents (v1.0.2) spec for defining the format of event data;

This is the Protobuf message using the CloudEvent spec:

syntax = "proto3";
import "google/protobuf/timestamp.proto";
import "google/protobuf/any.proto";  
message OutboxEvent {
  string specversion = 1;  
  string type = 2;  
  string source = 3;  
  string subject = 4;  
  string id = 5;  
  google.protobuf.Timestamp time = 6;  
  string datacontenttype = 7;  
  string dataschema = 8;  
  google.protobuf.Any data = 9;
}

And this is an example:

{
  "specversion": "1.0",
  "type": "someevent",
  "source": "integration",
  "subject": "1ec07712-79b7-485a-a0e2-0a1c33fd1016",
  "time": "2020-04-30T04:00:00Z",
  "datacontenttype": "application/json",
  "dataschema": "http://<schemapath>",
  "data": {
    "transactionId": "1ec07712-79b7-485a-a0e2-0a1c33fd1016",
    "doc": "123.123.123-00",
    "image_id": "ea02254f-28f4-4b31-99a5-957bb024f78d"
  }
}

rponte commented Jan 16, 2025

Fidelis blog: System Design - Saga Pattern 🇧🇷 - an article (in Portuguese) about the Saga and Outbox patterns, written by Matheus Fidelis.

rponte commented Feb 7, 2025

Do Caos a Consistência: A Ordem das Mensagens em Sistemas Distribuídos (in English: From Chaos to Consistency: Message Ordering in Distributed Systems)

rponte commented Mar 26, 2025

River's blog: Building an idempotent email API with River unique jobs

rponte commented May 3, 2025

⭐️ (slides) Definition of Insanity: timeouts, retries and idempotency - by Sam Newman

rponte commented May 3, 2025

Thread on Twitter (X) by Qian Li:

Durable workflow timeouts

Timeouts are essential for building efficient and resilient systems. They help prevent systems from waiting indefinitely and free up resources while maintaining responsiveness under heavy load.

For example, suppose your server must finish a task within 30 minutes, but some operations are taking much longer to complete. Even if they eventually succeed, the response will still miss the deadline — wasting resources in the process. In such cases, proactively cancelling on timeout is the right choice.
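A minimal in-process sketch of cancelling on timeout, using Python's `asyncio.wait_for`. This is not the DBOS workflow timeout API; `slow_task` and `run_with_deadline` are hypothetical names, and the short durations stand in for the 30-minute deadline in the example above.

```python
import asyncio


async def slow_task(seconds: float) -> str:
    """Stand-in for an operation that may take too long."""
    await asyncio.sleep(seconds)
    return "done"


async def run_with_deadline(seconds_needed: float, deadline: float) -> str:
    """Return the task result, or 'cancelled' if the deadline passes first."""
    try:
        # wait_for cancels the inner task when the timeout expires.
        return await asyncio.wait_for(slow_task(seconds_needed), timeout=deadline)
    except asyncio.TimeoutError:
        return "cancelled"
```

Calling `asyncio.run(run_with_deadline(1.0, 0.01))` gives up after the deadline instead of waiting for the full second, which frees the worker for other requests.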

DBOS docs: Workflow Timeouts

rponte commented Jul 29, 2025

Delivery semantics explained from the producer and consumer perspectives in Kafka: Kafka Message Delivery Guarantees

At most once: Messages are delivered once, and if there is a system failure, messages may be lost and are not redelivered.
At least once: This means messages are delivered one or more times. If there is a system failure, messages are never lost, but they may be delivered more than once.
Exactly once: This is the preferred behavior in that each message is delivered once and only once. Messages are never lost or read twice even if some part of the system fails.
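The difference between the first two guarantees often comes down to when the consumer records its progress. The sketch below is a simplified simulation, not Kafka client code: `run_consumer` and `simulate` are hypothetical names, the "offset" is a plain integer, and a crash is modeled as an exception followed by a restart from the last committed offset.

```python
class Crash(Exception):
    """Simulated consumer failure."""


def run_consumer(log, state, commit_first, crash_on=None):
    """Consume from state['offset']; optionally crash while handling one index."""
    while state["offset"] < len(log):
        i = state["offset"]
        if commit_first:
            state["offset"] = i + 1            # commit, then process
            if crash_on == i:
                raise Crash()
            state["processed"].append(log[i])
        else:
            state["processed"].append(log[i])  # process, then commit
            if crash_on == i:
                raise Crash()
            state["offset"] = i + 1


def simulate(commit_first):
    """Run once with a crash on the second message, then restart."""
    log = ["a", "b", "c"]
    state = {"offset": 0, "processed": []}
    try:
        run_consumer(log, state, commit_first, crash_on=1)
    except Crash:
        pass
    run_consumer(log, state, commit_first)
    return state["processed"]
```

With `commit_first=True` the message `b` is lost (at most once); with `commit_first=False` it is handled twice (at least once), which is why at-least-once consumers are usually paired with idempotent handlers.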

rponte commented Sep 7, 2025

The most interesting part of how the Dapr Outbox feature works is related to step 2: Dapr publishes an internal event BEFORE persisting the state and the marker to the database.

rponte commented Sep 8, 2025

⭐️ Distributed transaction patterns for microservices compared

rponte commented Sep 8, 2025

JavaZone: Ins and Outs of the Outbox Pattern - by Gunnar Morling

rponte commented Sep 30, 2025

Implementing the Outbox Pattern - by Milan Jovanović

rponte commented Oct 16, 2025

Microservices, clearing up the definitions -- by Andras Gerlits

For a software to be predictable, we need to make sure that single events from the client’s perspective are reflected as such throughout the whole system.

This means that if the client asks for a change, all its facets need to be accepted or rejected by the system as a single package, we can’t pick and choose which aspects to do or not to do, unless we have an intuitive way to prompt the user about what we have failed to achieve and that fault is translated back to the client. It’s easy to see that if we allow for such failures, we must design the corresponding 'translation' to the end user, and that this means understanding our client’s competence level with regards to that system. In other words, we can shift some of the responsibility on the end user, but only the ones we can expect them to manage appropriately and by providing them the right tools.

rponte commented Oct 20, 2025

⭐️ Spring Outbox | GitHub repository | LinkedIn Announcement

Spring Outbox is a minimal-configuration Spring Boot library for reliably publishing domain events using the Outbox Pattern.

It works out of the box: you just add the dependency, enable the outbox, and provide an OutboxRecordProcessor bean. The library handles storing, processing, and retrying events automatically, so you can focus on your business logic instead of wiring infrastructure.

rponte commented Nov 18, 2025

How to implement the Outbox pattern in Go and Postgres

rponte commented Nov 24, 2025

É possível fazer Event-Driven Architecture (EDA) sem um broker? (in English: Is it possible to do Event-Driven Architecture (EDA) without a broker?)

rponte commented Nov 24, 2025

⭐️ The Write Last, Read First Rule: Keeping systems in synch

rponte commented Nov 24, 2025

⭐️ Building a Durable Execution Engine With SQLite - by Gunnar Morling

GitHub repository

rponte commented Dec 1, 2025

On Idempotency Key - by Gunnar Morling
A good write-up on Twitter about Gunnar's article - by Abhishek Singh (@0xlelouch_)

rponte commented Dec 3, 2025

A good description of the difference between Deduplication and Idempotency

🔗 Post on LinkedIn by Henri Maxime Demoulin

Many engineers confuse these two: deduplication keys vs idempotency keys. They look similar but solve two completely different problems.

Deduplication keys are used to detect that a unit of work has already been executed and skip the execution entirely.

Their goal is to prevent the work from running twice.

Idempotency keys are used to ensure that even if a unit of work runs twice (or more), the final state is correct and equivalent.

One prevents execution, the other effects.

Deduplication keys have been around for a while, e.g., SQS uses them to ensure only once delivery within 5 minute windows.

Before you ship your next API or background worker, ask yourself: do you need idempotency, deduplication... or both?
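The contrast in the post can be illustrated with a small sketch. All names here (`send_once`, `set_status`, and the in-memory stores) are hypothetical, and the sets and dicts stand in for durable storage.

```python
def send_once(dedup_key, address, seen_keys, outbox):
    """Deduplication: skip the work entirely if the key was already seen."""
    if dedup_key in seen_keys:
        return False              # work is NOT executed again
    seen_keys.add(dedup_key)
    outbox.append(address)        # the side effect happens exactly here
    return True


def set_status(accounts, account_id, status):
    """Idempotency: the work may run again, but the final state is the same."""
    accounts[account_id] = status  # absolute assignment, safe to repeat
    return accounts[account_id]
```

`send_once` guards a non-repeatable side effect by preventing the second execution, while `set_status` needs no guard because repeating it cannot change the outcome.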

rponte commented Dec 30, 2025

Idempotence comes in different shapes - by Dominik Tornow

Idempotence comes in different shapes

Idempotence is the guarantee that repeating a request yields the same outcome (or, more formally, does not change the state of the system beyond the initial application)

In practice, idempotence comes in a few variants, most notably positive and negative idempotence

Positive

Positive idempotence denotes that the system has accepted the request in the past:

I have accepted this request in the past, I will accept the request again, I will apply this request again, nothing changes

-or-

I have accepted this request in the past, I will accept the request again, I will not apply this request again

Negative

Negative idempotence denotes that the system has rejected the request in the past

I have rejected this request in the past, I will reject this request again

Negative idempotence is often harder to guarantee:

When a system accepts a request, the system state changes, the new state is evidence of the past acceptance of the request

When a system rejects a request, the system state may not change, there is no evidence of the past rejection of the request
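One way to obtain negative idempotence is to record the outcome of every request, including rejections, so that a retry replays the stored decision. This is a sketch under that assumption; `PaymentService` and its fields are hypothetical, and the dict stands in for a durable table keyed by request id.

```python
class PaymentService:
    """Toy service that remembers the outcome of each request id."""

    def __init__(self, balance):
        self.balance = balance
        self.outcomes = {}  # request_id -> "accepted" or "rejected"

    def withdraw(self, request_id, amount):
        # Replay the recorded decision, whether it was positive or negative.
        if request_id in self.outcomes:
            return self.outcomes[request_id]
        if amount > self.balance:
            outcome = "rejected"   # state does not change, so we must record it
        else:
            self.balance -= amount
            outcome = "accepted"
        self.outcomes[request_id] = outcome
        return outcome
```

Without the recorded rejection, a retry of a rejected withdrawal after the balance grew would be accepted, so the same request would yield two different answers.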

rponte commented Dec 31, 2025

⭐️ Real idempotence is about resilience to time and change, not just repetition. - by Abhishek Singh

rafaelpontezup commented Feb 4, 2026

LinkedIn post: Stop using the "Outbox pattern". It's 2026: you can use Durable Execution instead -- by Henri Maxime Demoulin

Stop using the "Outbox pattern". It's 2026: you can use Durable Execution instead.

I want to write to systems A and B atomically. But I can't write:

WriteA()

WriteB()

If the process crashes between the two, WriteB() is lost forever.

The outbox pattern says:

Write to A and an intent to write to B atomically (e.g., in the same transaction or the same document)

Let another system, a "message relay", eventually send to B.

Eventually, the message relay will pick up on the message and A and B will be in sync.

But consider the downsides:

You need an external polling system

You need to move all the compensation logic (for atomicity) in the message relay

i.e., your code becomes a living nightmare and even Claude will struggle to get it right.

With durable execution, you are guaranteed that your program will eventually complete.

So you can finally write:

WriteA()

WriteB()

And if the process crashes between A and B, the durable execution engine will resume execution where it left off, that is, after A, and proceed to writing to B.

Of course, external systems need to be idempotent, because you get at-least-once execution (like the outbox pattern.)

Who's going to defend the outbox pattern here? :)

I liked this comment. Something to think about:

rafaelpontezup commented Feb 4, 2026

LinkedIn post: There are at least 7 ways to achieve idempotency -- by Yves Goeleven

A functional core must be idempotent

Idempotence is the property of certain operations whereby the operation can be applied multiple times without changing the result.

There are at least 7 ways to achieve it

Natural idempotency

Many operations can be designed in a naturally idempotent way. 'Turn On The Light' is a command which is idempotent by default: it will have the same effect no matter if the lights were on or off to begin with.

Keep track of prior decisions

Prior to applying a state change to the system, keep track of the decision that you are going to apply the state change using event sourcing (Light Turned On), so that any subsequent requests to make the same state change can be derived from prior recorded decisions and ignored.

Side effect checks

Sometimes a dangerous approach, but often very useful in the real world, is to check for indirect side effects of a command to determine if it needs to be performed or not. When the temperature of the light is over 65C, there is no need to 'Turn On The Light': you've probably done that already (or your house may be on fire).

Versioning

A special case of the side effect check, applicable when working directly with state, is to add versioning information to it (like timestamps or sequence numbers) while at the same time also adding expected version information to any command that intends to alter the target. Whenever the version information on the target exceeds the version information in the incoming command, you can discard the command.

Identify and deduplicate

A fourth option is to identify each command with a unique identifier, and keep track of a list of recently performed commands (command sourcing). If the identifier is on the list, you can discard it.

Partner state machines

Partner state machines are an approach introduced in the legendary paper 'Life Beyond Distributed Transactions' by Pat Helland. In a conversation between multiple partners, using messaging, each partner can maintain a state machine for its communication with any other partner. The state machine represents the progression of the relationship between them and avoids conflicts by allowing only for valid state transitions. By checking the state of the conversation, it is possible to ensure each command is executed only once.

Accept uncertainty

And finally, there is always the option, at least from a business perspective, to live with some uncertainty and deal with duplicates in a business sense. Very often there are business processes in place which can correct these scenarios. For example, when you have performed a duplicate payment, then you can dedupe it by issuing a credit note.
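Of the approaches listed above, versioning is easy to sketch. The example below is illustrative only: `Light` and `apply_toggle` are hypothetical names, and a toggle is used on purpose because it is not naturally idempotent.

```python
from dataclasses import dataclass


@dataclass
class Light:
    on: bool = False
    version: int = 0  # incremented on every applied change


def apply_toggle(light: Light, expected_version: int) -> bool:
    """Apply a toggle only if the target has not moved past the command's version."""
    if light.version > expected_version:
        return False  # target version exceeds the command's: discard it
    light.on = not light.on
    light.version += 1
    return True
```

A retried command carries the same expected version as the first attempt, so the second delivery is discarded and the light is not toggled back.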

I tried to argue with him about the use of transactions as a way to achieve idempotency (comment link):

rponte commented Mar 4, 2026

Twitter: Idempotency Is Not Optional (And Why Most Teams Get It Wrong) - by @devXritesh

rponte commented Mar 24, 2026

LinkedIn post: The outbox pattern exists because application code is not durable. - by Henri Maxime Demoulin (DBOS)

⭐️ How to implement a transactional outbox-like pattern using a DBOS workflow

The outbox pattern exists because application code is not durable.

(Belaboring an obvious point I hope)

The outbox pattern exists to compensate for a fundamental limitation of traditional programs: they can crash and lose the rest of their intent.

Consider a simple workflow: insert a row in a database and send an email. If the process crashes between the two operations, the email may never be sent. The outbox pattern addresses this by persisting the intent explicitly. The program inserts a record into an "outbox" table within the same transaction as the database update.

A separate worker later reads that table and performs the side effect (sending the email). Even if the original process crashes, the intent remains recorded and can eventually be executed.
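The two paragraphs above can be sketched with SQLite from the Python standard library. The table layout and the names `create_schema`, `register_user`, and `relay_once` are assumptions made for illustration; a real relay would also handle retries, ordering, and concurrent workers.

```python
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "sent INTEGER NOT NULL DEFAULT 0)"
    )


def register_user(conn, email):
    """Insert the business row and the intent in ONE transaction."""
    with conn:  # commits both inserts, or rolls both back on error
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (email,))


def relay_once(conn, send_email):
    """Worker pass: perform pending side effects, then mark them as sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        send_email(payload)  # a crash after this line means a resend later
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the email is sent before the row is marked, a crash in between leads to a duplicate send on the next pass, so the receiver still has to tolerate at-least-once delivery.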

In other words, the outbox pattern turns control flow into data. The system stores "what still needs to happen" so that another process can continue the work.

Durable workflows approach the same problem differently. Instead of externalizing control flow into an outbox table, the workflow runtime persists the execution state of the program itself. If a process crashes after writing to the database but before sending the email, the workflow engine simply resumes execution from the last durable step. The intent to send the email is already part of the persisted workflow state.
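The idea of resuming from the last durable step can be shown with a toy checkpointing loop. This is not the DBOS API: `run_workflow` is a hypothetical helper, and the `store` dict stands in for a durable database that survives process crashes.

```python
def run_workflow(workflow_id, steps, store):
    """Run named steps in order, skipping those already checkpointed."""
    done = store.setdefault(workflow_id, [])
    for name, fn in steps:
        if name in done:
            continue          # completed in a previous run: do not repeat
        fn()
        done.append(name)     # checkpoint after the step succeeds
```

Re-invoking `run_workflow` with the same `workflow_id` after a crash continues with the first step that has no checkpoint. A crash between `fn()` and the checkpoint would repeat that step, which is why the steps themselves still need to be idempotent.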

From that perspective, the outbox pattern is a workaround for a missing capability: durable execution of application logic.

p.s.: AI agents need durable workflows as a base capability. Check out DBOS inside Pydantic and LlamaIndex :)
