Distributed Transactions

May 20, 2021 · 1097 words · 6 min · Architecture Microservice Transaction

Transactions and Distributed Transactions

Transactions

A transaction is a logical unit of work in a database, composed of a finite sequence of database operations. The database must ensure the atomicity of transaction operations: when a transaction is successful, it means that all operations in the transaction have been fully executed; if the transaction fails, all executed SQL operations are rolled back.

A single-node database transaction has four main properties:

Atomicity: The transaction is executed as a whole. Either all operations within the transaction are executed, or none are executed.
Consistency: The transaction must ensure that the database moves from one consistent state to another. Consistent states mean that the data in the database must satisfy all integrity constraints.
Isolation: When multiple transactions are executed concurrently, the execution of one transaction should not affect the execution of others.
Durability: Changes made by a committed transaction should be permanently stored in the database.

Distributed Transactions

A distributed transaction is a transaction where the participants, transaction-supporting servers, resource servers, and transaction manager are located on different nodes of a distributed system.

With the adoption of microservice architectures, large business domains often involve multiple services, and a business process requires participation from multiple services. In specific business scenarios, data consistency among multiple services must be ensured.

For example, in a large e-commerce system, the order interface typically deducts inventory, reduces discounts, and generates an order ID. The order service, inventory, discount, and order ID are all separate services. The success of the order interface depends not only on local database operations but also on third-party system results. In this case, distributed transactions ensure that all these operations either succeed together or fail together.

In essence, distributed transactions are used to ensure data consistency across different databases.

Use Cases

Typical use cases in e-commerce systems include:

Order Inventory Deduction

When placing an order, operations include generating an order record and reducing product inventory. These are handled by separate microservices, so distributed transactions are required to ensure the atomicity of the order operation.
Third-Party Payments

In a microservice architecture, payment and orders are independent services. The order payment status depends on a notification from the financial service, which, in turn, depends on notifications from a third-party payment service.

A classic scenario is illustrated below:

From the diagram, there are two calls: the third-party payment service calling the payment service, and the payment service calling the order service. Both calls can encounter timeouts. Without distributed transactions, the actual payment status and the final payment status visible to the user may become inconsistent.

Implementation Approaches

Two-Phase Commit (2PC)

A transaction commit is divided into two phases:

Preparation Phase:
- The transaction manager (TM) initiates the transaction, logs the start of the transaction, and asks the participating resource managers (RMs) whether they can execute the commit operation, then waits for their responses.
- RMs execute local transactions, log redo/undo data, and return results to TM, but do not commit.
Commit/Rollback Phase:
- If all participating RMs execute successfully, the transaction proceeds to the commit phase:
  - TM logs the commit, sends a commit instruction to all RMs.
  - RMs commit the local transaction and respond to TM.
  - TM logs the end of the transaction.
- If any RM fails or times out during preparation or commit:
  - TM logs the rollback, sends rollback instructions to all RMs.
  - RMs rollback the local transaction and respond to TM.
  - TM logs the end of the transaction.

Characteristics

Atomicity: Supported
Consistency: Strong consistency
Isolation: Supported
Durability: Supported

Disadvantages

Synchronous Blocking: When participants occupy shared resources, others can only wait for resource release, leading to blocking.
Single Point of Failure: If the transaction manager fails, the entire system becomes unavailable.
Data Inconsistency: If the transaction manager only sends some commit messages, and a network issue occurs, only some participants receive the commit message, leading to inconsistency.
Uncertainty: If both the transaction manager and a participant fail after sending a commit message, it is uncertain whether the message was successfully committed.

Local Message Table

The transaction initiator maintains a local message table, and operations on the business table and the message table are within the same local transaction. Asynchronously, a scheduled task scans the message table and delivers the message downstream.

The broad concept of the local message table also allows downstream notification through methods other than message delivery, such as RPC calls.

The initiator executes a local transaction, operating both the business table and the local message table.
A scheduled task scans pending local messages (in the message table) and sends them to the message queue:
- If successful, mark the local message as sent.
- If failed, retry until successful.
The message queue delivers the message downstream.
The downstream transaction participant receives the message and executes a local transaction:
- If failed, no ACK is returned, and the message queue retries.
- If successful, an ACK is returned, marking the end of the global transaction.
- If the message or ACK is lost, the message queue retries.

Exceptional Scenarios

Message Loss: Handled by repeating the scheduled task.
Delivery Failure: Handled by retries, downstream must ensure idempotency.
ACK Loss: Handled by retries, downstream must ensure idempotency.

Advantages and Challenges

Advantages:

High system throughput, asynchronous downstream transactions via middleware decoupling.
Moderate business intrusion, requiring local message tables and scheduled tasks.

Challenges:

Incomplete transaction support, downstream transactions cannot be rolled back, only retried.

Characteristics

Atomicity: Supported
Consistency: Eventual consistency
Isolation: Not supported (committed branch transactions are visible to other transactions)
Durability: Supported

Best-Effort Notification

The best-effort notification is a simple approach to flexible transactions, suitable for business with low time sensitivity to eventual consistency, where the result of the passive party does not affect the initiator’s result.

This approach roughly works as follows:

System A completes its local transaction and sends a message to the MQ.
A service consumes the MQ and calls System B’s interface.
If System B succeeds, everything is fine; if it fails, the notification service periodically retries calling System B up to N times. If it still fails, it gives up.

Advantages and Challenges

Advantages:

Simple implementation.

Challenges:

No compensation mechanism, no guarantee of delivery.
Requires idempotency, with interfaces ensuring consistency and atomicity.

Characteristics

Atomicity: Not supported (requires additional interfaces)
Consistency: Not supported (requires additional interfaces)
Isolation: Not supported (committed branch transactions are visible to other transactions)
Durability: Supported

Classic Scenario

Payment Callback:

The payment service receives a successful payment notification from a third-party service, updates the payment status of the order, and synchronously notifies the order service. If this synchronous notification fails, an asynchronous script will keep retrying the order service interface.

References

Distributed Transactions: All You Need to Know

Linux Cgroups Overview CPU False Sharing