CAP Theorem
Scaling to a Distributed System
CAP Theorem is a fundamental principle in distributed systems that states it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
- Consistency (C): Every read receives the most recent write or an error. All nodes in the system have the same view of the data at the same time.
- Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write. The system is always operational.
- Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes. In other words, the system can sustain a network partition (a communication break between two groups of nodes).
Why Partition Tolerance is a Must
In any real-world distributed system, network partitions are a fact of life. Servers can crash, switches can fail, and network links can be severed. A system that is not partition-tolerant will fail entirely if there is a communication break between its nodes.
Therefore, for any practical distributed system, Partition Tolerance (P) is not optional. You must design your system to handle network partitions.
This means the real trade-off in system design is between Consistency and Availability (C vs. A) when a partition occurs.
The CAP Theorem Trade-off: CP vs. AP
When a network partition happens, you have a choice to make.
Imagine a simple system with two nodes, Node 1 and Node 2, that replicate data between them. A network partition occurs, and they can no longer communicate.
A client writes a new piece of data to Node 1. Now, what should the system do?
Option 1: Choose Consistency over Availability (CP System)
If you choose consistency, you must guarantee that any read will return the most up-to-date data.
- Since Node 1 cannot replicate the new write to Node 2, it cannot guarantee that a read from Node 2 would be consistent.
- To maintain consistency, the system must sacrifice availability. It must refuse to serve requests to the partitioned node (Node 2). It might even have to refuse writes to Node 1 if it can't guarantee they will be replicated.
- The system remains consistent, but it is not fully available.
Examples of CP Systems:
- Relational Databases (in some configurations): Many traditional SQL databases are designed to be strongly consistent.
- Financial Systems: In a banking application, it's better to have a transaction fail than to show an incorrect balance.
Option 2: Choose Availability over Consistency (AP System)
If you choose availability, you must guarantee that the system will always respond to requests.
- Even though Node 1 and Node 2 are partitioned, they will both continue to accept requests.
- If a client writes new data to Node 1, Node 1 will accept it.
- If another client reads that same data from Node 2, it will receive the old, stale data because the new write has not been replicated yet.
- The system remains highly available, but it is not strongly consistent. The data will become consistent eventually, once the partition is resolved. This is known as eventual consistency.
Examples of AP Systems:
- NoSQL Databases: Many NoSQL databases like Cassandra and DynamoDB are designed to be highly available and partition-tolerant, making them AP systems. They are a good choice for applications where high availability is critical and some temporary inconsistency is acceptable.
- Social Media Feeds: It's generally acceptable if you don't see a new post for a few seconds. It's more important that the feed loads quickly.
Common Misconceptions
- It's not a binary choice: The CAP theorem is often presented as a simple "pick two" choice, but in reality, it's a spectrum. Systems can be tuned to be more or less consistent.
- It only applies during a partition: When the network is healthy and there are no partitions, a system can provide both consistency and availability. The trade-off only comes into play when a partition occurs.
In a system design interview, the CAP theorem is a powerful tool for discussing the trade-offs of your database choice. When you choose a database, you should be able to articulate where it falls on the CP/AP spectrum and why that is the right choice for the specific requirements of the system you are designing. For example, for an e-commerce shopping cart, you might choose an AP system for browsing and adding items, but switch to a CP system for the final checkout process to ensure transactional integrity.