Back-of-the-Envelope Estimation
Foundations & The First Server
Back-of-the-envelope estimation is the practice of using a combination of thought experiments and simple mathematical calculations to quickly arrive at a reasonable estimate for a system's capacity needs. It's one of the most important skills in a system design interview because it demonstrates that you can think about scale and make data-driven decisions.
The goal is not to find the exact answer, but to land in the right order of magnitude.
Why Do We Do This?
Before you can design a system, you need to know what you're designing for.
- Does your system need to handle 10 requests per second, or 100,000?
- Do you need to store gigabytes of data, or petabytes?
- How much will it cost to run?
These estimations will directly influence your choice of technology, architecture, and infrastructure.
The Core Numbers You Should Know
You don't need to be a human calculator, but you should have a few key numbers memorized to speed up your calculations.
Powers of 2
- 2^10 = 1,024 ≈ 1 Thousand (Kilo)
- 2^20 = 1,048,576 ≈ 1 Million (Mega)
- 2^30 ≈ 1 Billion (Giga)
- 2^40 ≈ 1 Trillion (Tera)
- 2^50 ≈ 1 Quadrillion (Peta)
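If you'd rather verify than memorize, a few lines of Python reproduce the table (a throwaway sketch, nothing system-specific):

```python
# Print each power of 2 next to its approximate decimal name.
for exp, name in [(10, "Thousand (Kilo)"), (20, "Million (Mega)"),
                  (30, "Billion (Giga)"), (40, "Trillion (Tera)"),
                  (50, "Quadrillion (Peta)")]:
    print(f"2^{exp} = {2**exp:,} ≈ 1 {name}")
```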
Latency Numbers Every Programmer Should Know
- L1 cache reference: ~0.5 ns
- Branch mispredict: ~5 ns
- L2 cache reference: ~7 ns
- Mutex lock/unlock: ~25 ns
- Main memory reference: ~100 ns
- Send 2K bytes over 1 Gbps network: ~20,000 ns (20 µs)
- Read 1 MB sequentially from memory: ~250,000 ns (250 µs)
- Round trip within same datacenter: ~500,000 ns (0.5 ms)
- SSD random read: ~150,000-200,000 ns (150-200 µs)
- Read 1 MB sequentially from SSD: ~1,000,000 ns (1 ms)
- HDD seek: ~10,000,000 ns (10 ms)
- Read 1 MB sequentially from HDD: ~20,000,000 ns (20 ms)
- Round trip USA to Europe: ~150,000,000 ns (150 ms)
Key Takeaway: Reading from memory is fast. Reading from disk is slow. Reading over the network is even slower. This is why caching is so important.
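To make that takeaway concrete, here is a minimal sketch converting the sequential-read figures above into milliseconds per megabyte (the values are the rough numbers from the list, not measurements):

```python
# Time to read 1 MB sequentially from each tier, in nanoseconds,
# taken from the latency list above.
read_1mb_ns = {"memory": 250_000, "SSD": 1_000_000, "HDD": 20_000_000}

for tier, ns in read_1mb_ns.items():
    print(f"1 MB from {tier}: {ns / 1_000_000:.2f} ms")

# Output: 0.25 ms, 1.00 ms, 20.00 ms. Serving from memory is ~80x faster
# than an HDD read of the same size, which is the entire case for caching.
```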
Throughput & Storage Calculations
- Number of seconds in a day: 24 hours * 60 min/hr * 60 sec/min = 86,400, which we round up to 25 * 3,600 = 90,000 for easier mental math.
- Number of seconds in a month: 90,000 * 30 ≈ 2.7 Million
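These two constants come up in almost every estimation, so it helps to pin them down once (a minimal sketch using the rounding above):

```python
# The two time constants used throughout the rest of this section.
SECONDS_PER_DAY = 24 * 60 * 60               # 86,400 exactly
SECONDS_PER_DAY_ROUNDED = 25 * 3_600         # 90,000; easier to divide by
SECONDS_PER_MONTH = SECONDS_PER_DAY_ROUNDED * 30   # 2,700,000 ≈ 2.7 Million
```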
A Practical Example: Designing a Twitter-like Service
Let's walk through an estimation exercise for a simplified version of Twitter.
Interviewer: "Let's design a service where users can post short text messages. How would you estimate the scale?"
Step 1: Clarify and State Assumptions
- Total Users: 500 Million
- Daily Active Users (DAU): 200 Million (a reasonable fraction of total users)
- Write Operations (tweets per day): Each DAU posts, on average, 0.5 tweets per day. 200 Million DAU * 0.5 tweets/DAU = 100 Million tweets per day.
- Read Operations (feed views per day): Each DAU views their feed, on average, 5 times per day. 200 Million DAU * 5 views/DAU = 1 Billion feed views per day.
- Read/Write Ratio: 1 Billion reads / 100 Million writes = 10:1. This is a very common ratio for social media; the system is "read-heavy".
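Expressed as code, Step 1 is just a handful of constants (a minimal sketch restating the assumptions above; none of these values are measured):

```python
# Step 1 assumptions as constants. These are stated assumptions,
# not measurements.
DAU = 200_000_000                            # daily active users
TWEETS_PER_DAY = int(DAU * 0.5)              # 100,000,000 writes/day
FEED_VIEWS_PER_DAY = DAU * 5                 # 1,000,000,000 reads/day
READ_WRITE_RATIO = FEED_VIEWS_PER_DAY / TWEETS_PER_DAY   # 10.0 -> read-heavy
```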
Step 2: Calculate QPS (Queries Per Second)
Write QPS: 100 Million tweets / 90,000 seconds = 100,000,000 / 90,000 = 10,000 / 9 ≈ 1,100 QPS (write)
Read QPS: 1 Billion reads / 90,000 seconds = 1,000,000,000 / 90,000 = 100,000 / 9 ≈ 11,000 QPS (read)
Peak QPS: Traffic is not evenly distributed. It often has peaks. A common rule of thumb is to assume peak traffic is 2x - 3x the average.
- Peak Write QPS: 1,100 * 2 ≈ 2,200 QPS
- Peak Read QPS: 11,000 * 2 ≈ 22,000 QPS
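The same arithmetic in code (a minimal sketch; constants restated from Step 1, with the peak factor taken from the low end of the 2x-3x rule of thumb):

```python
# Average and peak QPS from the daily totals (constants from Step 1).
TWEETS_PER_DAY = 100_000_000
FEED_VIEWS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 90_000                     # rounded, as above

write_qps = TWEETS_PER_DAY / SECONDS_PER_DAY      # 1,111 -> call it ~1,100
read_qps = FEED_VIEWS_PER_DAY / SECONDS_PER_DAY   # 11,111 -> call it ~11,000

PEAK_FACTOR = 2                              # low end of the 2x-3x rule of thumb
peak_write_qps = write_qps * PEAK_FACTOR     # ≈ 2,200
peak_read_qps = read_qps * PEAK_FACTOR       # ≈ 22,000
```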
Step 3: Estimate Storage Requirements
Size of a single tweet:
- tweet_id: 8 bytes (64-bit integer)
- user_id: 8 bytes
- text: 280 characters * 2 bytes/char (a rough average for UTF-8, which is variable-width) = 560 bytes
- media_url (optional): ~50 bytes
- timestamp: 8 bytes
- Total: 634 bytes; let's round up to ~700 bytes per tweet.
Storage per day: 100 Million tweets/day * 700 bytes/tweet = 70 Billion bytes/day = 70 GB per day
Storage for 5 years: 70 GB/day * 365 days/year * 5 years = 70 * 1,825 ≈ 70 * 1,800 ≈ 126,000 GB ≈ 126 TB
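As a sketch (restating the constants above; exact arithmetic with 365 * 5 = 1,825 lands at ~128 TB versus the ~126 TB you get by rounding to 1,800, the same order of magnitude either way):

```python
# Step 3 storage math (constants restated from earlier steps).
TWEETS_PER_DAY = 100_000_000
BYTES_PER_TWEET = 700                        # rounded up from the field sizes

bytes_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET   # 70,000,000,000 B = 70 GB
bytes_5_years = bytes_per_day * 365 * 5            # ≈ 1.28e14 B ≈ 128 TB
print(f"~{bytes_5_years / 1e12:.0f} TB over 5 years")
```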
Step 4: Estimate Bandwidth/Network Requirements
Ingress (Writes): 1,100 QPS * 700 bytes/tweet = 770,000 bytes/sec ≈ 0.77 MB/s
Egress (Reads):
- Let's assume a feed view loads ~20 tweets.
- 11,000 QPS * (700 bytes/tweet * 20 tweets/view) = 11,000 * 14,000 = 154,000,000 bytes/sec ≈ 154 MB/s
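And the bandwidth math in code (a minimal sketch; TWEETS_PER_FEED_VIEW is the ~20-tweets-per-view assumption stated above, and the QPS figures are the averages from Step 2):

```python
# Step 4 bandwidth math (average QPS from Step 2).
WRITE_QPS = 1_100
READ_QPS = 11_000
BYTES_PER_TWEET = 700
TWEETS_PER_FEED_VIEW = 20                    # assumed tweets loaded per view

ingress_bytes_per_sec = WRITE_QPS * BYTES_PER_TWEET   # 770,000 ≈ 0.77 MB/s
egress_bytes_per_sec = (READ_QPS * BYTES_PER_TWEET
                        * TWEETS_PER_FEED_VIEW)       # 154,000,000 ≈ 154 MB/s
```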
Summary of Estimates
- Write QPS: ~1.1k (Peak ~2.2k)
- Read QPS: ~11k (Peak ~22k)
- Storage (5 years): ~126 TB
- Ingress: ~0.77 MB/s
- Egress: ~154 MB/s
Now you have a concrete set of numbers to guide your design. You know you need a system that can handle thousands of queries per second and store terabytes of data. This immediately tells you that a single server won't be enough, and you'll need to think about load balancing, distributed databases, and caching.