
Back-of-the-Envelope Estimation

Back-of-the-envelope estimation is the practice of using a combination of thought experiments and simple mathematical calculations to quickly arrive at a reasonable estimate for a system's capacity needs. It's one of the most important skills in a system design interview because it demonstrates that you can think about scale and make data-driven decisions.

The goal is not to find the exact right answer, but to be in the right order of magnitude.

Why Do We Do This?

Before you can design a system, you need to know what you're designing for.

  • Does your system need to handle 10 requests per second, or 100,000?
  • Do you need to store gigabytes of data, or petabytes?
  • How much will it cost to run?

These estimations will directly influence your choice of technology, architecture, and infrastructure.

The Core Numbers You Should Know

You don't need to be a human calculator, but you should have a few key numbers memorized to speed up your calculations.

Powers of 2

  • 2^10 = 1,024 ≈ 1 Thousand (Kilo)
  • 2^20 = 1,048,576 ≈ 1 Million (Mega)
  • 2^30 ≈ 1 Billion (Giga)
  • 2^40 ≈ 1 Trillion (Tera)
  • 2^50 ≈ 1 Quadrillion (Peta)
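
These round-number substitutions get slightly less accurate as the exponent grows (2^10 is within 2.4% of a thousand, while 2^50 is about 13% above a quadrillion), but they're plenty close for order-of-magnitude work. A quick Python sketch, just for illustration, to check the error yourself:

```python
# How far are the powers of 2 from the round numbers we substitute for them?
approximations = {
    10: 1_000,                  # Kilo
    20: 1_000_000,              # Mega
    30: 1_000_000_000,          # Giga
    40: 1_000_000_000_000,      # Tera
    50: 1_000_000_000_000_000,  # Peta
}

for exponent, approx in approximations.items():
    exact = 2 ** exponent
    error_pct = (exact - approx) / approx * 100
    print(f"2^{exponent} = {exact:,} (off by {error_pct:.1f}%)")
```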

Latency Numbers Every Programmer Should Know

  • L1 cache reference: ~0.5 ns
  • Branch mispredict: ~5 ns
  • L2 cache reference: ~7 ns
  • Mutex lock/unlock: ~25 ns
  • Main memory reference: ~100 ns
  • Send 2K bytes over 1 Gbps network: ~20,000 ns (20 µs)
  • SSD random read: ~150,000-200,000 ns (150-200 µs)
  • Read 1 MB sequentially from memory: ~250,000 ns (250 µs)
  • Round trip within same datacenter: ~500,000 ns (0.5 ms)
  • Read 1 MB sequentially from SSD: ~1,000,000 ns (1 ms)
  • HDD seek: ~10,000,000 ns (10 ms)
  • Read 1 MB sequentially from HDD: ~20,000,000 ns (20 ms)
  • Round trip USA to Europe: ~150,000,000 ns (150 ms)

Key Takeaway: Reading from memory is fast. Reading from disk is slow. Reading over the network is even slower. This is why caching is so important.
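
To make that concrete, here's a minimal Python sketch (latencies copied from the table above) that converts the sequential 1 MB read times into throughput:

```python
# Sequential-read throughput implied by the latency numbers above.
NS_PER_SECOND = 1_000_000_000

read_1mb_ns = {
    "memory": 250_000,      # 250 µs per 1 MB
    "SSD": 1_000_000,       # 1 ms per 1 MB
    "HDD": 20_000_000,      # 20 ms per 1 MB
}

for medium, latency_ns in read_1mb_ns.items():
    mb_per_sec = NS_PER_SECOND / latency_ns
    print(f"{medium}: ~{mb_per_sec:,.0f} MB/s sequential")

# memory: ~4,000 MB/s, SSD: ~1,000 MB/s, HDD: ~50 MB/s -- an 80x spread,
# which is exactly why caching hot data in memory pays off.
```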

Throughput & Storage Calculations

  • Number of seconds in a day: 24 hours * 60 min/hr * 60 sec/min = 86,400 seconds. For easier mental math, round 24 hours up to 25: 25 * 3,600 = ~90,000 seconds.
  • Number of seconds in a month: 90,000 seconds/day * 30 days = ~2.7 Million seconds

A Practical Example: Designing a Twitter-like Service

Let's walk through an estimation exercise for a simplified version of Twitter.

Interviewer: "Let's design a service where users can post short text messages. How would you estimate the scale?"

Step 1: Clarify and State Assumptions

  • Total Users: 500 Million
  • Daily Active Users (DAU): 200 Million (A reasonable fraction of total users)
  • Write Operations (Tweets per day): Each DAU posts, on average, 0.5 tweets per day.
    • 200 Million DAU * 0.5 tweets/DAU = 100 Million tweets per day
  • Read Operations (Feed views per day): Each DAU views their feed, on average, 5 times per day.
    • 200 Million DAU * 5 views/DAU = 1 Billion feed views per day
  • Read/Write Ratio: 1 Billion reads / 100 Million writes = 10:1. This is a very common ratio for social media. The system is "read-heavy".

Step 2: Calculate QPS (Queries Per Second)

Write QPS:

  • 100 Million tweets / 90,000 seconds = 100,000,000 / 90,000 = 10,000 / 9 = ~1,100 QPS (write)

Read QPS:

  • 1 Billion reads / 90,000 seconds = 1,000,000,000 / 90,000 = 100,000 / 9 = ~11,000 QPS (read)

Peak QPS: Traffic is not evenly distributed. It often has peaks. A common rule of thumb is to assume peak traffic is 2x - 3x the average.

  • Peak Write QPS: 1,100 * 2 = ~2,200 QPS
  • Peak Read QPS: 11,000 * 2 = ~22,000 QPS
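
The same arithmetic as a minimal Python sketch, with the values copied from the Step 1 assumptions (the 2x peak factor is the rule of thumb above):

```python
# Back-of-the-envelope QPS from the Step 1 assumptions.
DAU = 200_000_000
TWEETS_PER_DAU = 0.5
VIEWS_PER_DAU = 5
SECONDS_PER_DAY = 90_000   # 86,400 rounded up for easy math
PEAK_FACTOR = 2            # rule of thumb: peak traffic is 2x-3x average

writes_per_day = DAU * TWEETS_PER_DAU   # 100 Million tweets/day
reads_per_day = DAU * VIEWS_PER_DAU     # 1 Billion feed views/day

write_qps = writes_per_day / SECONDS_PER_DAY   # ~1,100
read_qps = reads_per_day / SECONDS_PER_DAY     # ~11,000

print(f"Write QPS: ~{write_qps:,.0f} (peak ~{write_qps * PEAK_FACTOR:,.0f})")
print(f"Read QPS:  ~{read_qps:,.0f} (peak ~{read_qps * PEAK_FACTOR:,.0f})")
```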

Step 3: Estimate Storage Requirements

Size of a single tweet:

  • tweet_id: 8 bytes (64-bit integer)
  • user_id: 8 bytes
  • text: 280 characters * 2 bytes/char (a rough average for UTF-8, which is variable-width) = 560 bytes
  • media_url (optional): ~50 bytes
  • timestamp: 8 bytes
  • Total: 8 + 8 + 560 + 50 + 8 = 634 bytes. Let's round up to ~700 bytes per tweet.

Storage per day:

  • 100 Million tweets/day * 700 bytes/tweet = 70 Billion bytes/day = 70 GB per day

Storage for 5 years:

  • 70 GB/day * 365 days/year * 5 years
  • 70 * 365 * 5 = ~70 * 1,800 = 126,000 GB = ~126 TB
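
Here is the same estimate as a short Python sketch (field sizes from the breakdown above):

```python
# Storage estimate from the per-tweet size breakdown.
TWEET_BYTES = 8 + 8 + 560 + 50 + 8   # id + user_id + text + media_url + timestamp = 634
TWEET_BYTES_ROUNDED = 700            # round up for headroom
TWEETS_PER_DAY = 100_000_000

bytes_per_day = TWEETS_PER_DAY * TWEET_BYTES_ROUNDED   # 70 GB/day
bytes_5_years = bytes_per_day * 365 * 5

print(f"Per day: ~{bytes_per_day / 1e9:,.0f} GB")
print(f"5 years: ~{bytes_5_years / 1e12:,.0f} TB")
# Prints ~128 TB; the prose rounds 365 * 5 down to 1,800, giving ~126 TB.
# Both are the same order of magnitude, which is all we need.
```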

Step 4: Estimate Bandwidth/Network Requirements

Ingress (Writes):

  • 1,100 QPS * 700 bytes/tweet = 770,000 bytes/sec = ~0.77 MB/s

Egress (Reads):

  • Let's assume a feed view loads ~20 tweets.
  • 11,000 QPS * (700 bytes/tweet * 20 tweets/view) = 11,000 * 14,000 = 154,000,000 bytes/sec = ~154 MB/s
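
And the bandwidth arithmetic as a sketch, with QPS and tweet size carried over from the earlier steps (the 20-tweets-per-view figure is the assumption stated above):

```python
# Bandwidth estimate from the QPS and tweet-size numbers above.
TWEET_BYTES = 700
TWEETS_PER_FEED_VIEW = 20   # assumption: each feed view loads ~20 tweets
WRITE_QPS = 1_100           # average write QPS from Step 2
READ_QPS = 11_000           # average read QPS from Step 2

ingress_bytes_per_sec = WRITE_QPS * TWEET_BYTES                        # 770,000
egress_bytes_per_sec = READ_QPS * TWEET_BYTES * TWEETS_PER_FEED_VIEW   # 154,000,000

print(f"Ingress: ~{ingress_bytes_per_sec / 1e6:.2f} MB/s")   # ~0.77 MB/s
print(f"Egress:  ~{egress_bytes_per_sec / 1e6:.0f} MB/s")    # ~154 MB/s
```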

Summary of Estimates

  • Write QPS: ~1.1k (Peak ~2.2k)
  • Read QPS: ~11k (Peak ~22k)
  • Storage (5 years): ~126 TB
  • Ingress: ~0.77 MB/s
  • Egress: ~154 MB/s

Now you have a concrete set of numbers to guide your design. You know you need a system that can handle thousands of queries per second and store terabytes of data. This immediately tells you that a single server won't be enough, and you'll need to think about load balancing, distributed databases, and caching.