Guesstimating Scalability Numbers
One of the key objectives of any large-scale system design is capacity planning. With proper capacity planning, one can avoid scalability issues such as running out of disk space, memory, CPU, or network bandwidth, as well as frequent system crashes.
In system design interviews, candidates are expected to clarify requirements in terms of QPS, size of data on disk and in memory, network IOPS, etc., so that they can come up with hardware estimates to run their services smoothly in a scalable and available manner.
We will be focussing on the following 3 different types of servers/instances for most of our design discussions:
- Application servers — These servers are responsible for running your actual application logic (code). Your Java services run on these instances.
- Database Servers — These servers are responsible for running your database engines.
- Cache servers — These servers are responsible for caching frequently and repeatedly accessed data, thus reducing the load on database servers.
How do we arrive at the number of DB instances?
To keep things simple, we make the following assumptions:
- Each DB instance has 10TB disk capacity.
- DB instances are HDD type.
- Network bandwidth is 20 GB/s
- Size of data to store on disk = X TB
- Read Throughput = Y GB/s
- Reading sequentially from disk has a throughput of roughly 100MB/s
- Although DB instances use RAM for storing indexes, for our estimation we will assume that the memory is sufficient to hold the indexes.
- Reads from disk are sequential. This may not always be true.
- Each DB instance can have multiple partitions (additional disks mounted).
Number of DB instances required to store all the data on disk is around ceil(X/10) since data size is X TB and each instance has 10TB disk.
Reading sequentially from disk has a throughput of roughly 100 MB/s. If each DB instance handles a roughly equal share of the reads, then serving Y GB/s = Y*1024 MB/s requires approximately ceil(Y*1024/100) DB instances.
Number of DB instances required to serve Y GB/s of data over the network = ceil(Y/20), since each instance has 20 GB/s of network bandwidth.
Thus the total number of DB instances required = max(ceil(X/10), ceil(Y*1024/100), ceil(Y/20)) = max(ceil(X/10), ceil(Y*1024/100)), because Y*1024/100 > Y/20.
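The same calculation can be expressed as a small helper. Below is a minimal Python sketch under the assumptions above; the function name, parameter names, and the example inputs (X = 100 TB, Y = 5 GB/s) are illustrative, not taken from any real deployment.

```python
import math

def db_instances(data_tb, read_gbps,
                 disk_tb_per_node=10,    # assumption: 10 TB HDD per DB instance
                 seq_disk_mbps=100,      # assumption: ~100 MB/s sequential HDD reads
                 network_gbps=20):       # assumption: 20 GB/s network bandwidth per instance
    """Back-of-envelope DB instance count: the max of the storage, disk-I/O and network limits."""
    by_storage = math.ceil(data_tb / disk_tb_per_node)
    by_disk_io = math.ceil(read_gbps * 1024 / seq_disk_mbps)
    by_network = math.ceil(read_gbps / network_gbps)
    return max(by_storage, by_disk_io, by_network)

# Illustrative example: X = 100 TB of data, Y = 5 GB/s of reads
print(db_instances(data_tb=100, read_gbps=5))  # max(10, 52, 1) -> 52, disk throughput dominates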
How do we arrive at the number of cache servers?
To keep things simple, we make the following assumptions:
- Each cache server has 64GB main memory (RAM).
- Network bandwidth is 20 GB/s
- Number of CPU cores per cache server is 4.
- Average p99 latency required to serve a request from cache is 50ms.
- Requests per second served by one instance = Number of CPU cores / Time to serve a request in seconds
- Size of data to store in memory = X GB
- Read Throughput = Y GB/s
- Number of requests per second = Z
- Reading sequentially from RAM has a throughput of roughly 4 GB/s
- Disk capacity is more than enough to persist in-memory cache data in order to avoid data loss.
- Cache data is partitioned across multiple instances and each instance has multiple replicas for load balancing.
Number of cache instances required to store all the data in memory is around ceil(X/64).
Reading sequentially from RAM has a throughput of roughly 4 GB/s. If we partition the data such that each cache instance handles a roughly equal share of the read throughput, then we would need approximately ceil(Y/4) cache instances.
Number of CPU cores required to serve Z requests per second at 50ms latency = 50*Z/1000 = ceil(Z/20), since each core can serve 1000/50 = 20 requests per second.
Number of cache instances required for ceil(Z/20) cores = ceil(Z/80), assuming 4 cores per instance.
Number of cache instances required to serve Y GB/s of data over the network = ceil(Y/20), since each instance has 20 GB/s of network bandwidth.
Thus the total number of cache instances required = max(ceil(X/64), ceil(Y/4), ceil(Z/80), ceil(Y/20)) = max(ceil(X/64), ceil(Y/4), ceil(Z/80)), because Y/4 > Y/20.
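Here is the corresponding Python sketch for cache sizing, again using only the assumptions above; the names and the example inputs (X = 512 GB, Y = 8 GB/s, Z = 10,000 req/s) are illustrative.

```python
import math

def cache_instances(data_gb, read_gbps, rps,
                    ram_gb_per_node=64,   # assumption: 64 GB RAM per cache server
                    ram_read_gbps=4,      # assumption: ~4 GB/s sequential reads from RAM
                    cores_per_node=4,     # assumption: 4 CPU cores per cache server
                    latency_ms=50,        # assumption: 50 ms to serve one request
                    network_gbps=20):     # assumption: 20 GB/s network bandwidth per instance
    """Back-of-envelope cache instance count: the max of the memory, RAM-I/O, CPU and network limits."""
    by_memory = math.ceil(data_gb / ram_gb_per_node)
    by_ram_io = math.ceil(read_gbps / ram_read_gbps)
    rps_per_node = cores_per_node * (1000 / latency_ms)  # 4 cores * 20 req/s = 80 req/s per instance
    by_cpu = math.ceil(rps / rps_per_node)
    by_network = math.ceil(read_gbps / network_gbps)
    return max(by_memory, by_ram_io, by_cpu, by_network)

# Illustrative example: X = 512 GB cached, Y = 8 GB/s of reads, Z = 10,000 requests/s
print(cache_instances(data_gb=512, read_gbps=8, rps=10_000))  # max(8, 2, 125, 1) -> 125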
How do we arrive at the number of application servers?
To keep things simple, we make the following assumptions:
- Each application server has 16GB main memory (RAM).
- Network bandwidth is 20 GB/s
- Number of CPU cores per application server is 4.
- Average p99 latency required to serve a request from application server is 200ms.
- Requests per second served by one instance = Number of CPU cores / Time to serve a request in seconds
- Read Throughput = Y GB/s
- Number of requests per second = Z
- Using multiple application servers to load balance the requests. Assume requests are distributed uniformly across all instances.
If we load balance the requests equally, then with ceil(Y/4) instances, we can serve Y GB/s throughput. This is assuming that read throughput from main memory is 4 GB/s.
Number of CPU cores required to serve Z requests per second at 200ms latency = 200*Z/1000 = ceil(Z/5), since each core can serve 1000/200 = 5 requests per second.
Number of application server instances required for ceil(Z/5) cores = ceil(Z/20), assuming 4 cores per instance.
Number of application server instances required to serve Y GB/s of data over the network = ceil(Y/20), since each instance has 20 GB/s of network bandwidth.
Thus the total number of application servers required = max(ceil(Y/4), ceil(Z/20), ceil(Y/20)) = max(ceil(Y/4), ceil(Z/20)), because Y/4 > Y/20.
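And a matching Python sketch for application servers; the names and example inputs (Y = 8 GB/s, Z = 10,000 req/s) are again illustrative assumptions.

```python
import math

def app_server_instances(read_gbps, rps,
                         ram_read_gbps=4,   # assumption: ~4 GB/s read throughput from main memory
                         cores_per_node=4,  # assumption: 4 CPU cores per application server
                         latency_ms=200,    # assumption: 200 ms to serve one request
                         network_gbps=20):  # assumption: 20 GB/s network bandwidth per instance
    """Back-of-envelope application server count: the max of the memory-I/O, CPU and network limits."""
    by_ram_io = math.ceil(read_gbps / ram_read_gbps)
    rps_per_node = cores_per_node * (1000 / latency_ms)  # 4 cores * 5 req/s = 20 req/s per instance
    by_cpu = math.ceil(rps / rps_per_node)
    by_network = math.ceil(read_gbps / network_gbps)
    return max(by_ram_io, by_cpu, by_network)

# Illustrative example: Y = 8 GB/s of reads, Z = 10,000 requests/s
print(app_server_instances(read_gbps=8, rps=10_000))  # max(2, 500, 1) -> 500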
Let’s look at one system design problem to understand the numbers in detail — Design YouTube
Functional Requirements:
- Users can upload and stream videos
- Users can search for videos
- Users can comment on videos and also upvote/downvote
Different entities for which we want to store data:
- Users
- Videos
- Comments
- Inverted Index for Search
Let’s estimate the size of data, queries per second and read/write throughput ONLY for the videos; a quick sanity check of this arithmetic is sketched right after the list.
- 500 hours of video are uploaded every minute. Assuming 1 hour of video = 1 GB, total size of videos uploaded per second = 500 GB/min ≈ 9 GB/s (write throughput)
- 1 billion hours of video are streamed every day, so total size of videos watched per second = 10⁹ GB/day = 10⁹/86400 GB/s ≈ 12 TB/s (read throughput)
- Assuming each video is 1 minute long on average, number of videos uploaded per minute = 500*60 = 30K/min ≈ 500/s
- Number of videos watched per second = 10⁹*60/86400 = 700K/s
- Let’s assume that the total number of requests per second = 1 million (including videos, recommendations, search etc.)
- Total number of videos for 5 years = 500*5*365*86400 ≈ 80 billion
- Total size of videos for 5 years = 9GB*5*365*86400 = 1400 PB
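A few lines of Python are enough to sanity-check the arithmetic above; all inputs are the assumptions stated in the list, and the rounding is deliberately coarse.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_5_YEARS = 5 * 365 * SECONDS_PER_DAY      # ~1.58e8 seconds

write_gbps = 500 / 60                                # 500 GB/min -> ~8.3, rounded up to ~9 GB/s
read_tbps = 1e9 / SECONDS_PER_DAY / 1024             # 1e9 GB/day -> ~11.3, rounded to ~12 TB/s
uploads_per_s = (500 * 60) / 60                      # 500 hr/min * 60 one-minute videos/hr = 30K/min -> ~500 videos/s
watches_per_s = 1e9 * 60 / SECONDS_PER_DAY           # ~694K -> ~700K videos watched/s
videos_5y = uploads_per_s * SECONDS_PER_5_YEARS      # ~8e10 -> ~80 billion videos
size_5y_pb = 9 * SECONDS_PER_5_YEARS / 1024 ** 2     # ~1,350 -> ~1400 PB

print(write_gbps, read_tbps, uploads_per_s, watches_per_s, videos_5y, size_5y_pb)
```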
Let’s estimate the number of DB instances required to store the videos and stream them. Generally, videos are stored in distributed storage systems such as HDFS or Bigtable.
- Size of data to store on disk (X) = 1400 PB = 1400*1024 TB
- Read/Write Throughput (Y) = 12 TB/s = 12*1024 GB/s
Total number of DB instances required = max(ceil(X/10), ceil(Y*1024/100)) = max(143360, 125830) = 144K instances.
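A quick check of this arithmetic in Python, using exactly the values stated above:

```python
import math

X_tb = 1400 * 1024    # 1400 PB of video data on disk, in TB
Y_gbps = 12 * 1024    # 12 TB/s of streaming throughput, in GB/s

by_storage = math.ceil(X_tb / 10)            # 143,360 instances needed for disk capacity
by_disk_io = math.ceil(Y_gbps * 1024 / 100)  # 125,830 instances needed for disk throughput
print(max(by_storage, by_disk_io))           # 143,360 -> ~144K DB instances
```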
Let’s estimate the number of cache servers required to stream videos. Assume that 0.5% of all the videos are very popular at any time and are cached. Thus total size of videos to cache = 0.5*1400/100 = 7 PB.
- Size of data to store in memory (X) = 7*1024*1024 GB
- Read Throughput (Y) = 12*1024 GB/s
- Number of requests per second (Z) = 10⁶
Total number of cache instances required = max(ceil(X/64), ceil(Y/4), ceil(Z/80)) = max(114688, 3072, 12500) = 115K
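The same quick check for the cache tier:

```python
import math

X_gb = 7 * 1024 * 1024   # 7 PB of popular videos to hold in RAM, in GB
Y_gbps = 12 * 1024       # 12 TB/s of read throughput, in GB/s
Z = 1_000_000            # requests per second

by_memory = math.ceil(X_gb / 64)   # 114,688 instances needed for RAM capacity
by_ram_io = math.ceil(Y_gbps / 4)  # 3,072 instances needed for RAM throughput
by_cpu = math.ceil(Z / 80)         # 12,500 instances needed for CPU (80 req/s per instance)
print(max(by_memory, by_ram_io, by_cpu))  # 114,688 -> ~115K cache instances
```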
Let’s estimate the number of application servers required to upload/stream videos. Let us assume that
- Read Throughput (Y) = 12*1024 GB/s
- Number of requests per second (Z) = 10⁶
Total number of application servers required = max(ceil(Y/4), ceil(Z/20)) = max(3072, 50,000) = 50K.
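And for the application tier, where CPU needs ceil(Z/20) instances (20 requests per second per 4-core instance at 200ms per request):

```python
import math

Y_gbps = 12 * 1024   # 12 TB/s of read throughput, in GB/s
Z = 1_000_000        # requests per second

by_ram_io = math.ceil(Y_gbps / 4)  # 3,072 instances needed for memory throughput
by_cpu = math.ceil(Z / 20)         # 50,000 instances needed for CPU (20 req/s per instance)
print(max(by_ram_io, by_cpu))      # 50,000 -> ~50K application servers
```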
Note that we made lots of assumptions while deriving these numbers. Our estimations do not need to be exact, but the closer they are to the actual requirements, the better we can recover from downtimes due to server crashes.
We could have also considered things such as:
- SSDs instead of HDDs for storing data in DB.
- Very large index sizes for DBs, which would also require accounting for memory on the DB instances.
- Local caching on application servers. It is also possible that the data held in memory by application servers (including the Java heap) exceeds 16GB, in which case we would need to factor memory into the calculations.
- Memory management of different programming languages, e.g. Java vs C++.
- Run-time performance of programming languages — interpreted languages (e.g. Python) are slower than compiled languages (e.g. C++) and JVM-based languages (e.g. Java, Scala).
- Multi-threading and multi-processing.
- Modern hardware can also take advantage of GPUs.