Anatomy of a High Level System Design
System design interviews can be daunting if a candidate is unprepared. Even a well-prepared candidate can crumble under pressure while structuring a design, which can lead to the following debacles:
- The candidate presents an unstructured design: they mention all the important points but do not connect them in a logical flow.
- The candidate misunderstands the problem statement and designs a similar but different system, and does not realize this until 15–20 minutes into the interview, after the interviewer probes the design.
- The candidate blanks out midway when they realize that their approach is leading nowhere.
Having sat for several system design interviews in both software engineering and machine learning, and having conducted several interviews at some big tech companies, I have put together some pointers that candidates should keep handy during an interview.
Since most interviews nowadays are online, candidates can list these pointers in a tab or notepad (most interviewers and companies do not consider this cheating).
Requirement Gathering
This should be the first step towards your solution. Ask the interviewer questions relevant to the problem. Some of these could be:
- Who are the users of the system? Are they internal customers, enterprise customers, or the general public?
- Where are the users located? Are they geographically distributed or in a specific region?
- How many concurrent users are going to use this service?
- How many requests per second is the service expected to handle?
- Is this a read-heavy or write-heavy system?
- Does the system need to be strongly consistent, or can it be eventually consistent?
- If the system deals with file uploads/downloads, how large are the files or data uploaded or downloaded on average?
… and so on. Most of the questions will be specific to the system you are building. The idea of this exercise is to inform decisions such as:
- Capacity planning, i.e. number of CPU cores needed, amount of memory required, throughput, disk space, caching requirements etc.
- Selecting the appropriate database type, e.g. SQL vs. NoSQL vs. Graph DB.
- Choosing an appropriate database schema, i.e. normalized vs. denormalized.
- Choosing the optimum number of database partitions.
- Whether to go for a single-leader or multi-leader configuration.
- Whether to go for geo-redundancy or regional redundancy of the service.
- Whether the service requires a distributed queueing system to queue the incoming requests.
- What kind of database indexing engine is appropriate, e.g. B-Tree vs. LSM Tree.
Most of the decisions and tradeoffs are driven by the requirements of the system. Requirement gathering can all be done at the start, but as in most practical system design, it is usually a continuous process.
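To make the capacity planning step concrete, here is a rough back-of-envelope sketch in Python. The `Workload` fields, the `estimate` helper and the default peak factor of 2 are all assumptions made for illustration; in an interview you would plug in whatever numbers the interviewer gives you.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical inputs collected during requirement gathering."""
    daily_active_users: int
    requests_per_user_per_day: int
    write_fraction: float      # share of requests that are writes (0.0 to 1.0)
    avg_write_bytes: int       # average payload stored per write
    peak_factor: float = 2.0   # assumed ratio of peak to average traffic


def estimate(w: Workload) -> dict:
    """Turn the workload assumptions into rough throughput and storage numbers."""
    requests_per_day = w.daily_active_users * w.requests_per_user_per_day
    avg_rps = requests_per_day / SECONDS_PER_DAY
    writes_per_day = requests_per_day * w.write_fraction
    # Decimal gigabytes (1 GB = 1e9 bytes), ignoring replication and indexes.
    storage_gb_per_year = writes_per_day * w.avg_write_bytes * 365 / 1e9
    return {
        "avg_rps": avg_rps,
        "peak_rps": avg_rps * w.peak_factor,
        "write_rps": avg_rps * w.write_fraction,
        "read_rps": avg_rps * (1 - w.write_fraction),
        "storage_gb_per_year": storage_gb_per_year,
    }


if __name__ == "__main__":
    # Example: 1M daily users, 10 requests each, 10% writes of about 1 KB.
    for name, value in estimate(Workload(1_000_000, 10, 0.1, 1_000)).items():
        print(f"{name}: {value:,.1f}")
```

The numbers that come out are only as good as the assumptions that go in, but they are usually enough to justify choices such as the number of partitions or whether a cache is needed.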
High Level Flow Diagram — Read and Write Paths
This step requires drawing the high level components required for a working prototype in production. These components include, but are not limited to:
- Web server(s) to accept the incoming requests
- Load balancer(s) in case multiple web servers are serving requests.
- A queueing system like Kafka to buffer incoming requests when there are too many concurrent requests.
- A rate limiter to throttle or reject requests from clients that exceed their allowed request rate.
- The application server(s) where the main service is running. Again this could be behind some load balancer.
- The database server(s) running the DB service (again possibly behind some load balancer). The database service usually will have multiple partitions and each partition will have multiple read replicas.
- A caching service to serve read requests faster.
- Some cloud based system to collect logs and run monitoring and observability of the different components.
… and so on. Note that these are just some common high level components that most distributed systems would use.
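As an illustration of one of these components, here is a minimal sketch of a rate limiter using the token bucket technique. This is just one common approach (sliding windows and leaky buckets are others), and the `TokenBucket` class, its parameters and the injectable `clock` are assumptions made for this example. A production limiter would typically keep its state in a shared store rather than in process memory.

```python
import time


class TokenBucket:
    """Single-process token bucket: `rate` tokens are added per second,
    up to `capacity`. Each request spends `cost` tokens."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can fake time
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # caller would typically respond with HTTP 429


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, capacity=10)
    print([bucket.allow() for _ in range(12)])
```

The capacity controls how large a burst is tolerated, while the rate controls the sustained throughput, so both are worth calling out as tunable when you draw this box on the diagram.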
API Design
Many system design interviews will not focus on API design, but some very good companies do. This round is specifically meant to gauge how well the candidate understands different API nuances, for example:
- Define the CRUD (Create, Read, Update, Delete) operations for the system.
- Define APIs for the CRUD operations.
- How to handle pagination during reads through the APIs?
- How to handle SQL injection and sanitize user inputs in GET and POST requests?
- How to serialize the POST data into the HTTP request and deserialize it on the server?
- How to deal with error scenarios and send appropriate HTTP response codes?
…and so on.
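Two of the points above, pagination and response codes, can be sketched together. The following is a minimal, framework-free illustration of cursor-based pagination. The `list_items` function, the `MAX_LIMIT` cap and the in-memory `store` dictionary are hypothetical stand-ins for a real endpoint and database.

```python
from http import HTTPStatus

MAX_LIMIT = 100  # assumed upper bound on page size


def list_items(store: dict, cursor=None, limit: int = 20):
    """Return (status, body) for a hypothetical GET /items?cursor=..&limit=..

    `store` maps integer ids to items. The cursor is the last id the
    client has already seen, so the next page starts right after it.
    """
    if not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        return HTTPStatus.BAD_REQUEST, {"error": f"limit must be 1..{MAX_LIMIT}"}
    if cursor is not None and not isinstance(cursor, int):
        return HTTPStatus.BAD_REQUEST, {"error": "cursor must be an integer id"}

    remaining = sorted(i for i in store if cursor is None or i > cursor)
    page = remaining[:limit]
    # Only hand out a cursor when there is something left to fetch.
    next_cursor = page[-1] if len(remaining) > limit else None
    body = {"items": [store[i] for i in page], "next_cursor": next_cursor}
    return HTTPStatus.OK, body


if __name__ == "__main__":
    demo = {1: "a", 2: "b", 3: "c"}
    print(list_items(demo, limit=2))
    print(list_items(demo, cursor=2, limit=2))
    print(list_items(demo, limit=0))
```

A cursor tied to a sortable key tends to stay stable when rows are inserted between requests, which is the usual argument for it over offset-based pagination.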
Database Service
This is probably the most “discussed” part of system design interviews at big tech companies, as it reveals how well a candidate understands distributed system design. The areas interviewers focus on are as follows:
- Does the service need a distributed database, or can we work with a distributed blob storage instead?
- Choosing the right database technology: SQL vs. NoSQL vs. Graph Databases vs. In-Memory databases, and the tradeoffs associated with each.
- Whether row-oriented or column-oriented storage is appropriate, depending on query patterns.
- Designing the table schemas based on the different read and write patterns.
- Defining different indexing strategies and comparing different data structures such as B-Tree vs. LSM Tree.
- How to partition the database, which columns to partition on, how many partitions to use etc.?
- How to read and write data across different partitions using Consistent Hashing?
- How to choose the number of replicas for each database partition?
- How to select the leader and readers based on the consistency requirements, i.e. strong vs. eventual consistency?
- How to sync multiple leaders?
- What happens when a replica server crashes and a new replica joins?
- What happens if a leader crashes and a new leader joins? How does the leader election algorithm work?
… and so on.
This section will probably be the longest if you are interviewing at FAANG companies, as there are lots of different strategies and tradeoffs one can discuss here. The good part is that this is also the most popular section among candidates, and one can find very good resources on the internet for all of the above.
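Since consistent hashing comes up so often in this part of the interview, here is a minimal sketch of a hash ring with virtual nodes. The `HashRing` class, the node names and the choice of MD5 (used only to spread keys, not for security) are assumptions made for illustration and are not tied to any particular database.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: each node owns `vnodes` points on the ring,
    and a key belongs to the first point clockwise from its hash."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if h in self._owner:        # extremely unlikely collision: skip
                continue
            self._owner[h] = node
            bisect.insort(self._points, h)

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if self._owner.get(h) == node:
                del self._owner[h]
                self._points.remove(h)

    def get(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no nodes")
        # Wrap around to the first point when the hash is past the last one.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["db-0", "db-1", "db-2"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.get(k) for k in keys}
    ring.add("db-3")
    moved = sum(1 for k in keys if ring.get(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after adding a node")
```

The property worth pointing out to the interviewer is that adding or removing a partition only remaps the keys adjacent to that partition's points on the ring, instead of reshuffling almost everything as a plain `hash(key) % N` scheme would.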
Caching Service
If you are designing your system for high throughput, caching is a very critical component. The design of a caching service is quite similar to that of a database service, as caching is often implemented using in-memory databases with limited memory. Some areas where the interviewer would want you to focus are:
- Do you need a cache and why?
- What data are you going to cache and why?
- What data structures are you going to use to cache the data?
- How are you going to update the cache?
- What kind of cache eviction policy, such as LRU vs. LFU, will you use for your problem?
- How many cache servers would be required?
… and so on.
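To ground the LRU discussion, here is a minimal sketch of an LRU cache built on Python's `collections.OrderedDict`. The `LRUCache` class and its method names are chosen for this example. A real caching service would also need expiry, concurrency control and distribution across servers.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()   # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # drop the least recently used


if __name__ == "__main__":
    cache = LRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")       # "a" is now the most recently used
    cache.put("c", 3)    # evicts "b"
    print(cache.get("a"), cache.get("b"), cache.get("c"))
```

LRU suits workloads where recently read items are likely to be read again. If a small set of items stays hot over long periods, LFU may be the better fit, and that is the tradeoff the interviewer is usually fishing for.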
Quality and Reliability
Since the reliability of a system is critical for a product, this section allows the interviewers to evaluate candidates on how they think about different edge cases, failure scenarios and so on. Some pointers that the interviewer will look out for:
- How will you debug if your application crashes?
- How will you set up logging?
- What metrics will you monitor from the VMs, e.g. CPU usage, memory usage etc.?
- How will you autoscale your system based on the monitoring metrics?
- Will you be setting up geo-redundancy or regional redundancy for your service?
- How will you deal with issues like missing data or data duplication in your database or cache?
- How are you going to set up distributed locking for common resources like a filesystem?
- How will you debug situations when write queries fail vs. when read queries fail?
… and so on.
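For the autoscaling question, one simple way to reason about it is a proportional rule: scale the replica count by the ratio of the observed metric to its target. The sketch below assumes a single metric (for example, average CPU utilization in percent) and hypothetical bounds. Real autoscalers add cooldown periods and tolerance bands on top of this.

```python
import math


def desired_replicas(current: int, observed: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Proportional scaling: if the observed metric is 1.5x the target,
    ask for 1.5x the replicas, rounded up and clamped to the bounds."""
    if current < 1 or target <= 0 or observed < 0:
        raise ValueError("current >= 1, target > 0 and observed >= 0 required")
    desired = math.ceil(current * observed / target)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas running at 90% CPU against a 60% target.
    print(desired_replicas(current=4, observed=90, target=60))
```

Rounding up errs on the side of extra capacity, and the minimum bound keeps the service from scaling to zero when traffic is quiet.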
Security
Most companies do not expect candidates to handle security in their system design, especially at or below mid-level, but for principal and above levels, handling security in your design is of utmost importance. Here are some pointers that interviewers will look out for in your design:
- How to handle and sanitize user inputs before querying the database?
- How to handle SQL injections?
- How do authentication and authorization work for each of your components?
- How are you going to set up private networks for your service, and how will you do active development over private networks?
- How are you going to deal with data of various security levels, such as general vs. public vs. confidential etc.?
… and so on.
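On the SQL injection point, the usual answer is to use parameterized queries rather than building SQL strings by hand. Here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table and the helper names are made up for this example, and the unsafe variant is included only to show what goes wrong.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (username) VALUES (?)",
                     [("alice",), ("bob",)])
    conn.commit()
    return conn


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Do NOT do this: user input becomes part of the SQL text itself.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # The "?" placeholder sends the input as data, never as SQL.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "' OR '1'='1"
    print("unsafe:", find_user_unsafe(conn, payload))
    print("safe:  ", find_user(conn, payload))
```

With the unsafe version the payload rewrites the WHERE clause and matches every row, while the parameterized version treats it as an odd-looking username and matches nothing.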
Incorporating quality and security into your system design will fetch bonus points during the interview.