System Design Interview Questions: Complete Guide for Beginners
Master system design fundamentals and ace your next system design interview
Introduction
System design interviews are a crucial part of the hiring process for software engineering roles, especially at top tech companies. These interviews test your ability to design scalable, reliable, and efficient systems that can handle millions of users.
This comprehensive guide covers everything you need to know about system design interviews, from fundamental concepts to real-world design problems.
What to Expect in a System Design Interview
- 45-60 minute interview session
- Open-ended problem (e.g., "Design a URL shortener")
- Discussion of requirements and constraints
- High-level architecture design
- Deep dive into specific components
- Discussion of trade-offs and alternatives
System Design Fundamentals
1. What is System Design?
System design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. It involves:
- Understanding requirements
- Identifying constraints
- Designing the architecture
- Choosing technologies
- Considering scalability and reliability
2. Key Principles of System Design
Core Principles
- Scalability: System should handle growth in users, data, and traffic
- Reliability: System should work correctly even when components fail
- Availability: System should be accessible when needed (uptime)
- Performance: System should respond quickly to user requests
- Maintainability: System should be easy to update and modify
- Security: System should protect data and prevent unauthorized access
3. System Design Interview Framework
Follow this structured approach:
- Clarify Requirements: Ask questions about scope, scale, and features
- Estimate Scale: Calculate traffic, storage, and bandwidth requirements
- Design High-Level Architecture: Draw major components and their interactions
- Design Core Components: Deep dive into critical parts
- Scale the Design: Discuss bottlenecks and solutions
- Identify Trade-offs: Discuss pros and cons of your design
Example: Design a URL Shortener
Step 1: Clarify Requirements
- What's the scale? (100M URLs/day)
- What's the URL length? (7 characters)
- What features? (shorten, redirect, analytics)
Step 2: Estimate Scale
- Write operations: 100M/day = ~1,160 writes/sec
- Read operations: assuming a 100:1 read:write ratio, ~116,000 reads/sec
- Storage: 100M URLs/day * 500 bytes ≈ 50 GB/day (~18 TB/year)
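These back-of-envelope numbers follow directly from the stated requirements (100M URLs/day, a 100:1 read ratio, and an assumed average record size of 500 bytes). A quick sketch of the arithmetic:

```python
# Back-of-envelope estimates for the URL shortener. The inputs come
# from the requirements above; 500 bytes/record is an assumption.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

urls_per_day = 100_000_000
read_write_ratio = 100
bytes_per_record = 500                  # assumed average row size

writes_per_sec = urls_per_day / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * read_write_ratio
storage_per_day_gb = urls_per_day * bytes_per_record / 1e9
storage_per_year_tb = storage_per_day_gb * 365 / 1000

print(f"writes/sec: {writes_per_sec:,.0f}")   # ~1,157
print(f"reads/sec:  {reads_per_sec:,.0f}")    # ~115,741
print(f"storage:    {storage_per_day_gb:.0f} GB/day, "
      f"{storage_per_year_tb:.2f} TB/year")   # 50 GB/day, 18.25 TB/year
```

Note that 100M records at 500 bytes is 50 GB per day, which is what drives the decision to shard or archive old URLs over time.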
Step 3: High-Level Design
- Application servers
- Database (SQL/NoSQL)
- Cache layer
- Load balancer
Scalability Concepts
Q1. What is the difference between horizontal and vertical scaling?
Answer:
| Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out) |
|---|---|
| Add more power to existing machine | Add more machines to the system |
| Increase CPU, RAM, storage | Add more servers |
| Easier to implement | More complex to implement |
| Limited by hardware | Virtually unlimited |
| Single point of failure | Better fault tolerance |
| Expensive at scale | Cost-effective at scale |
Q2. What is CAP Theorem?
Answer: CAP Theorem states that in a distributed system, you can only guarantee two out of three properties:
- Consistency: All nodes see the same data simultaneously
- Availability: System remains operational
- Partition Tolerance: System continues despite network failures
CAP Theorem Trade-offs:
CP (Consistency + Partition Tolerance)
- Example: MongoDB, HBase
- Sacrifices availability
- Good for: Financial systems, critical data
AP (Availability + Partition Tolerance)
- Example: Cassandra, DynamoDB
- Sacrifices consistency
- Good for: Social media, content delivery
CA (Consistency + Availability)
- Not possible in distributed systems
- Only works in non-partitioned systems
Q3. What is ACID in databases?
Answer: ACID properties ensure reliable database transactions:
- Atomicity: All or nothing - transaction either completes fully or not at all
- Consistency: Database remains in valid state after transaction
- Isolation: Concurrent transactions don't interfere with each other
- Durability: Committed changes persist even after system failure
Q4. What is BASE in NoSQL databases?
Answer: BASE is an alternative to ACID for NoSQL databases:
- Basically Available: System is available most of the time
- Soft State: System state may change over time
- Eventual Consistency: System will become consistent over time
BASE vs ACID:
ACID (SQL Databases)
- Strong consistency
- Immediate consistency
- Example: MySQL, PostgreSQL
BASE (NoSQL Databases)
- Eventual consistency
- High availability
- Example: Cassandra, DynamoDB
Database Design
Q5. When to use SQL vs NoSQL?
Answer:
| SQL (Relational) | NoSQL |
|---|---|
| Structured data | Unstructured/semi-structured data |
| ACID compliance needed | High scalability needed |
| Complex queries | Simple queries, high volume |
| Fixed schema | Flexible schema |
| Vertical scaling | Horizontal scaling |
| Examples: MySQL, PostgreSQL | Examples: MongoDB, Cassandra |
Q6. What is database sharding?
Answer: Sharding is the process of splitting a database into smaller, more manageable pieces called shards. Each shard is stored on a separate database server.
Sharding Strategies:
- Range-based Sharding: Split by value ranges (e.g., user IDs 1-1000 on shard 1)
- Hash-based Sharding: Use hash function to determine shard (e.g., hash(user_id) % num_shards)
- Directory-based Sharding: Use lookup table to find shard
Example: Sharding by User ID
Shard 1: Users 1-1,000,000
Shard 2: Users 1,000,001-2,000,000
Shard 3: Users 2,000,001-3,000,000
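The hash-based strategy can be sketched in a few lines. This is a minimal illustration, not a production router: `NUM_SHARDS` is an assumed cluster size, and MD5 is used only because it is stable across processes (Python's built-in `hash()` is randomized per run).

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size for illustration

def get_shard(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard via a stable hash, so sequential IDs
    spread evenly instead of clustering on one shard."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same ID always lands on the same shard:
placement = {uid: get_shard(uid) for uid in range(1, 6)}
print(placement)
```

The trade-off versus range-based sharding: hashing balances load well, but range queries (e.g. "all users created this month") now touch every shard.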
Benefits:
- Distributes load
- Improves performance
- Enables horizontal scaling
Challenges:
- Cross-shard queries
- Rebalancing data
- Increased complexity
Q7. What is database replication?
Answer: Replication is the process of copying data from one database server to another to improve availability and performance.
Types of Replication:
- Master-Slave (Primary-Secondary): One master for writes, multiple slaves for reads
- Master-Master: Multiple masters, both can handle reads and writes
Master-Slave Replication:
Master (Primary)
├── Write operations
└── Replicates to slaves
Slave 1 (Secondary)
├── Read operations
└── Receives updates from master
Slave 2 (Secondary)
├── Read operations
└── Receives updates from master
Benefits:
- Read scalability
- High availability (failover)
- Geographic distribution
Q8. What is database indexing?
Answer: An index is a data structure that improves the speed of data retrieval operations on a database table.
Types of Indexes:
- Primary Index: Unique index on primary key
- Secondary Index: Index on non-primary key columns
- Composite Index: Index on multiple columns
- B-tree Index: Most common, balanced tree structure
- Hash Index: For equality searches
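A small sketch of secondary and composite indexes, using SQLite's in-memory database so it runs anywhere. The `users` table and column names are hypothetical, chosen only to illustrate the index types above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical users table for illustration.
cur.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,   -- primary index, created implicitly
        email TEXT,
        country TEXT,
        created_at TEXT
    )
""")

# Secondary index on a non-key column:
cur.execute("CREATE INDEX idx_users_email ON users (email)")

# Composite index on multiple columns (column order matters: this one
# serves queries filtering on country, or on country + created_at):
cur.execute("CREATE INDEX idx_users_country_created "
            "ON users (country, created_at)")

# EXPLAIN QUERY PLAN shows the engine choosing the index over a scan:
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ("a@b.c",)
).fetchall()
print(plan)
```

The flip side, worth raising in an interview: every index speeds up reads but slows down writes, since it must be updated on each insert.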
Caching Strategies
Q9. What is caching and why is it important?
Answer: Caching stores frequently accessed data in fast storage (memory) to reduce latency and database load.
Benefits:
- Reduces database load
- Improves response time
- Reduces bandwidth usage
- Improves user experience
Common Caching Solutions:
- Redis: In-memory data store, very fast
- Memcached: Distributed memory caching
- CDN: Content Delivery Network for static content
Q10. What are different caching strategies?
Answer:
- Cache-Aside (Lazy Loading): Application checks cache first, then database
- Write-Through: Write to cache and database simultaneously
- Write-Back (Write-Behind): Write to cache first, database later
- Refresh-Ahead: Proactively refresh cache before expiration
Cache-Aside Pattern:
1. Check cache for data
2. If found (cache hit), return data
3. If not found (cache miss):
   a. Query database
   b. Store result in cache
   c. Return data
Example:
// Cache-aside: the application manages the cache explicitly.
if (cache.exists(key)) {
    return cache.get(key);        // cache hit
} else {
    data = database.get(key);     // cache miss: read from the database
    cache.set(key, data, ttl);    // populate the cache with an expiry (TTL)
    return data;
}
Q11. What is CDN (Content Delivery Network)?
Answer: CDN is a network of distributed servers that deliver content based on geographic location of users.
How CDN Works:
- User requests content
- Request routed to nearest CDN server
- If content cached, return immediately
- If not cached, fetch from origin server and cache
Benefits:
- Reduces latency
- Reduces origin server load
- Improves availability
- Better user experience globally
Load Balancing
Q12. What is load balancing?
Answer: Load balancing distributes incoming network traffic across multiple servers to ensure no single server is overwhelmed.
Load Balancing Algorithms:
- Round Robin: Distribute requests sequentially
- Least Connections: Send to server with fewest active connections
- Weighted Round Robin: Round robin with server capacity weights
- IP Hash: Route based on client IP address
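Two of these algorithms fit in a few lines of Python. This is an illustrative sketch, not a real balancer: the server names and connection counts are made up.

```python
import itertools

servers = ["server1", "server2", "server3"]   # hypothetical backend pool

# Round robin: hand out servers in a repeating cycle.
rr = itertools.cycle(servers)
rr_order = [next(rr) for _ in range(6)]
print(rr_order)   # server1, server2, server3, then repeats

# Least connections: pick the server with the fewest active connections.
active = {"server1": 12, "server2": 3, "server3": 7}

def least_connections(conns: dict) -> str:
    return min(conns, key=conns.get)

print(least_connections(active))   # server2
```

Round robin assumes requests cost roughly the same; least connections adapts when some requests are long-lived (e.g. streaming), at the cost of tracking state.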
Load Balancer Architecture:
Client Request
↓
Load Balancer
├── Server 1
├── Server 2
└── Server 3
Benefits:
- Distributes load evenly
- Improves availability
- Enables horizontal scaling
- Handles server failures
Q13. What is the difference between Layer 4 and Layer 7 load balancing?
Answer:
| Layer 4 (Transport Layer) | Layer 7 (Application Layer) |
|---|---|
| Routes based on IP and port | Routes based on HTTP headers, URL, cookies |
| Faster, less CPU intensive | Slower, more CPU intensive |
| No content inspection | Content-aware routing |
| TCP/UDP level | HTTP/HTTPS level |
| Example: HAProxy (TCP mode) | Example: NGINX, HAProxy (HTTP mode) |
Common System Design Questions
Q14. Design a URL Shortener (like bit.ly)
Requirements:
- Shorten long URLs to 7 characters
- Redirect short URL to original URL
- Handle 100M URLs per day
High-Level Design:
- Application Layer: Web servers to handle requests
- Database: Store mappings (short URL → long URL)
- Cache: Redis for frequently accessed URLs
- Load Balancer: Distribute traffic
Key Components:
- URL Encoding: Base62 (a-z, A-Z, 0-9) with 7 characters gives 62^7 ≈ 3.5 trillion unique URLs
- Database Schema: (short_url, long_url, created_at, expires_at)
- Caching: Cache popular URLs (80-20 rule: 20% URLs get 80% traffic)
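One common approach to the encoding component is to Base62-encode a unique integer (e.g. an auto-increment database ID). A minimal sketch, with the alphabet ordering as an arbitrary choice:

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer (e.g. a sequence ID) as Base62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)    # peel off the least significant digit
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

# Every ID below 62**7 fits in at most 7 characters:
print(base62_encode(62**7 - 1))   # "9999999"
```

Sequential IDs make short URLs guessable, so real services often randomize or pre-generate keys; that trade-off is worth mentioning in the interview.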
Q15. Design a Chat System (like WhatsApp)
Requirements:
- 1-on-1 messaging
- Group messaging
- Real-time delivery
- Handle 50M daily active users
High-Level Design:
- Client: Mobile/web app
- API Gateway: Route requests
- Chat Service: Handle messaging logic
- Message Queue: Kafka/RabbitMQ for async processing
- Database: Store messages (SQL for metadata, NoSQL for messages)
- WebSocket Server: Real-time bidirectional communication
- Notification Service: Push notifications
Key Challenges:
- Real-time delivery (WebSockets)
- Message ordering
- Offline message delivery
- Scalability (millions of concurrent connections)
Q16. Design a News Feed System (like Facebook)
Requirements:
- Users can post updates
- Users can follow other users
- News feed shows posts from followed users
- Handle 1B users, 500M daily active users
Approaches:
- Pull Model (Fan-out on Read): Fetch posts when user requests feed
- Push Model (Fan-out on Write): Push posts to followers' feeds when posted
- Hybrid Model: Push for active users, pull for inactive users
High-Level Design:
- User Service: Manage users and relationships
- Post Service: Handle post creation
- Feed Service: Generate and serve news feeds
- Cache: Store pre-computed feeds for active users
- Database: Store posts, user relationships, feeds
Q17. Design a Distributed Cache
Requirements:
- Store key-value pairs
- Fast read/write operations
- Distributed across multiple servers
- Handle server failures
Key Components:
- Consistent Hashing: Distribute keys across servers
- Replication: Store copies on multiple servers
- Eviction Policy: LRU (Least Recently Used) when cache is full
- Cache Invalidation: TTL (Time To Live) or manual invalidation
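The LRU eviction policy mentioned above has a classic compact implementation: an ordered map that tracks access order. A minimal sketch using Python's `OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU sketch: OrderedDict keeps keys in access order, so the
    least recently used key is always at the front."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # touch "a", so "b" becomes least recently used
cache.set("c", 3)    # over capacity: evicts "b"
```

A production cache would add thread safety and TTL expiry on top of this core idea.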
Consistent Hashing Benefits:
- Minimal rehashing when servers added/removed
- Even distribution of keys
- Handles server failures gracefully
Example:
Server 1: Keys hash to 0-33
Server 2: Keys hash to 34-66
Server 3: Keys hash to 67-99
If Server 2 fails, only its keys are reassigned to a neighboring server:
Server 1: 0-66
Server 3: 67-99
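The minimal-rehashing property above can be demonstrated with a small ring. This sketch omits virtual nodes (which a production ring would add for smoother balance), and uses MD5 only because it is stable across runs; the server and key names are illustrative.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash, unlike Python's randomized built-in hash().
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring sketch (no virtual nodes)."""

    def __init__(self, nodes):
        self._ring = {}        # ring position -> node name
        self._positions = []   # sorted ring positions
        for node in nodes:
            pos = _hash(node)
            self._ring[pos] = node
            bisect.insort(self._positions, pos)

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first node at or after the key's position.
        i = bisect.bisect_right(self._positions, _hash(key))
        if i == len(self._positions):
            i = 0              # wrap around the ring
        return self._ring[self._positions[i]]

    def remove_node(self, node: str):
        pos = _hash(node)
        del self._ring[pos]
        self._positions.remove(pos)

ring = ConsistentHashRing(["server1", "server2", "server3"])
keys = [f"user:{i}" for i in range(20)]
before = {k: ring.get_node(k) for k in keys}

ring.remove_node("server2")
# Only keys that lived on server2 move; everything else stays put.
moved = [k for k in keys if ring.get_node(k) != before[k]]
```

Contrast with naive `hash(key) % num_servers`, where removing one server remaps almost every key.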
Design Patterns
Q18. What is Microservices Architecture?
Answer: Microservices is an architectural approach where applications are built as a collection of small, independent services.
Characteristics:
- Each service is independently deployable
- Services communicate via APIs (REST, gRPC)
- Each service has its own database
- Services are organized around business capabilities
Benefits:
- Independent scaling
- Technology diversity
- Fault isolation
- Team autonomy
Challenges:
- Increased complexity
- Network latency
- Data consistency
- Distributed system challenges
Q19. What is API Gateway Pattern?
Answer: API Gateway is a single entry point for all client requests, routing them to appropriate microservices.
Responsibilities:
- Request routing
- Authentication and authorization
- Rate limiting
- Load balancing
- Request/response transformation
- API versioning
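Rate limiting, one of the responsibilities listed above, is often implemented with a token bucket. A minimal sketch (the rate and capacity are illustrative; time is passed in explicitly to keep the behavior easy to follow):

```python
class TokenBucket:
    """Token-bucket rate limiter sketch: tokens refill at a fixed rate,
    and each request spends one token. Capacity bounds the burst size."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=2)
results = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)]
print(results)   # [True, True, False, True]
# Burst of 2 allowed at t=0, third request denied, one token refilled by t=1.
```

A real gateway keeps one bucket per client (e.g. per API key), typically in a shared store like Redis so all gateway instances see the same counts.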
API Gateway Architecture:
Client
↓
API Gateway
├── User Service
├── Order Service
├── Payment Service
└── Notification Service
Benefits:
- Single entry point
- Centralized cross-cutting concerns
- Simplified client communication
- Better security
Q20. What is Message Queue?
Answer: A message queue is a communication mechanism in which services exchange messages asynchronously through an intermediary queue, so the sender does not wait for the receiver.
Benefits:
- Decouples services
- Handles traffic spikes
- Improves reliability
- Enables async processing
Popular Message Queues:
- RabbitMQ: General-purpose message broker
- Apache Kafka: High-throughput, distributed streaming
- Amazon SQS: Managed message queue service
- Redis Pub/Sub: Lightweight pub/sub messaging
Message Queue Flow:
Producer → Queue → Consumer
Example: Order Processing
1. Order Service sends order to queue
2. Payment Service processes payment
3. Inventory Service updates stock
4. Notification Service sends confirmation
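The producer/consumer flow above can be sketched with Python's thread-safe `queue` as a stand-in for a real broker like RabbitMQ or Kafka; the service names and order IDs are illustrative.

```python
import queue
import threading

orders = queue.Queue()   # stand-in for the message broker
processed = []

def order_service():
    # Producer: enqueue work and move on without waiting for consumers.
    for order_id in (101, 102, 103):
        orders.put(order_id)
    orders.put(None)      # sentinel: no more work

def payment_service():
    # Consumer: blocks until a message arrives, processes it, repeats.
    while True:
        order_id = orders.get()
        if order_id is None:
            break
        processed.append(order_id)

t = threading.Thread(target=payment_service)
t.start()
order_service()
t.join()
print(processed)   # [101, 102, 103]
```

Because the queue sits between them, the order service keeps accepting orders even if the payment service is down, which is the decoupling benefit listed below.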
Benefits:
- Services don't need to be online simultaneously
- Can handle high load
- Better fault tolerance
Interview Tips
Preparation Tips
- Practice Common Questions: URL shortener, chat system, news feed, search engine
- Understand Fundamentals: Scalability, databases, caching, load balancing
- Study Real Systems: Read about how companies like Google, Facebook, Amazon design systems
- Practice Drawing: Get comfortable drawing diagrams and explaining them
- Think Out Loud: Explain your thought process during practice
During the Interview
- Ask Questions: Clarify requirements, constraints, and scale
- Start Simple: Begin with basic design, then add complexity
- Estimate Scale: Calculate traffic, storage, and bandwidth needs
- Discuss Trade-offs: Explain pros and cons of your choices
- Be Flexible: Be ready to modify your design based on feedback
Common Mistakes to Avoid
- Jumping to solutions without understanding requirements
- Over-engineering the solution
- Not discussing trade-offs
- Ignoring scalability and performance
- Not considering failure scenarios
- Forgetting about security and authentication
Key Metrics to Remember
| Metric | Typical Value |
|---|---|
| Read requests per second | 100K - 1M |
| Write requests per second | 1K - 100K |
| Storage per user | 1GB - 10GB |
| Cache hit ratio | 80-90% |
| Database query time | < 10ms |
| API response time | < 200ms |