System Design Interview Practice Questions With Sample Solutions

In a system design interview, you’re given an open-ended problem like designing a popular system. The goal is to assess how you approach large-scale problems, make architectural decisions, and consider scalability, reliability, and maintainability.

Practicing system design questions is important because it helps you learn to structure your thoughts, ask the right questions, and cover critical aspects (like data storage, caching, and fault tolerance).

Why Practice System Design?

Recruiters want to see if you can design systems that handle millions of users or large amounts of data. This involves understanding how different components (databases, caches, queues, etc.) work together and how to scale them. For beginners, starting with common scenarios and reviewing sample solutions is a great way to build confidence.

Below are 10 practice system design questions commonly asked in interviews, each with a sample solution outline. The solutions focus on the approach and key considerations without diving into unnecessary complexity.

10 System Design Practice Questions with Sample Solutions

1. Design a URL Shortener (like Bit.ly)

Sample Solution Approach:

  • Core Functionality: Accept a long URL and return a short URL code. When the short URL is visited, the service redirects to the original long URL.

  • Unique ID Generation: Generate a unique short code for each URL. This can be done by encoding an auto-incrementing ID in base62 (using letters and digits) or by using a hash function. Ensure uniqueness by checking for collisions (if using hashing) or by having a separate ID generator service (see the base62 sketch after this list).

  • Database Choice: Use a fast key-value store or relational database to map short codes to original URLs. Each entry is a pair: ShortCode -> OriginalURL. A NoSQL store is often chosen for simplicity and scalability, but an SQL database with an index on the shortcode can also work for moderate scale.

  • Scalability: For high traffic, deploy multiple instances and use a load balancer. Partition (shard) the database based on short code ranges or use a distributed NoSQL database to handle billions of URLs. Frequently accessed URLs can be cached in memory for quicker retrieval.

  • Additional Considerations: Implement analytics (track click counts), and consider an expiration policy for links (if needed). Also, ensure the system handles redirects quickly (HTTP 301/302 responses) for a good user experience.
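
To make the base62 idea concrete, here is a minimal Python sketch. The in-memory dictionary and global counter are stand-ins for the database and ID generator service, and the names are illustrative, not part of any real implementation:

```python
import string

# Base62 alphabet: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62

def encode_base62(num: int) -> str:
    """Convert an auto-incrementing integer ID into a short base62 code."""
    if num == 0:
        return ALPHABET[0]
    code = []
    while num > 0:
        num, rem = divmod(num, BASE)
        code.append(ALPHABET[rem])
    return "".join(reversed(code))

def decode_base62(code: str) -> int:
    """Convert a short code back to its numeric ID."""
    num = 0
    for ch in code:
        num = num * BASE + ALPHABET.index(ch)
    return num

# Toy in-memory mapping; a real service would use a key-value store or SQL table.
url_by_id = {}
next_id = 1

def shorten(long_url: str) -> str:
    global next_id
    url_by_id[next_id] = long_url
    code = encode_base62(next_id)
    next_id += 1
    return code

def resolve(code: str) -> str:
    return url_by_id[decode_base62(code)]

if __name__ == "__main__":
    c = shorten("https://example.com/some/very/long/path")
    print(c, "->", resolve(c))
```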

Learn how to design a URL Shortener.

2. Design a Rate Limiter

Sample Solution Approach:

  • Purpose: Prevent abuse by limiting how many requests a user or service can make in a given time frame (e.g., “100 requests per minute”).

  • Basic Idea: Maintain a count of requests per user (or API key) and reset or decay the count over time. If the limit is exceeded, further requests are rejected until the window resets.

  • Token Bucket vs. Leaky Bucket: Use token bucket or leaky bucket algorithms for rate limiting:

    • Token Bucket: Tokens are added to a bucket at a fixed rate (say 5 tokens/sec). Each request consumes a token. If the bucket has tokens, the request passes; if empty, the request is limited. This allows bursts up to the bucket size, then enforces a steady rate (see the sketch after this list).
    • Leaky Bucket: Imagine a bucket leaking at a steady rate. Incoming requests are added to the bucket; they are processed at the fixed leak rate. If requests come in too fast and the bucket overflows, they are dropped. This smooths out bursts by processing at a constant rate.
  • Implementation: For a distributed system, store counters or token buckets in a fast in-memory store like Redis (with keys for each user and timestamps). Ensure atomic updates when consuming tokens. Use a sliding window or fixed window counter approach for simplicity (e.g., count requests in the last 60 seconds).

  • Scalability: Deploy the rate limiter as a middleware service or as part of an API gateway. It should be lightweight and very fast. Use replication or clustering for the in-memory store to avoid single points of failure.
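
A minimal, single-process token bucket sketch in Python follows. The rate of 5 tokens/sec, the capacity of 10, and the in-memory `buckets` dict are illustrative assumptions; a distributed limiter would keep this state in something like Redis and update it atomically:

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`, then enforces `rate` tokens per second."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1   # consume one token for this request
            return True
        return False           # bucket empty: reject or throttle the request

# One bucket per user (or API key); illustrative in-memory store.
buckets = {}

def is_allowed(user_id: str) -> bool:
    bucket = buckets.setdefault(user_id, TokenBucket(rate=5, capacity=10))
    return bucket.allow()
```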

3. Design a Distributed Cache System

Sample Solution Approach:

  • Goal: Improve read performance by caching frequently accessed data in memory (distributed across multiple servers). Examples include designing something like Redis or Memcached clusters.

  • Data Distribution: Use consistent hashing to distribute key-value pairs across cache nodes. This ensures load balancing and that when a node is added/removed, only a subset of keys needs to move (see the hash-ring sketch after this list).

  • Cache Invalidation: One of the hardest problems. Cache invalidation strategies include:

    • TTL (Time-to-Live): Each cache entry expires after a short time, ensuring data is refreshed periodically.

    • Write-Through: On data update, write to the cache and database simultaneously to keep them in sync.

    • Cache-Aside (Lazy Loading): On a cache miss, fetch from database, then store it in cache. On update or delete, invalidate (remove) the cache entry so next read goes to DB.

  • Consistency: Accept eventual consistency between cache and source of truth (small window where stale data might be served) or implement stronger consistency with slightly more complexity (like versioning data or update propagation).

  • Scalability & Fault Tolerance: Run cache servers on multiple machines. Use a coordinator or discovery service so clients know the current cache nodes. If a cache node goes down, the system should redistribute keys to healthy nodes. Also consider LRU eviction or size limits to avoid running out of memory.

  • Usage: Front your primary database with this cache layer to handle hot data. Ensure that cache failures fall back to the database (with possibly higher latency) rather than causing system downtime.
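
Here is a small Python sketch of the consistent-hashing ring mentioned above. The node names, the 100 virtual nodes per server, and the use of MD5 are arbitrary choices for illustration:

```python
import bisect
import hashlib

class HashRing:
    """Maps keys to cache nodes; adding/removing a node only moves nearby keys."""
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self.hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_node("user:42"))   # the cache node responsible for this key
```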

4. Design a Messaging Queue (like Kafka or RabbitMQ)

Sample Solution Approach:

  • Core Idea: A messaging queue decouples producers and consumers, allowing asynchronous communication. Producers publish messages to a queue or topic, and consumers read those messages (often at their own pace).

  • Publish/Subscribe Model: Design the system to support pub-sub. A topic can have multiple subscribers. For example, with Kafka topics the consumers within one consumer group each get a share of the messages (for load balancing), while separate consumer groups each receive all messages (for fan-out).

  • Partitioning: For scalability, split the queue/topic into partitions. Each partition is a log of messages that can be stored and read independently. This allows multiple consumers to process in parallel (each partition is handled by one consumer in a group) and improves throughput. Messages within a partition remain ordered (see the sketch after this list).

  • Reliability: Ensure no message is lost. Use durable storage for messages (e.g., write to disk) and replication across servers. Acknowledge messages: consumers send an ack when processed; if a message isn’t acked (consumer died), it should be retried or handed to another consumer.

  • Delivery Semantics: Aim for at-least-once delivery (default in many systems; a message will be retried if a consumer fails, possibly causing a duplicate). If needed, design for exactly-once processing using deduplication or transactional consumption (more complex).

  • Scalability & Components: The system consists of brokers (servers that store and serve messages). Use a cluster of brokers with a coordination service (like ZooKeeper) to manage the cluster state (for Kafka). A load balancer or client library logic can direct producers/consumers to the correct broker/partition. Ensure monitoring of queue length and consumer lag for health.
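
Below is a toy Python sketch of key-based partitioning with per-partition offsets. The in-memory lists stand in for durable, replicated logs, and the partition count and CRC32 hashing are illustrative choices, not a real broker implementation:

```python
from collections import defaultdict
import zlib

NUM_PARTITIONS = 4  # illustrative; real systems configure this per topic

# Each partition is an append-only log; offsets give per-partition ordering.
partitions = defaultdict(list)

def choose_partition(key: str) -> int:
    """Messages with the same key always land in the same partition,
    so they stay ordered relative to each other."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def publish(key: str, message: str) -> tuple[int, int]:
    p = choose_partition(key)
    partitions[p].append(message)
    offset = len(partitions[p]) - 1
    return p, offset   # (partition, offset) identifies the message

def consume(partition: int, offset: int) -> str:
    # A consumer tracks its own offset and acks after processing; on failure
    # it re-reads from the last committed offset (at-least-once delivery).
    return partitions[partition][offset]

p, off = publish("order-123", "order placed")
print(consume(p, off))
```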

5. Design a Social Media News Feed (like Twitter)

Sample Solution Approach:

  • Features: Users follow others and see a news feed of posts. The feed should show relevant or recent posts from people they follow, possibly ranked by time or popularity.

  • Feed Generation: Two common approaches:

    • Push Model (Fan-out on write): When a user posts, push that post into the feed list of all their followers (e.g., insert into a timeline database for each follower). This makes reading the feed fast (just read the precomputed feed), but writing can be heavy if a user has millions of followers (see the sketch after this list).

    • Pull Model (Fan-out on read): Store posts in a central repository and fetch on-demand. When a user loads their feed, query the database for recent posts from all users they follow, then merge results. This can be slow at read time but avoids massive writes.

    • In practice, a hybrid is used: pre-compute feeds for average users (push), and fall back to pull or specialized handling for extremely high-follower accounts (celebrities).

  • Real-Time Updates: Use WebSockets or long-polling to push new posts to users in real time (e.g., "X just tweeted" appearing without refresh). Alternatively, the app can periodically poll for new posts.

  • Ranking Algorithm: For beginner design, you can assume a simple chronological feed (most recent first). In advanced systems, a ranking service might reorder posts based on relevance (interests, engagement, etc.).

  • Data Storage: Use a NoSQL database or wide-column store to maintain user feeds and posts (for example, Cassandra or MongoDB, which can handle large volumes of data and fast writes). Store each post with an ID, author, timestamp, and content. Feeds could be stored as lists of post IDs for each user.

  • Scalability: Partition data by user or by post ID to spread load. Use caching to store feeds for active users in memory. A load balancer and multiple application servers handle the feed generation requests. Ensure the design can support high fan-out (celebrities posting) and high fan-in (many users reading).
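
A toy Python sketch of the push model (fan-out on write) with a follower-count threshold for celebrity accounts follows. All data structures are in-memory stand-ins for the feed store, and the 10,000 threshold is an arbitrary example:

```python
from collections import defaultdict
import itertools
import time

followers = defaultdict(set)   # author_id -> set of follower ids
feeds = defaultdict(list)      # user_id -> list of post ids (newest first)
posts = {}                     # post_id -> post record
post_ids = itertools.count(1)

FANOUT_LIMIT = 10_000          # illustrative threshold for "celebrity" accounts

def create_post(author_id: str, content: str) -> int:
    post_id = next(post_ids)
    posts[post_id] = {"author": author_id, "content": content, "ts": time.time()}
    # Push model: write the post id into each follower's precomputed feed.
    # Very-high-follower authors would be skipped here and merged in at read time.
    if len(followers[author_id]) <= FANOUT_LIMIT:
        for follower in followers[author_id]:
            feeds[follower].insert(0, post_id)
    return post_id

def read_feed(user_id: str, limit: int = 20):
    return [posts[pid] for pid in feeds[user_id][:limit]]

followers["alice"] = {"bob", "carol"}
create_post("alice", "hello world")
print(read_feed("bob"))
```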

Learn how to design Twitter.

6. Design a File Storage Service (like Dropbox or Google Drive)

Sample Solution Approach:

  • Overview: Users upload, download, and share files. The system should store files reliably and allow access from anywhere. Key points are handling large files, ensuring durability, and scaling storage.

  • File Storage & Sharding: Break files into chunks (e.g., 4MB each). Store chunks across a distributed storage cluster. Each chunk is saved on multiple storage nodes (for redundancy). This way, large files can be downloaded/uploaded in parallel and one server doesn’t hold the entire file (see the chunking sketch after this list).

  • Metadata Management: Maintain a metadata service (database) that tracks files and their chunks. For each file, store metadata like filename, owner, and a list of chunk IDs/locations. A relational database or distributed NoSQL store can be used for this metadata. This service is critical; consider making it highly available (replication, leader election for master node).

  • Uploading/Downloading Workflow: When a user uploads, the service:

    1. Receives the file, splits it into chunks.
    2. Stores each chunk on a storage node (possibly multiple copies).
    3. Records the chunk locations in the metadata DB.
    For download, the service looks up the chunk locations, streams the chunks to the user, and reassembles the file.
  • Scalability: Use horizontal scaling for storage nodes; new servers can be added to increase capacity. Use a load balancer or a coordinator to route chunk requests to the right node. Use content hashes or checksums for data integrity (ensure no corruption in transit).

  • CDN Integration: For frequently accessed files or videos, integrate with a Content Delivery Network (CDN). The CDN caches files on edge servers closer to users, reducing latency and load on the core storage service.

  • Extra Features: Implement versioning (keeping older versions of files) and sharing permissions. These add complexity but can be handled by additional metadata (like file version history, access control lists).
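
Here is a simplified Python sketch of the chunking and reassembly flow. The dictionaries stand in for the storage nodes and the metadata DB, and using the SHA-256 content hash as the chunk ID is one possible design choice, not a requirement:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MB, as in the description above

# Toy stand-ins for the storage nodes and metadata DB.
chunk_store = {}               # chunk_id -> bytes (would be replicated nodes)
file_metadata = {}             # filename -> ordered list of chunk ids

def upload(filename: str, data: bytes) -> None:
    chunk_ids = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        # Content hash doubles as the chunk id and an integrity checksum.
        chunk_id = hashlib.sha256(chunk).hexdigest()
        chunk_store[chunk_id] = chunk      # in reality: write to 2-3 storage nodes
        chunk_ids.append(chunk_id)
    file_metadata[filename] = chunk_ids    # in reality: a metadata DB record

def download(filename: str) -> bytes:
    parts = []
    for chunk_id in file_metadata[filename]:
        chunk = chunk_store[chunk_id]
        assert hashlib.sha256(chunk).hexdigest() == chunk_id   # verify integrity
        parts.append(chunk)
    return b"".join(parts)

upload("notes.txt", b"hello" * 1000)
assert download("notes.txt") == b"hello" * 1000
```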

Learn how to design Dropbox.

7. Design an E-Commerce System

Sample Solution Approach:

  • Scope: An e-commerce platform includes browsing products, managing a cart, placing orders, processing payments, and handling inventory. It’s a broad system, so clarify which parts to focus on (usually the checkout flow).

  • Microservices Architecture: Consider splitting into services for modularity (especially in a large system). For example: Product Service (catalog, product details), Cart Service, Order Service, Payment Service, User Account Service, etc. These services communicate through APIs or messages. However, for a beginner, a well-structured monolith can also be described, focusing on components.

  • Cart Management: When a user adds items to their cart, store the cart in a database or in-memory cache (like Redis) associated with the user’s session or ID. Carts should persist so users can come back later. Ensure cart data is consistent (item price shouldn’t change unnoticed – consider showing current price at checkout).

  • Order Processing: When the user checks out (a simplified sketch follows this list):

    1. Inventory Check: Verify stock for each item and lock or reduce inventory count to prevent overselling.

    2. Payment Handling: Interact with a payment gateway (like Stripe/PayPal) to charge the user. This may be an external API call. Handle success or failure (if failure, release any held inventory).

    3. Order Creation: Create an order record in the database with items, user info, shipping address, etc. Mark order status (e.g., Pending, Completed, Failed).

    4. Confirmation: On successful payment, confirm the order to the user and maybe send a notification/email. If using microservices, the Order Service might publish an "order placed" event for other services (like a Warehouse Service for fulfillment or Notification Service).

  • Scalability & Reliability: Use a robust relational database for order records (for ACID properties) and possibly a NoSQL for product catalog (for flexible schema and high read throughput). Introduce a message queue for processing actions asynchronously (for example, to process orders or send notifications without slowing the user request). Implement redundancy for critical components (multiple app servers behind a load balancer, a primary-secondary DB setup for failover).

  • Performance: Cache frequent queries (like product details pages). Use search indices for product search. Ensure the web servers can handle spikes (like sales events) by auto-scaling or having enough headroom.

  • Security: Don’t store sensitive payment data directly; rely on third-party payment processors. Use HTTPS to secure transactions.
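
A simplified Python sketch of the checkout flow above: the in-memory inventory, the always-successful `charge_payment` stub, and the data shapes are assumptions for illustration; a real system would use database transactions and an external payment gateway:

```python
import uuid

inventory = {"sku-1": 5, "sku-2": 0}       # toy stock levels
orders = {}                                 # order_id -> order record

def charge_payment(user_id: str, amount: float) -> bool:
    # Placeholder for a call to an external gateway (Stripe, PayPal, ...).
    return True

def checkout(user_id: str, cart: dict[str, int], amount: float) -> str:
    # 1. Inventory check: reserve stock so we don't oversell.
    for sku, qty in cart.items():
        if inventory.get(sku, 0) < qty:
            raise ValueError(f"{sku} is out of stock")
    for sku, qty in cart.items():
        inventory[sku] -= qty

    # 2. Payment handling: on failure, release the reserved stock.
    if not charge_payment(user_id, amount):
        for sku, qty in cart.items():
            inventory[sku] += qty
        raise RuntimeError("payment failed")

    # 3. Order creation + 4. confirmation (an event/notification would go out here).
    order_id = str(uuid.uuid4())
    orders[order_id] = {"user": user_id, "items": cart, "status": "Completed"}
    return order_id

print(checkout("user-7", {"sku-1": 2}, amount=39.98))
```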

Learn how to design an e-commerce system.

8. Design a Video Streaming Service (like YouTube or Netflix)

Sample Solution Approach:

  • Content Ingestion: Creators upload videos, or movies are ingested into the system. Each video is processed by an encoding pipeline to create multiple resolutions/bitrate versions (e.g., 240p, 480p, 1080p) to support different internet speeds and device capabilities.

  • Storage & CDN: Store the encoded video files (often in small segments for streaming, a few seconds each) in a distributed storage. Use a Content Delivery Network (CDN) to cache and serve video segments to users globally. This minimizes buffering by having servers closer to users. The main system uploads content to the CDN edge servers.

  • Streaming Protocol: Use adaptive bitrate streaming protocols like HLS or DASH. The client video player requests segment files and can switch between quality levels on the fly based on current bandwidth. Essentially, the client fetches a playlist/manifest file and then sequentially downloads video chunks (see the rendition-selection sketch after this list).

  • Metadata & Catalog: Maintain a database for video metadata (title, description, length, view count, etc.) and user data (profiles, watch history, subscriptions). This could be a relational database for reliability or a NoSQL store if needing to scale reads heavily.

  • Video Serving Workflow: When a user hits play:

    1. The client requests a video manifest (list of chunk URLs).
    2. The client then requests chunks from the CDN. If the CDN doesn’t have a chunk, it pulls from origin (the central storage) and then caches it.
    3. The video plays as chunks stream in. The client monitors connection and adjusts quality as needed.
  • Scalability: The system should handle many concurrent streams. The key is offloading delivery to CDNs and using multiple encoding servers to process uploads. Use load balancers and possibly microservices: e.g., a Streaming Service to handle user playback requests, an Encoding Service, a User Service for profiles, etc.

  • Additional Features: Implement recommendation systems (suggest next videos) and analytics (track what users watch to optimize content distribution). These rely on big data processing but can be noted as separate components that consume event logs of user behavior.
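
A small Python sketch of client-side quality selection for adaptive bitrate streaming follows. The rendition table, bandwidth thresholds, and safety factor are illustrative assumptions; real HLS/DASH players use more sophisticated heuristics:

```python
# Renditions the encoding pipeline produced, with approximate bandwidth needs.
RENDITIONS = [
    {"name": "240p",  "min_kbps": 400},
    {"name": "480p",  "min_kbps": 1_500},
    {"name": "1080p", "min_kbps": 5_000},
]

def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> str:
    """Choose the highest quality whose bandwidth requirement fits the
    measured throughput, leaving some headroom to avoid rebuffering."""
    budget = measured_kbps * safety_factor
    best = RENDITIONS[0]["name"]
    for r in RENDITIONS:
        if r["min_kbps"] <= budget:
            best = r["name"]
    return best

# The player re-evaluates this after each downloaded segment.
print(pick_rendition(3_200))   # -> "480p"
print(pick_rendition(9_000))   # -> "1080p"
```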

Learn how to design a video streaming service.

9. Design a Search Engine (like Google)

Sample Solution Approach:

  • Web Crawler: Have a component that crawls web pages (or documents) to gather content. The crawler follows links and downloads pages, storing raw data for processing. This runs continuously to discover new or updated content.

  • Indexing: Process crawled pages to build an inverted index. An inverted index maps keywords to the list of documents (and positions) where they appear. This structure allows fast lookup of documents by a query term. The indexing system parses text, filters out stop-words, stems or normalizes words, and then updates the index (see the inverted-index sketch after this list).

  • Data Storage: The index can be huge, so use a distributed storage (like a cluster of servers each holding a portion of the index). Partition the index by terms (e.g., index 'a-m' on one server, 'n-z' on another) or by document IDs. Use compression to store indices efficiently on disk. Also maintain a document store for the page content or snippets.

  • Query Processing: When a user searches, the query is broken into terms, and the index is consulted to find candidate pages that match those terms. Then a ranking algorithm is applied to sort results by relevance. Ranking factors could include keyword frequency, page popularity (e.g., PageRank or number of backlinks), freshness, and user location or personalization.

  • Optimization: Use caching for common queries (store the top results in memory). Use autosuggestion/trie for prefix matching as the user types (optional feature).

  • Architecture & Scaling: Deploy multiple search clusters. A query dispatcher can send the query to multiple servers in parallel (each handling part of the index) and then merge results. Ensure each component is distributed: multiple crawlers working together, multiple indexers updating shards of the index, and multiple query servers for redundancy. Use load balancing at the query level to handle many requests.

  • Freshness and Maintenance: Periodically update the index with new data from crawlers (incremental indexing). Also handle removals (e.g., deleted pages) by removing them from the index. Monitor performance to re-balance index shards if one grows too large.
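
Here is a minimal Python sketch of building and querying an inverted index. The stop-word list, the tokenizer, and the AND-only query semantics are simplifications for illustration; ranking is left out entirely:

```python
from collections import defaultdict
import re

STOP_WORDS = {"a", "an", "the", "and", "of", "to"}   # tiny illustrative list

inverted_index = defaultdict(set)   # term -> set of document ids
documents = {}                      # doc_id -> original text

def tokenize(text: str):
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token not in STOP_WORDS:
            yield token

def index_document(doc_id: int, text: str) -> None:
    documents[doc_id] = text
    for term in tokenize(text):
        inverted_index[term].add(doc_id)

def search(query: str) -> list[int]:
    """Return ids of documents containing all query terms (AND semantics);
    a real engine would then rank these candidates by relevance."""
    terms = list(tokenize(query))
    if not terms:
        return []
    result = set.intersection(*(inverted_index.get(t, set()) for t in terms))
    return sorted(result)

index_document(1, "Designing a URL shortener and a rate limiter")
index_document(2, "Designing a distributed cache")
print(search("designing cache"))   # -> [2]
```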

10. Design a Ride-Sharing System (like Uber)

Sample Solution Approach:

  • Core Features: Riders can request rides via a mobile app, drivers get those requests, and the system matches them. Key challenges are real-time vehicle tracking, dispatching the nearest driver, and handling dynamic pricing.
  • System Components:
    • Client Apps: Mobile apps for riders and drivers, which send location updates and requests to the backend.
    • Location Tracking: The driver app continually sends GPS updates to the server. Use a geospatial indexing technique (like a quadtree or geohash grid) to bucket drivers by location. This allows querying “drivers near a location” efficiently (see the grid-lookup sketch after this list).
    • Matching/Dispatch Service: When a rider requests a ride, the service finds an available driver nearby. It queries the location index for drivers within X miles, then picks the best (e.g., the closest or those with highest ratings). The request is sent to that driver’s app; if they don't accept, send to the next option.
    • Ride Management: Once a driver accepts, lock that driver to the rider. Both apps get each other’s info and start the ride. The system continues to track the ride status (start, ongoing, completed).
  • Scalability: Real-time operations are handled in memory for speed. Use efficient pub-sub or push notifications to update driver apps (e.g., via Firebase or sockets) and rider apps (driver en route, etc.). Partition the dispatch system by region (separate instances handle different cities or zones) to scale and reduce latency.
  • Surge Pricing: A monitoring service checks supply and demand in each area. If demand far exceeds available drivers, it triggers surge pricing (multiply fares). The pricing service updates the fare rates which the matching service uses when quoting a price to riders. This component can use statistics (number of waiting requests vs free drivers) to adjust prices dynamically.
  • Data Storage: Use databases to store persistent data: rider and driver profiles, ride history, payments. For active real-time data (available drivers, ongoing rides), keep them in fast data stores or in-memory. Also use a queue for ride events (for example, to send ride info to a billing service or notification service asynchronously).
  • Reliability & Fault Tolerance: Ensure no single point of failure. Run multiple dispatch servers per region with a coordination mechanism (so if one fails, another takes over). Drivers and riders should still be able to function if one component goes down (degrade gracefully or fail over quickly). Also, secure payments (likely by integrating with a payment processor when charging the rider at the end of the ride).
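
Below is a simplified Python sketch of grid-based proximity lookup, a stand-in for a real geohash or quadtree index. The 0.01-degree cell size and the coordinates are arbitrary examples:

```python
from collections import defaultdict
import math

CELL_SIZE = 0.01   # ~1 km grid cells; real systems use geohashes or quadtrees

driver_locations = {}                 # driver_id -> (lat, lng)
cell_index = defaultdict(set)         # grid cell -> set of driver ids

def cell_of(lat: float, lng: float) -> tuple[int, int]:
    return (int(lat // CELL_SIZE), int(lng // CELL_SIZE))

def update_location(driver_id: str, lat: float, lng: float) -> None:
    old = driver_locations.get(driver_id)
    if old:
        cell_index[cell_of(*old)].discard(driver_id)
    driver_locations[driver_id] = (lat, lng)
    cell_index[cell_of(lat, lng)].add(driver_id)

def nearby_drivers(lat: float, lng: float, limit: int = 3):
    """Check the rider's cell and the 8 surrounding cells, then sort by distance."""
    cx, cy = cell_of(lat, lng)
    candidates = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            candidates |= cell_index[(cx + dx, cy + dy)]
    return sorted(candidates,
                  key=lambda d: math.dist(driver_locations[d], (lat, lng)))[:limit]

update_location("driver-1", 37.7750, -122.4194)
update_location("driver-2", 37.7790, -122.4310)
print(nearby_drivers(37.7755, -122.4200))
```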

Learn how to design Uber.

Best Practices for Answering System Design Questions

  • Clarify Requirements: Always start by asking clarifying questions. Understand the scope – what features are in or out, expected scale (users, data size, QPS), and any specific goals (e.g., low latency, high availability, consistency needs).

  • Structure Your Response: Outline the high-level architecture first. Identify the major components (clients, servers, databases, external services, etc.) and how data flows between them. This shows the interviewer you have a plan before zooming into details.

  • Discuss Trade-Offs: There’s no one “right” design – each decision (SQL vs NoSQL, monolith vs microservices, in-memory vs disk, consistency vs availability) has pros and cons. Explain why you choose a certain approach and mention the trade-offs. For example, “Using NoSQL will give us scalability and schema flexibility, but we might sacrifice some consistency on multi-item transactions – which is acceptable for this use case.”

  • Consider Scalability: Talk about how the system can handle growth. Mention techniques like load balancing, sharding databases, caching, using CDNs, and horizontal scaling (adding more servers) vs vertical scaling. If a component could become a bottleneck, describe how to scale it out or partition the work.

  • Plan for Fault Tolerance: Design for reliability. Mention redundancies: multiple servers (so one failure doesn’t take down the system), data replication (to avoid data loss), and backups. Also consider what happens during network issues or high load – perhaps queue requests, use circuit breakers, or degrade non-essential features.

  • Communicate Clearly: In an interview, speak as you draw the system on the whiteboard (or virtual board). Use clear terminology and check if the interviewer is following or if they have hints. A structured approach (requirements -> high-level design -> component details -> scaling and trade-offs) is often appreciated.

Final Thoughts & Key Takeaways

Practicing these system design questions will build your confidence for the real interview.

We covered critical concepts like database choices, caching strategies, messaging systems, and scaling techniques across different scenarios.

Remember, system design is as much about communication as it is about architecture. Always articulate your thought process and be open to feedback or hints.

Key takeaways: Focus on the fundamentals – scalability, reliability, consistency, and maintainability. There’s no perfect design, so justify your decisions and acknowledge alternatives. Lastly, consider doing mock system design interviews with a friend or colleague to simulate the pressure and get comfortable with the format. With consistent practice and a clear approach, you’ll be well-prepared to tackle system design interviews. Good luck!

Check out Grokking System Design Fundamentals or Grokking the System Design Interview to learn how to approach important system design interview questions.

For practicing advanced problems, explore 50 Advanced System Design Interview Questions.
