Let's design a notification system that works seamlessly for various applications, whether it's a social media platform, an e-commerce site, or a productivity tool.
Difficulty Level: Medium
1. What is a Notification Service?
Have you ever received one of those helpful alerts on your phone, like when a friend likes your post, your order ships, or a meeting is about to start? That’s a notification service in action! It sends timely updates to keep you informed without having to check your apps all the time. Whether it’s about a comment, a deal, or a task, this system makes sure you get important info right when you need it.
In this lesson, we’ll design a notification system that keeps users updated with real-time alerts, from messages to reminders. We’ll explore the essential components to build a service that delivers timely notifications across various channels.
2. Core Entities of a Notification Service
To design an effective notification service, it's important to understand the high-level flow of notifications and the core entities involved. This understanding makes the rest of the design easier to reason about.
At a high level, the notification process can be viewed in terms of how a notification moves from the origin (where the event occurs) to the end user. Let’s consider a practical example to illustrate this:
Let's understand all the entities involved in the process:
Client: The source of the notification. For example, Emma's Instagram app sends a request to the notification server when she posts a new photo.
Notification Server: Acts as a central hub that processes and manages incoming notification requests. It receives the notification request from the client (e.g., Emma’s Instagram app), organizes it, and queues it for delivery to the users.
Notification Executor: This component sends the notifications to the end-users. It picks up the queued notifications from the notification server and delivers them through the appropriate channels (e.g., email, push notification).
Notification Query Service: This service handles user requests for notification details. For example, John might check his notification history or search for specific notifications he received. The service queries the relevant data and returns the requested information, leveraging a cache to speed up responses for frequently accessed notification details.
Users: The recipients of the notifications. For instance, John receives a notification about Emma’s new photo, and when he interacts with it, Emma receives a notification about John’s interaction.
Now that we have a basic understanding of the process, let's move on to the system design by exploring the requirements and goals of the system.
3. Requirements and Goals of the System
We'll focus on the following set of requirements while designing the notification system:
Functional Requirements
Event Detection: The system should capture and process events that necessitate notifications, such as new messages, updates, or alerts.
Notification Types: The system must support various notification methods, including push notifications, emails, SMS, and in-app alerts.
Notification Handling: The system should support both bulk and single notification handling. It should efficiently send notifications to individual users as well as broadcast messages to multiple users simultaneously.
Searching Feature: The system should provide a robust search functionality for finding specific notifications and managing user preferences.
Delivery Preferences: Users should be able to specify their preferred channels (such as push, email, SMS) and set timing preferences for receiving notifications.
Non-Functional Requirements
High Availability: The notification service must be highly available, ensuring consistent notification delivery.
Low Latency: The system should maintain an acceptable latency, aiming for notification delivery within 200ms.
Scalability: The system should handle a large volume of notifications and users without performance degradation.
Reliability: The system should ensure that notifications are not lost and are delivered accurately, even under high load conditions.
4. Capacity Estimation and Constraints
Let's assume we have 1 million daily active users (DAU), each triggering an average of 10 notifications per day. This translates to 10 million notifications daily.
Storage Estimation: Assuming each notification record is 200 bytes on average (including notification ID, user ID, content, timestamp, and status), to store all the notifications for one day, we would need approximately 2 GB of storage.
10 million notifications * 200 bytes = 2 GB/day
To store one month of notification history, we would need approximately 60 GB of storage.
2 GB/day * 30 days ≈ 60 GB
Besides notification records, we also need to store additional data such as user preferences and notification logs. This calculation does not account for data compression and replication.
Bandwidth Estimation: If our service is processing 10 million notifications per day, and each notification request/response pair is approximately 1 KB (including headers and metadata), this will result in about 116 KB of incoming and outgoing data per second.
(10 million notifications * 1 KB) / 86400 seconds ≈ 116 KB/s
Since each notification involves both an incoming request (to trigger the notification) and an outgoing response (to deliver the notification), we need the same amount of bandwidth for both upload and download.
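The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which is what the figures in this section imply.

```python
# Back-of-the-envelope capacity estimate using the assumptions from this section.
DAU = 1_000_000            # daily active users
NOTIFS_PER_USER = 10       # average notifications triggered per user per day
RECORD_BYTES = 200         # average stored size of one notification record
PAYLOAD_BYTES = 1_000      # one request/response pair, about 1 KB
SECONDS_PER_DAY = 86_400

daily_notifications = DAU * NOTIFS_PER_USER
storage_per_day_gb = daily_notifications * RECORD_BYTES / 1e9
storage_per_month_gb = storage_per_day_gb * 30
bandwidth_kb_per_s = daily_notifications * PAYLOAD_BYTES / SECONDS_PER_DAY / 1_000
avg_notifications_per_s = daily_notifications / SECONDS_PER_DAY

print(f"Notifications per day: {daily_notifications:,}")
print(f"Storage per day:       {storage_per_day_gb:.1f} GB")
print(f"Storage per month:     {storage_per_month_gb:.1f} GB")
print(f"Bandwidth:             {bandwidth_kb_per_s:.0f} KB/s each way")
print(f"Average throughput:    {avg_notifications_per_s:.0f} notifications/s")
```

Note that these are daily averages. Real traffic is bursty, so peak throughput and bandwidth will be noticeably higher than the averages computed here.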
5. System APIs
We can have SOAP or REST APIs to expose the functionality of our notification service. The following could be the definitions of the APIs for managing notifications:
- Retrieve the user data
GET /FetchUserData
Request Parameters:
api_dev_key (string): The API developer key of a registered account. Used to throttle users based on their allocated quota.
userId (string): The unique identifier of the user.
Response:
200 OK: Returns a JSON object containing the user data.
{ "id": "12345", "name": "John Doe", "notificationPreferences": { "email": true, "push": false } }
- Fetch Notifications
GET /FetchNotification
Request Parameters:
api_dev_key (string): The API developer key of a registered account. Used to throttle users based on their allocated quota.
userId (string): The unique identifier of the user.
filter (string, optional): Criteria to filter notifications (e.g., unread, type).
Response:
200 OK: Returns a JSON object containing the list of notifications for the user.
{ "notifications": [ { "id": "notif123", "type": "post_like", "message": "John liked your post", "timestamp": "2024-06-12T12:00:00Z" }, { "id": "notif124", "type": "shipment", "message": "Your order has shipped", "timestamp": "2024-06-12T15:00:00Z" } ] }
- Query Notifications
POST /QueryNotifications
Request Parameters:
api_dev_key (string, required): The API developer key of a registered account. Used to throttle users based on their allocated quota.
userId (string, required): The unique identifier of the user.
query (string, required): A user-defined query string to filter notifications. This can include criteria like date range, type, read status, and other custom filters.
Response:
200 OK: Returns a JSON object containing the filtered list of notifications for the user.
{ "notifications": [ { "id": "notif123", "type": "post_like", "message": "John liked your post", "timestamp": "2024-06-12T12:00:00Z", "readStatus": "unread" }, { "id": "notif124", "type": "shipment", "message": "Your order has shipped", "timestamp": "2024-06-12T15:00:00Z", "readStatus": "read" } ] }
- Send Notification
POST /SendNotification
Request Parameters:
api_dev_key (string): The API developer key of a registered account. Used to throttle users based on their allocated quota.
userId (string): The unique identifier of the user.
notificationType (string): The type of notification (e.g., email, SMS, push).
message (string): The content of the notification.
Response:
200 OK: Indicates that the notification was successfully queued for delivery.
{ "status": "success", "message": "Notification sent successfully." }
These APIs will enable our notification service to interact seamlessly with other components and external systems, ensuring that notifications are dispatched accurately and in real time to meet user expectations and system requirements.
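To make the SendNotification contract concrete, here is a minimal sketch of how a server might validate the request before queuing it. The function names, the set of allowed notification types, and the 400 error response are assumptions made for illustration; the lesson only specifies the request parameters and the 200 response.

```python
from typing import Dict, List, Tuple

# Assumed set of channels, based on the examples given for notificationType.
ALLOWED_TYPES = {"email", "sms", "push"}
REQUIRED_FIELDS = ("api_dev_key", "userId", "notificationType", "message")


def validate_send_request(payload: Dict) -> List[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field} is required and must be a non-empty string")
    ntype = payload.get("notificationType")
    if isinstance(ntype, str) and ntype.strip() and ntype.lower() not in ALLOWED_TYPES:
        errors.append(f"unsupported notificationType: {ntype}")
    return errors


def handle_send_notification(payload: Dict) -> Tuple[int, Dict]:
    """Hypothetical handler: returns an (HTTP status, JSON body) pair."""
    errors = validate_send_request(payload)
    if errors:
        # The error shape is an assumption; the lesson does not define one.
        return 400, {"status": "error", "errors": errors}
    # A real server would place the notification on the delivery queue here.
    return 200, {"status": "success", "message": "Notification queued for delivery."}


ok_status, ok_body = handle_send_notification({
    "api_dev_key": "dev-key-123",
    "userId": "12345",
    "notificationType": "push",
    "message": "Your order has shipped",
})
bad_status, bad_body = handle_send_notification({
    "api_dev_key": "dev-key-123",
    "userId": "12345",
    "notificationType": "fax",
    "message": "",
})
print(ok_status, ok_body)
print(bad_status, bad_body)
```

Rejecting malformed requests at the API boundary keeps invalid work out of the queue, which matters once bulk requests fan out to many recipients.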
6. Database Schema
The database schema for the notification system is designed to efficiently store, manage, and retrieve notifications, user preferences, and event data. It supports timely notification delivery across various channels, ensuring high performance, scalability, and flexibility for diverse applications like social media, e-commerce, and productivity tools.
Here’s a breakdown of each table and its purpose:
Users Table: Stores user information and their preferences for receiving notifications via different channels (email, SMS, push).
Notifications Table: Logs details of each notification to be sent, including type, message content, status, and timestamps.
Notification Logs Table: Records actions taken on each notification, such as sending status and any errors, to track delivery and troubleshoot issues.
Event Detection Table: Captures events that trigger notifications, storing event type, data payload, and detection time.
This schema supports seamless and scalable notification delivery, adapting to various user preferences and ensuring reliable communication across applications.
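One possible shape for these four tables is sketched below using Python's built-in sqlite3 module with an in-memory database. The lesson does not list column names, so every column, type, and index here is an assumption chosen to match the table descriptions above; a production system would likely use a different database engine.

```python
import sqlite3

# Hypothetical columns inferred from the table descriptions; not an official schema.
SCHEMA = """
CREATE TABLE users (
    user_id       TEXT PRIMARY KEY,
    name          TEXT NOT NULL,
    email_enabled INTEGER NOT NULL DEFAULT 1,
    sms_enabled   INTEGER NOT NULL DEFAULT 0,
    push_enabled  INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE events (
    event_id    INTEGER PRIMARY KEY,
    event_type  TEXT NOT NULL,
    payload     TEXT NOT NULL,      -- JSON-encoded event data
    detected_at TEXT NOT NULL
);
CREATE TABLE notifications (
    notification_id INTEGER PRIMARY KEY,
    user_id         TEXT NOT NULL REFERENCES users(user_id),
    event_id        INTEGER REFERENCES events(event_id),
    type            TEXT NOT NULL,
    message         TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'queued',
    created_at      TEXT NOT NULL
);
CREATE TABLE notification_logs (
    log_id          INTEGER PRIMARY KEY,
    notification_id INTEGER NOT NULL REFERENCES notifications(notification_id),
    action          TEXT NOT NULL,
    error           TEXT,
    logged_at       TEXT NOT NULL
);
-- Supports "latest notifications for a user" lookups.
CREATE INDEX idx_notifications_user ON notifications(user_id, created_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users (user_id, name) VALUES (?, ?)", ("12345", "John Doe"))
conn.execute(
    "INSERT INTO notifications (user_id, type, message, created_at) VALUES (?, ?, ?, ?)",
    ("12345", "post_like", "Emma liked your post", "2024-06-12T12:00:00Z"),
)
rows = conn.execute(
    "SELECT type, message, status FROM notifications "
    "WHERE user_id = ? ORDER BY created_at DESC",
    ("12345",),
).fetchall()
table_names = {
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
print(rows)
```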
7. High Level Design
Let's look at this High-Level Design (HLD) diagram that maps out the architecture and workflow of our notification service.
The detailed workflow would look like this:
- User A sends a single notification or queries notification details, while User B sends multiple notifications at once or queries notification details.
- The Notification Server receives the requests, validates them, and identifies whether they are single or bulk. It retrieves recipient preferences from the Main DB to determine how notifications should be customized and delivered.
- The server customizes notifications based on recipient preferences and adds them to a queue for orderly processing.
- The Notification Execution Service sends notifications via preferred channels (desktop, mobile, tablet) and logs delivery status and any issues.
- The Notification Query Service handles queries about notification details by fetching the relevant information from the Main DB. The service then sends the query response back to User A or User B, providing details such as delivery status, recipient interactions, and any issues encountered.
- Clients B, C, and D receive notifications on different devices, completing the notification delivery loop.
This interconnected process ensures efficient and timely delivery of notifications, adapting to user preferences and enabling seamless communication across applications. But this design has some core issues.
- Querying the Main Database for user preferences causes delays under heavy load. The Notification Query Service adds latency, impacting real-time processing.
- Single points of failure (SPOF) in key components can disrupt the process, causing missed or delayed notifications due to a lack of redundancy.
- Handling diverse notification types for different channels (push, email, SMS) adds complexity, potentially leading to inconsistencies and delays.
- Processing different types of notifications (single vs. bulk) and various formats requires complex logic, increasing the risk of delays and errors.
How will we reduce delays from querying the main database?
We can minimize these delays by implementing a caching layer. By storing user preferences in a fast, in-memory cache like Redis, we avoid the need to query the main database for each notification. This approach reduces the load on the main database and speeds up data retrieval, significantly enhancing real-time processing. Because reading from memory is typically much faster than running a database query, a high cache hit rate keeps preference lookups fast even during peak loads.
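The caching idea described above is often implemented as a cache-aside lookup: check the cache first, fall back to the database on a miss, then store the result with an expiry. The sketch below uses a plain dictionary as a stand-in for Redis so that it runs without any external service; the class name, the TTL value, and the `load_preferences` helper are all hypothetical.

```python
import time


class PreferenceCache:
    """Cache-aside wrapper for user preferences; a dict stands in for Redis here."""

    def __init__(self, load_from_db, ttl_seconds=300.0, clock=time.monotonic):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # user_id -> (expires_at, preferences)
        self.hits = 0
        self.misses = 0

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the database and repopulate the cache.
        self.misses += 1
        prefs = self._load_from_db(user_id)
        self._entries[user_id] = (now + self._ttl, prefs)
        return prefs

    def invalidate(self, user_id):
        """Call this when a user changes preferences so stale data is not served."""
        self._entries.pop(user_id, None)


# Hypothetical stand-in for the main database.
MAIN_DB = {"12345": {"email": True, "push": False}}
db_reads = []


def load_preferences(user_id):
    db_reads.append(user_id)
    return MAIN_DB[user_id]


cache = PreferenceCache(load_preferences)
first = cache.get("12345")   # miss: goes to the database
second = cache.get("12345")  # hit: served from memory
print(first, second, f"db reads: {len(db_reads)}")
```

The TTL and the explicit `invalidate` call are two common ways to limit staleness; which one fits depends on how quickly preference changes must take effect.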
How will we address single points of failure (SPOF) in key components?
To address Single Points of Failure (SPOF) in key components such as the Notification Server and the UserData database, we implement redundancy and failover mechanisms. For the Notification Server, multiple instances are deployed and managed through a Load Balancer, ensuring traffic is evenly distributed and rerouted if an instance fails. For the UserData database, we use replication strategies like primary-secondary or multi-primary replication to maintain copies of data across different nodes. This setup is supported by automated failover systems to ensure data availability and continuity in case of server failures.
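As a rough illustration of the load-balancing half of this answer, the sketch below rotates requests across several Notification Server instances and skips any instance that has been marked unhealthy. The class, the instance names, and the manual `mark_down` call are assumptions for illustration; a real load balancer would detect failures through health checks.

```python
import itertools


class RoundRobinBalancer:
    """Round-robin selection that skips instances marked as down."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._cycle = itertools.cycle(self._instances)

    def mark_down(self, instance):
        self._healthy.discard(instance)

    def mark_up(self, instance):
        if instance in self._instances:
            self._healthy.add(instance)

    def next_instance(self):
        # Try each instance at most once per call before giving up.
        for _ in range(len(self._instances)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy notification server instances")


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
before_failure = [balancer.next_instance() for _ in range(3)]
balancer.mark_down("server-b")  # simulate a failed health check
after_failure = [balancer.next_instance() for _ in range(4)]
print(before_failure)
print(after_failure)
```

Traffic keeps flowing to the remaining instances while one is down, which is the property that removes the single point of failure at the server tier.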
How can we handle the complexity of diverse notification types for different channels?
We can address this complexity by integrating third-party services for each notification type.
- iOS Push Notifications: Use the Apple Push Notification Service (APNS) to deliver notifications to iOS devices. The iOS app registers for notifications and receives a device token from APNS. The backend server sends the notification request to APNS, which routes it to the iOS device for display.
- Android Push Notifications: Use Firebase Cloud Messaging (FCM) to send notifications to Android devices. The Android app registers for notifications and gets a registration token from FCM. The server sends the notification request to FCM, which routes it to the Android device for display.
- SMS Notifications: Use services like Twilio to send SMS notifications. The application sends an SMS to Twilio’s API, which processes the message and routes it to the recipient's carrier network. The carrier delivers the SMS and updates the application on delivery status.
- Email Notifications: Use services like SendGrid or Mailchimp to send emails. The application sends email details to the Notification Executor, which forwards them to the email service. The service routes the email to the recipient’s provider and confirms delivery to the Notification Executor.
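One way to keep the per-channel integrations listed above from leaking into the rest of the system is a small dispatch table that maps each channel to a sender function. In this sketch the senders are stubs that only return a description of what they would do; no real APNS, FCM, Twilio, or SendGrid API is called, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Notification:
    user_id: str
    message: str


# Stub senders: each stands in for a call to the corresponding third-party service.
def send_ios_push(n: Notification) -> str:
    return f"APNS <- {n.user_id}: {n.message}"


def send_android_push(n: Notification) -> str:
    return f"FCM <- {n.user_id}: {n.message}"


def send_sms(n: Notification) -> str:
    return f"SMS provider <- {n.user_id}: {n.message}"


def send_email(n: Notification) -> str:
    return f"Email provider <- {n.user_id}: {n.message}"


SENDERS: Dict[str, Callable[[Notification], str]] = {
    "ios_push": send_ios_push,
    "android_push": send_android_push,
    "sms": send_sms,
    "email": send_email,
}


def dispatch(notification: Notification, channels: List[str]) -> List[str]:
    """Send one notification through every requested channel."""
    results = []
    for channel in channels:
        sender = SENDERS.get(channel)
        if sender is None:
            results.append(f"skipped unknown channel: {channel}")
            continue
        results.append(sender(notification))
    return results


outcomes = dispatch(Notification("john", "Emma posted a new photo"), ["ios_push", "email", "fax"])
for line in outcomes:
    print(line)
```

Adding a new channel then means registering one more function in `SENDERS`, without touching the code that decides who should be notified.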
How can we manage the heavy resource usage of generating notifications?
We can manage this by using message queues and background workers. Tasks like generating notifications and interacting with third-party services can be placed in a message queue (e.g., Kafka or RabbitMQ), and background workers can process these tasks asynchronously. This setup prevents the main system from becoming overloaded, keeping it responsive and efficient even during high traffic.
This modular approach abstracts the complexities of each notification type, simplifies processing, and ensures accurate and timely delivery.
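The queue-and-worker idea can be sketched with Python's standard `queue` and `threading` modules. An in-process queue stands in for Kafka or RabbitMQ here, and the `deliver` handler is a hypothetical placeholder for the real work of generating a notification and calling a third-party service.

```python
import queue
import threading


def run_workers(tasks, handler, worker_count=3):
    """Process tasks asynchronously with a pool of background worker threads."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:  # sentinel: no more work for this worker
                task_queue.task_done()
                break
            try:
                outcome = handler(task)
            except Exception as exc:  # keep the worker alive if one task fails
                outcome = f"failed: {exc}"
            with results_lock:
                results.append(outcome)
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)  # one sentinel per worker
    task_queue.join()
    for thread in threads:
        thread.join()
    return results


def deliver(task):
    # Placeholder for generating the notification and calling a provider.
    return f"delivered to {task['user_id']}"


tasks = [{"user_id": f"user-{i}"} for i in range(5)]
delivered = sorted(run_workers(tasks, deliver))
print(delivered)
```

The producer returns as soon as tasks are enqueued in a real deployment; this sketch waits for completion only so that the results can be printed.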
Now let's see how our system looks after incorporating these changes.
8. Detailed Component Design
Now let's understand how these services will work:
Notification Server
The Notification Server is the central coordinator in our notification system, orchestrating the receipt, processing, and delivery of notifications. It efficiently handles various notification types—single and bulk—and manages queries related to notifications. By integrating with external services and maintaining comprehensive logging and monitoring, the Notification Server ensures that notifications are accurately delivered to users through their preferred channels in a timely manner.
Workflow Diagram
Here's an illustration of the Notification Server's workflow, incorporating its key functionalities and query management:
- Receive Notification: The server captures incoming notification requests triggered by various events from User A (single notifications) and User B (bulk notifications).
- Request Validation: It checks each notification request to ensure all necessary details are correct and complete, filtering out invalid or incomplete requests to maintain system integrity.
- Fetch User Preferences: The server retrieves user-specific delivery preferences to tailor notifications according to the recipients' preferred channels (e.g., email, SMS, push notifications).
- Organize Notification: Validated notifications are organized for processing. Single notifications proceed directly, while bulk notifications are queued for batch processing by workers.
- Pass to Executor: The server forwards processed notifications to the executor, which is responsible for dispatching them through the appropriate external services (like APNS, Firebase, Twilio, SendGrid) to reach different clients (desktop, tablet, mobile).
- Log and Monitor: The server logs the delivery status of each notification and monitors system performance. Centralized logging, along with tools like Prometheus and Grafana, tracks important metrics such as request rates, error rates, and response times. Error tracking systems (like Sentry) spot and report issues quickly. Alert systems notify administrators of any anomalies in real time, ensuring reliable performance and prompt issue resolution.
- Handle Queries: Users can query the notification system to search for notification details. The Notification Query Service processes these queries, fetching data from caches and the UserData DB to provide real-time or historical information about notifications.
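The first five steps of this workflow can be sketched as a small pipeline. Everything below is a simplified assumption: the preference store is a dictionary, the executor hand-off is a list, and a request counts as "bulk" when it has more than one recipient.

```python
from collections import deque

# Hypothetical preference store: user -> preferred channels.
USER_PREFERENCES = {
    "john": ["push", "email"],
    "emma": ["sms"],
}

bulk_queue = deque()   # bulk requests wait here for batch workers
executor_inbox = []    # stands in for the hand-off to the Notification Executor


def validate(request):
    """A request needs a non-empty message and at least one recipient."""
    return bool(request.get("message")) and bool(request.get("recipients"))


def expand(request):
    """Build one (user, channel, message) job per recipient and preferred channel."""
    jobs = []
    for user_id in request["recipients"]:
        for channel in USER_PREFERENCES.get(user_id, []):
            jobs.append((user_id, channel, request["message"]))
    return jobs


def handle_request(request):
    if not validate(request):
        return "rejected"
    jobs = expand(request)
    if len(request["recipients"]) == 1:
        executor_inbox.extend(jobs)  # single notifications proceed directly
        return "dispatched"
    bulk_queue.append(jobs)          # bulk notifications are queued for batch processing
    return "queued"


def drain_bulk_queue():
    """What a batch worker would do: move queued jobs to the executor."""
    while bulk_queue:
        executor_inbox.extend(bulk_queue.popleft())


single = handle_request({"message": "Emma posted a photo", "recipients": ["john"]})
bulk = handle_request({"message": "Flash sale!", "recipients": ["john", "emma"]})
invalid = handle_request({"message": "", "recipients": ["john"]})
queued_before_drain = len(bulk_queue)
drain_bulk_queue()
print(single, bulk, invalid, len(executor_inbox))
```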
Now, a concern might come to mind. Let’s understand this concern and how we will manage it in our system.
How is the Notification Server monitored and maintained to ensure reliable performance?
To keep tabs on everything, we can use centralized logging to record all notification activities. Tools like Prometheus or Grafana track important metrics like request rates, error rates, and response times, giving us a clear picture of system health. Error tracking systems (like Sentry) quickly spot and report issues, while alert systems notify administrators of any real-time anomalies, making sure problems are fixed quickly and keeping the system running reliably.
Notification Query Service
The Notification Query Service is designed to handle user requests for notification data and manage preferences efficiently. It processes queries by validating them, checking a cache for quick responses, and using Elasticsearch for complex searches. If the requested data isn’t available in the cache or the search index, it queries the database. This approach ensures rapid data retrieval and high scalability while maintaining up-to-date and consistent notification data.
- User Sends Query: Users submit their queries to the Notification Query Service.
- Query Validation: The service validates these queries to ensure they are correctly formed and authorized.
- Cache Check: The Query Processor first checks the Cache Layer to see if the requested data is already available for a quick response.
- Elasticsearch Utilization: If the data is not in the cache, the service uses Elasticsearch to perform detailed searches, efficiently handling complex queries.
- Database Query: For data not found in the cache or Elasticsearch, the service queries the Database Layer (UserData DB and replicas) to retrieve the necessary information.
- Returning Results and Cache Update: The service returns the fetched data to the user and updates the cache to streamline future queries, ensuring efficient and scalable data retrieval.
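The lookup order in these steps (cache, then search index, then database, then cache update) can be sketched as follows. Dictionaries stand in for the cache, Elasticsearch, and the UserData DB, and the class and its `trace` attribute are hypothetical names used only to show which layer answered each query.

```python
class NotificationQueryService:
    """Tiered lookup: cache first, then search index, then database."""

    def __init__(self, search_index, database):
        self._cache = {}
        self._search_index = search_index  # stand-in for Elasticsearch
        self._database = database          # stand-in for the UserData DB
        self.trace = []                    # records which layer answered each query

    def query(self, user_id, query):
        # Query validation: both parts must be present.
        if not user_id or not query:
            raise ValueError("user_id and query are required")
        key = (user_id, query)
        if key in self._cache:
            self.trace.append("cache")
            return self._cache[key]
        results = self._search_index.get(key)
        source = "search"
        if results is None:
            results = self._database.get(key, [])
            source = "database"
        self.trace.append(source)
        self._cache[key] = results  # cache update for future queries
        return results


search_index = {("john", "type:post_like"): [{"id": "notif123", "type": "post_like"}]}
database = {("john", "type:shipment"): [{"id": "notif124", "type": "shipment"}]}
service = NotificationQueryService(search_index, database)

likes = service.query("john", "type:post_like")        # answered by the search index
likes_again = service.query("john", "type:post_like")  # answered by the cache
shipments = service.query("john", "type:shipment")     # falls through to the database
print(service.trace)
```

This sketch never expires cached results. A real service would need a TTL or invalidation on new notifications, otherwise repeated queries could return stale lists.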
Notification Executor
The Notification Executor uses specialized APIs to deliver different types of notifications. These APIs are tailored to handle the specific formatting and delivery requirements for each channel. Let’s explore how it works:
- iOS Push Notifications: The Notification Executor uses the Apple Push Notification Service (APNS) to deliver notifications to iOS devices. This service manages the necessary data to ensure that the notification reaches the user’s device correctly and is displayed as intended.
Let's see how it works.
When the iOS app registers for push notifications, it receives a unique device token from APNS via the OS. This token is sent to the backend server, which stores it and sends notification requests to APNS upon relevant events. APNS then routes these notifications to the iOS device, where they are displayed to the user.
- Android Push Notifications:
The Notification Executor uses Firebase Cloud Messaging (FCM) to deliver notifications to Android devices. This service handles the necessary data to ensure that the notification reaches the user’s device accurately and is displayed properly.
Let's see how it works.
In the illustrated workflow, the app server sends an encrypted payload to Firebase Cloud Messaging (FCM), which forwards it to the device. The device decrypts the message and displays the notification; when the user interacts with it, the app can optionally fetch more data from the server to update its content. Interactions or delivery statuses are logged by the app or server for tracking and troubleshooting.
- SMS Notifications:
The Notification Executor uses third-party SMS services such as Twilio or Nexmo to deliver notifications to mobile devices via SMS. These services handle the necessary data to ensure that the message reaches the user’s phone accurately and is displayed properly. Let's consider Twilio as our service and understand its workflow:
+-------------------+          +---------------------+
|    Application    |          |     Twilio API      |
+---------+---------+          +----------+----------+
          |                               |
          |  1. Send Message              |
          +------------------------------>|
          |                               |
          |<------------------------------+
          |  2. Process Message           |
          |                               |
+---------+---------+          +----------+----------+
|  Carrier Network  |<---------|       Twilio        |
+---------+---------+          +----------+----------+
          |                               |
          |  3. Route Message             |
          |                               |
          v                               v
+-------------------+          +---------------------+
|     Recipient     |          |  Status Update to   |
+-------------------+          |     Application     |
                               +---------------------+
In the illustrated workflow, an application sends an SMS to Twilio's API with the recipient’s details. Twilio verifies and processes the message, routing it through the carrier network for delivery to the recipient's phone. The carrier confirms delivery back to Twilio, which updates the message status to "Delivered" and sends this status update to the application.
- Email Notifications: The Notification Executor uses services like SendGrid or Mailchimp to deliver notifications via email. These services manage the necessary data to ensure that emails reach the user's inbox accurately and are displayed properly.
+-----------------------+          +-------------------------+
|      Application      |          |  Notification Executor  |
+-----------+-----------+          +------------+------------+
            |                                   |
            |  1. Send Email Details            |
            +---------------------------------->|
            |                                   |
            |<----------------------------------+
            |                                   |
+-----------+-----------+          +------------+------------+
|     Email Service     |<---------|  Notification Executor  |
| (SendGrid/Mailchimp)  |          +------------+------------+
+-----------+-----------+                       |
            |                                   |
            |  2. Forward Details               |
            |                                   |
            v                                   v
+-----------------------+          +-------------------------+
|    Email Provider     |          |    Status Update to     |
|      (Recipient)      |          |  Notification Executor  |
+-----------+-----------+          +------------+------------+
            |                                   |
            |  3. Deliver Email                 |
            |                                   |
            v                                   v
+-----------------------+          +-------------------------+
|       Recipient       |          |       Application       |
+-----------------------+          +-------------------------+
As illustrated above, an application sends the email details (subject, body, recipient) to the Notification Executor, which forwards them to an email service like SendGrid or Mailchimp. The email service processes and formats the email, routes it to the recipient’s email provider, and upon successful delivery to the recipient’s inbox, updates the status and informs the Notification Executor.
9. Rate Limiting
Imagine you post a vacation photo on a social media app. Instantly, you get notifications for likes, comments, and new followers. Meanwhile, your shopping app alerts you about a flash sale, and your productivity app reminds you of an upcoming meeting. These notifications keep you updated without constantly checking your apps.
However, if every like on your photo sent an immediate notification, you’d quickly be overwhelmed with alerts. Or imagine if a bug caused the shopping app to send you sale alerts non-stop. Rate limiting steps in to control the number of notifications sent in a set timeframe, preventing system overload and ensuring you aren’t bombarded with too many alerts at once.
For a deeper dive into implementing rate limiting, check out the lesson "Designing an API Rate Limiter".
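As a small illustration of the idea, here is a token bucket, one common rate-limiting algorithm (not necessarily the one used in the referenced lesson). Each user would get a bucket; a notification is sent only if a token is available, and tokens refill at a steady rate. The capacity, refill rate, and injectable `clock` parameter are assumptions chosen to keep the example deterministic.

```python
import time


class TokenBucket:
    """Allows short bursts up to `capacity`, then a steady `refill_per_second` rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill based on elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


# A fake clock makes the behavior reproducible without sleeping.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_per_second=1.0, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(5)]       # 3 allowed, then 2 blocked
fake_now[0] = 2.0                                # two seconds pass, two tokens refill
after_wait = [bucket.allow() for _ in range(3)]  # 2 allowed, then 1 blocked
print(burst)
print(after_wait)
```

Notifications that are blocked do not have to be dropped; they could instead be batched into a digest such as "5 people liked your photo".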
Rate limiting helps prevent notification bursts, but what if some notifications fail to be delivered? This brings us to the concept of retry mechanisms.
How Does the Retry Mechanism Handle Failed Notifications?
When a notification fails to send, the retry mechanism adds it to a retry queue and attempts to resend it later. It uses increasing delays between retries (exponential backoff) to give time for temporary issues to resolve. This method keeps retrying until the notification is successfully delivered or the retry limit is reached, at which point it logs the failure and alerts administrators for further action. This process helps ensure notifications are eventually delivered without overwhelming the system.
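The retry behavior described above can be sketched as a function that doubles the delay after each failed attempt and gives up at a retry limit. The function name, default limits, and the injectable `sleep` parameter are assumptions; a production system would typically re-enqueue the notification with a delay rather than block a worker, and would raise a real alert instead of only logging.

```python
import logging
import time


def send_with_retry(send, notification, max_attempts=5, base_delay=1.0,
                    max_delay=60.0, sleep=time.sleep):
    """Try to send; on failure wait base_delay * 2**(attempt - 1) seconds, capped at max_delay.

    Returns (delivered, delays) where delays lists the waits between attempts.
    """
    delays = []
    for attempt in range(1, max_attempts + 1):
        try:
            send(notification)
            return True, delays
        except Exception as exc:
            if attempt == max_attempts:
                # Retry limit reached: record the failure for administrators.
                logging.error("giving up on %r after %d attempts: %s",
                              notification, attempt, exc)
                break
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delays.append(delay)
            sleep(delay)
    return False, delays


# Demo: a sender that fails twice, then succeeds. Sleeping is stubbed out.
attempts = {"count": 0}


def flaky_send(notification):
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ConnectionError("temporary provider outage")


def always_fail(notification):
    raise ConnectionError("provider unreachable")


delivered, waits = send_with_retry(flaky_send, "order shipped", sleep=lambda s: None)
failed, failed_waits = send_with_retry(always_fail, "order shipped",
                                       max_attempts=4, sleep=lambda s: None)
print(delivered, waits)
print(failed, failed_waits)
```

Adding a small random jitter to each delay is a common refinement, since it stops many failed notifications from retrying at exactly the same moment.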