What are different rate limiting algorithms?


Design Gurus · Accepted Answer

Rate limiting controls how many requests a server accepts from a client within a specified time frame. It is used to prevent overuse of resources, improve server reliability, and ensure fair usage among users. It is especially common in API management, where it guards against abuse and keeps traffic manageable.

Different Rate Limiting Algorithms:

1. Fixed Window Counter

Description: Divides time into fixed windows and counts the number of requests in each window.
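Example: If the limit is 100 requests per hour and a user makes 100 requests in the first half-hour, they are blocked for the remaining half-hour, even if the server is underutilized during that time. Because the counter resets at each window boundary, a user can also send 100 requests at the end of one window and 100 more at the start of the next, briefly doubling the intended rate.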
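The fixed window counter can be sketched in a few lines of Python. This is a minimal single-process illustration, not a production implementation; the class name `FixedWindowLimiter`, the `allow` method, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per fixed window of `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock         # hypothetical hook so tests can control time
        self.window_index = None   # which window the counter currently belongs to
        self.count = 0

    def allow(self):
        index = int(self.clock() // self.window_seconds)
        if index != self.window_index:
            # A new window has started, so the counter resets.
            self.window_index = index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With `limit=100` and `window_seconds=3600`, the 101st call to `allow()` inside the same hour returns `False`, and calls succeed again as soon as the next window begins.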

2. Sliding Log

Description: Keeps a time-stamped log of requests. For each new request, it checks whether the number of logged requests within the trailing time window has already reached the limit.
Example: If the limit is 100 requests per hour, each incoming request is checked against the log of requests in the past hour. Older entries are discarded.
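A sliding log might look like the following sketch, which stores one timestamp per accepted request in a deque. The names `SlidingLogLimiter`, `allow`, and `clock` are illustrative assumptions, and the choice to log only accepted requests is one of several reasonable policies.

```python
import time
from collections import deque


class SlidingLogLimiter:
    """Allow at most `limit` requests in any trailing `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock   # hypothetical hook so tests can control time
        self.log = deque()   # timestamps of accepted requests, oldest first

    def allow(self):
        now = self.clock()
        cutoff = now - self.window_seconds
        # Discard entries that have fallen out of the trailing window.
        while self.log and self.log[0] <= cutoff:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

The log holds up to `limit` timestamps per client, so memory use grows with the limit; that cost is what the counter-based variants avoid.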

3. Sliding Window Counter

Description: A hybrid of the fixed window and the sliding log. It keeps the fixed window's cheap per-window counters while approximating the sliding log's accuracy.
Example: If the limit is 100 requests per hour, the server adds the count for the current window to a weighted share of the previous window's count. The weight is the fraction of the previous window that still overlaps the trailing hour, so 15 minutes into the current window the previous count is weighted by 0.75.
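One common way to compute the weighted estimate is `current_count + previous_count * (1 - elapsed_fraction)`, where `elapsed_fraction` is how far the clock is into the current window. The sketch below assumes that formula; the class and parameter names are hypothetical, and real implementations differ in how they round and compare the estimate.

```python
import time


class SlidingWindowCounterLimiter:
    """Approximate a sliding window using two fixed-window counters."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock          # hypothetical hook so tests can control time
        self.current_index = None
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        index = int(now // self.window_seconds)
        if index != self.current_index:
            # Roll over. The old count only matters if the windows are adjacent.
            adjacent = (self.current_index is not None
                        and index == self.current_index + 1)
            self.previous_count = self.current_count if adjacent else 0
            self.current_index = index
            self.current_count = 0
        # Fraction of the current window that has already elapsed.
        elapsed_fraction = (now % self.window_seconds) / self.window_seconds
        estimated = (self.current_count
                     + self.previous_count * (1.0 - elapsed_fraction))
        if estimated + 1 <= self.limit:
            self.current_count += 1
            return True
        return False
```

The estimate assumes requests in the previous window were spread evenly, which is why this approach is an approximation rather than an exact count.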

4. Token Bucket

Description: Uses tokens to control traffic flow. Tokens are added to a bucket at a regular rate, up to the bucket's capacity, and each request consumes a token. If the bucket is empty, new requests are denied (or delayed) until more tokens arrive.
Example: A bucket can hold 10 tokens and 1 token is added every 10 seconds. A request needs 1 token to pass. If the bucket is full and a sudden burst of 15 requests arrives, only 10 go through; the remaining 5 must wait for new tokens.
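A token bucket is often implemented lazily: instead of a background timer, the token count is topped up from the elapsed time whenever a request arrives. The sketch below takes that approach and assumes the bucket starts full; `TokenBucket`, `refill_interval`, `cost`, and `clock` are hypothetical names chosen for the example.

```python
import time


class TokenBucket:
    """Hold up to `capacity` tokens; add one token every `refill_interval` seconds."""

    def __init__(self, capacity, refill_interval, clock=time.monotonic):
        self.capacity = capacity
        self.refill_interval = refill_interval
        self.clock = clock              # hypothetical hook so tests can control time
        self.tokens = float(capacity)   # assumption: the bucket starts full
        self.last_refill = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Lazily add the tokens earned since the last call, capped at capacity.
        earned = (now - self.last_refill) / self.refill_interval
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `capacity=10` and `refill_interval=10`, a burst of 15 calls to `allow()` at the same instant yields ten `True` results followed by five `False`, and one more request is admitted ten seconds later.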

5. Leaky Bucket

Description: Requests are added to a bounded queue (the bucket) and processed at a fixed rate, which smooths out bursts. Requests that arrive while the bucket is full are discarded.
Example: If the bucket size is 10 and the rate is 1 request per second, and a burst of 20 requests comes in, the first 10 are queued and processed at 1 per second, while the other 10 are discarded because the bucket is full. A larger bucket would queue more of the burst instead of dropping it.
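The queue-based leaky bucket can be sketched as below. To keep the example self-contained, the "processing" step just moves requests into a `processed` list as time passes; in a real system a worker would handle them. `LeakyBucket`, `offer`, `leak_rate`, and `clock` are hypothetical names.

```python
import time
from collections import deque


class LeakyBucket:
    """Queue up to `capacity` requests and drain them at `leak_rate` per second."""

    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity
        self.leak_rate = leak_rate   # requests processed per second
        self.clock = clock           # hypothetical hook so tests can control time
        self.queue = deque()
        self.processed = []          # stand-in for handing requests to a worker
        self.last_leak = clock()

    def _leak(self):
        # Drain as many whole requests as the elapsed time allows.
        now = self.clock()
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.processed.append(self.queue.popleft())
            # Advance by whole leak intervals so fractional time is not lost.
            self.last_leak += leaked / self.leak_rate

    def offer(self, request):
        """Return True if the request was queued, False if it was discarded."""
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True
        return False
```

With `capacity=10` and `leak_rate=1`, offering 20 requests at once queues the first 10 and rejects the rest; three seconds later three requests have drained and there is room for three more.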

Applications of Rate Limiting

APIs and Web Services: To control traffic and prevent abuse.
Network Traffic: To control data flow in networks.
Application Servers: To prevent overload and ensure fair usage.

When implementing rate limiting, choose an algorithm that fits the system's needs, balancing fairness, efficiency, and resource utilization.
