System Design: Rate Limiter
What is a Rate Limiter?
A rate limiter is used to control the rate of traffic sent by clients: it limits the number of client requests allowed over a specific period.
It puts a cap on the number of requests a service will entertain.
It's generally used as a defensive layer that protects a service from excessive usage.
Design
Requirements
- limit the number of requests a client can send to the API within a time window
- make the limit configurable
- make the limit apply across a cluster of servers
when a request is received, the rate limiter decides whether the request should be forwarded to the server or rejected
Components
At a high level, we need a counter to keep track of how many requests each user has sent. A database might not be a good option here because of its read/write latency.
An in-memory cache is a better option in this scenario.
- rule DB: stores the rules defined by service owners (a sample rule is sketched after this list)
- each rule specifies the number of requests allowed for a particular client per unit of time
- rule retriever: a background service that periodically checks the database for any modification to the rules
- throttle rules cache: holds the rules retrieved from the rule database, so rate-limiter requests can be served faster than from persistent storage
- decision maker: makes the decision against the rules in the cache; this component implements one of the rate-limiting algorithms
- client identifier builder: generates a unique ID for each request received from a client; this ID is the key under which the user's data is stored
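As an illustration, a throttle rule might carry fields like these (the schema and field names are hypothetical, not part of the original design):

```python
# A hypothetical throttle rule as it might be stored in the rule DB:
rule = {
    "client_id": "mobile-app",  # which client the rule applies to
    "limit": 100,               # number of requests allowed...
    "unit_seconds": 60,         # ...per this unit of time
}
```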
Integrations
First we need to understand where to put the rate limiter.
We can create it as a middleware. Now that microservices have become widely popular, rate limiting is usually implemented within a component called the API gateway.
- the server receives a request; the client identifier builder creates an ID for it, then forwards it to the decision maker
- rules are stored on disk; workers frequently pull the rules from disk and store them in the cache
- the decision maker determines which service the request needs, then checks the cached count against the number of requests allowed by the rule the service owner provided
- if the count limit is not exceeded, the request is forwarded to the request processor
Some Considerations
race conditions
these happen under highly concurrent request patterns.
specifically, they happen in the "get then set" approach, where the current counter is retrieved, incremented, and then pushed back to the database.
the problem is that concurrent requests racing between the get and the set can all pass the check, so a client sending requests at a very high rate can bypass the rate-limiting control.
one solution is to split the quota across multiple places and divide the load among them.
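The sketch below makes the race concrete, assuming the counter lives in Redis and is accessed through redis-py (key names are illustrative; window expiry is omitted for brevity). It also shows a common alternative fix, not discussed above: replacing get-then-set with Redis's atomic INCR.

```python
import redis

r = redis.Redis()

# Naive "get then set": two concurrent requests can both read the same
# count, both pass the check, and both write back count + 1, losing an
# increment and letting extra requests through.
def allow_naive(key: str, limit: int) -> bool:
    count = int(r.get(key) or 0)   # GET
    if count >= limit:
        return False
    r.set(key, count + 1)          # SET: race window between GET and SET
    return True

# Atomic alternative: INCR does the read-modify-write in a single
# server-side operation, so no update is lost under concurrency.
def allow_atomic(key: str, limit: int) -> bool:
    return r.incr(key) <= limit
```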
rate limiting operations
each request retrieves, updates, and pushes back the count to the respective cache; this approach can add latency when the number of requests is high.
the solution is to divide the work into online and offline parts:
when a client's request is received, the system only checks the count (the online part),
and the counter is updated offline.
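A minimal in-process sketch of this split (the queue and worker thread here are illustrative stand-ins for whatever offline update mechanism the real system uses; the trade-off is that the count read on the hot path may be slightly stale):

```python
import threading
import queue
from collections import defaultdict

counts = defaultdict(int)   # client_id -> request count, read on the hot path
updates = queue.Queue()     # pending increments, applied off the hot path

def allow(client_id: str, limit: int) -> bool:
    # Online part: only read the (possibly slightly stale) count.
    if counts[client_id] >= limit:
        return False
    updates.put(client_id)  # defer the write
    return True

def updater() -> None:
    # Offline part: a background worker applies the increments.
    while True:
        counts[updates.get()] += 1

threading.Thread(target=updater, daemon=True).start()
```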
Rate Limiting Algorithms
token bucket algorithm
- the system adds a new token to the bucket at a fixed interval
- a newly added token is discarded if the number of tokens in the bucket already equals its capacity
- each incoming request removes one token from the bucket
- if there is no token left when a request comes in, the request is rejected
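A minimal Python sketch of the rules above (the capacity and refill rate are illustrative parameters; time.monotonic is used so the clock never jumps backwards):

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time; tokens beyond capacity are discarded.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # one request consumes one token
            return True
        return False          # no token left: reject the request

bucket = TokenBucket(capacity=10, refill_rate=1.0)  # bursts of 10, 1 req/sec sustained
print(bucket.allow())  # True while tokens remain
```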
fixed window counter
- divide the time into fixed intervals, called windows
- assign a counter to each window
- when a window receives a request, its counter is incremented by one
the problem with this algorithm is that a burst of traffic at the edges of two adjacent windows can let through up to twice the allowed rate
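A minimal sketch of a fixed window counter (per-client counters keyed by window index; a real implementation would also evict old windows):

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (client_id, window index) -> count

    def allow(self, client_id: str) -> bool:
        # Identify the current window by flooring the timestamp.
        window_index = int(time.time()) // self.window
        key = (client_id, window_index)
        if self.counters[key] < self.limit:
            self.counters[key] += 1
            return True
        return False
```

Note how the counter resets at every window boundary: a client can spend its full quota at the end of one window and again at the start of the next, which is exactly the edge-burst problem described above.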
sliding window log algorithm
- when a request arrives, its arrival time is stored in a log (a hash map keyed by client)
- the logged timestamps are kept sorted
- a request is allowed or rejected depending on the size of the log within the window and the arrival times
example
- the 1st request comes at 1:00; the time window is marked from 1:00 to 2:00
- the 2nd request comes at 1:20; the window counter is below the threshold, so the request is accepted
- the 3rd request comes at 1:45; it exceeds the window threshold, so the request is rejected
- the 4th request comes at 2:25; timestamps older than one hour are dropped, keeping one entry (1:45) from the last window, and the updated window is 1:25 to 2:25
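A minimal sketch of the sliding window log (one timestamp deque per client; this version only logs accepted requests, which is one of several possible variants):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(deque)  # client_id -> sorted request timestamps

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        log = self.logs[client_id]
        # Drop timestamps that have slid out of the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)  # appending keeps the log sorted by arrival time
            return True
        return False
```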
Race Condition Handling
We can use the sorted set provided by Redis.
- each user has a sorted set associated with them; the member and the score are identical, both equal to the time the action was attempted
- when the user attempts to perform an action, we first drop all elements of the set that occurred before the current interval
- then fetch all remaining elements of the set
- add the current timestamp, which can be done using ZADD
- set a TTL equal to the rate-limiting interval to save space
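A minimal sketch of these steps with redis-py (the key naming scheme is illustrative, and ZCARD is used to count the remaining elements rather than fetching them all, since only the size matters for the decision). Because the commands run in a MULTI/EXEC pipeline, they execute atomically, which is what defeats the get-then-set race:

```python
import time
import redis

r = redis.Redis()

def allow_request(user_id: str, limit: int, interval: float) -> bool:
    key = f"rate:{user_id}"  # hypothetical key naming scheme
    now = time.time()
    pipe = r.pipeline()  # transactional (MULTI/EXEC) by default
    pipe.zremrangebyscore(key, 0, now - interval)  # drop entries older than the interval
    pipe.zadd(key, {str(now): now})  # member and score are both the timestamp
    pipe.zcard(key)                  # count what is left in the window
    pipe.expire(key, int(interval))  # TTL so idle keys free their memory
    _, _, count, _ = pipe.execute()
    return count <= limit
```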
Retrospect
- this component was asked about in one of my interviews; I got stuck on how to maintain the window counter efficiently for each client, which reminded me that after building a high-level understanding, I still need to dive into the details of each end
- a rate limiter is not as easy as this article describes; in particular, this article doesn't cover how a rate limiter works in a distributed setting