System Design: Rate Limiter
What is a Rate Limiter?
A rate limiter is used to control the rate of traffic sent by clients: it limits the number of client requests allowed over a specific period.
It puts a cap on the number of requests a service will entertain.
It's generally used as a defensive layer that protects a service from excessive usage.
Design
Requirements
- limit the number of requests a client can send to the API within a time window
- make the limit configurable
- make the limit apply across a cluster of servers
when a request is received, the rate limiter decides whether the request should be forwarded to the server or rejected
Components
At a high level, we need a counter to keep track of how many requests each user has sent. A database might not be a good option here because of its read/write latency.
An in-memory cache is a better option in this scenario.
- rule DB: stores the rules defined by service owners (a sample rule is sketched after this list)
- each rule specifies the number of requests allowed for a particular client per unit of time
- rule retriever: a background service that periodically checks the database for any modification to the rules
- throttle rules cache: holds the rules retrieved from the rule database, so rate-limiter requests can be served faster than from persistent storage
- decision maker: makes the decision against the rules in the cache; this component implements one of the rate-limiting algorithms
- client identifier builder: generates a unique ID for each request received from a client; this ID is the key under which the user's data is stored
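As an illustration, a throttle rule might carry fields like these (the schema and field names are hypothetical, not part of the original design):

```python
# A hypothetical throttle rule as it might be stored in the rule DB:
rule = {
    "client_id": "mobile-app",  # which client the rule applies to
    "limit": 100,               # number of requests allowed...
    "unit_seconds": 60,         # ...per this unit of time
}
```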
Integrations
First we need to understand where to put the rate limiter.
We can create it as a middleware. Now that microservices have become widely popular, rate limiting is usually implemented within a component called the API gateway.
- the server receives a request; the client identifier builder creates an ID for it, then forwards it to the decision maker
- rules are stored on disk; workers frequently pull the rules from disk and store them in the cache
- the decision maker determines which service the request needs, then checks the cached count against the number of requests allowed by the rule the service owner provided
- if the count limit is not exceeded, the request is forwarded to the request processor
Some Considerations
race conditions
these happen under highly concurrent request patterns.
specifically, they happen in the "get then set" approach, where the current counter is retrieved, incremented, and then pushed back to the database.
the problem is that concurrent requests racing between the get and the set can all pass the check, so a client sending requests at a very high rate can bypass the rate-limiting control.
one solution is to split the quota across multiple places and divide the load among them.
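The sketch below makes the race concrete, assuming the counter lives in Redis and is accessed through redis-py (key names are illustrative; window expiry is omitted for brevity). It also shows a common alternative fix, not discussed above: replacing get-then-set with Redis's atomic INCR.

```python
import redis

r = redis.Redis()

# Naive "get then set": two concurrent requests can both read the same
# count, both pass the check, and both write back count + 1, losing an
# increment and letting extra requests through.
def allow_naive(key: str, limit: int) -> bool:
    count = int(r.get(key) or 0)   # GET
    if count >= limit:
        return False
    r.set(key, count + 1)          # SET: race window between GET and SET
    return True

# Atomic alternative: INCR does the read-modify-write in a single
# server-side operation, so no update is lost under concurrency.
def allow_atomic(key: str, limit: int) -> bool:
    return r.incr(key) <= limit
```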
rate limiting operations
each request retrieves, updates, and pushes back the count to the respective cache; this approach can add latency when the number of requests is high.
the solution is to divide the work into online and offline parts:
when a client's request is received, the system only checks the count (the online part),
and the counter is updated offline.
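A minimal in-process sketch of this split (the queue and worker thread here are illustrative stand-ins for whatever offline update mechanism the real system uses; the trade-off is that the count read on the hot path may be slightly stale):

```python
import threading
import queue
from collections import defaultdict

counts = defaultdict(int)   # client_id -> request count, read on the hot path
updates = queue.Queue()     # pending increments, applied off the hot path

def allow(client_id: str, limit: int) -> bool:
    # Online part: only read the (possibly slightly stale) count.
    if counts[client_id] >= limit:
        return False
    updates.put(client_id)  # defer the write
    return True

def updater() -> None:
    # Offline part: a background worker applies the increments.
    while True:
        counts[updates.get()] += 1

threading.Thread(target=updater, daemon=True).start()
```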
Rate Limiting Algorithms
token bucket algorithm
- the system adds a new token to the bucket at a fixed interval
- a newly added token is discarded if the number of tokens in the bucket already equals its capacity
- each incoming request removes one token from the bucket
- if there is no token left when a request comes in, the request is rejected
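A minimal Python sketch of the rules above (the capacity and refill rate are illustrative parameters; time.monotonic is used so the clock never jumps backwards):

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time; tokens beyond capacity are discarded.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # one request consumes one token
            return True
        return False          # no token left: reject the request

bucket = TokenBucket(capacity=10, refill_rate=1.0)  # bursts of 10, 1 req/sec sustained
print(bucket.allow())  # True while tokens remain
```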
fixed window counter
- divide the time into fixed intervals, called windows
- assign a counter to each window
- when a window receives a request, its counter is incremented by one
the problem with this algorithm is that a burst of traffic at the edges of two adjacent windows can let through up to twice the allowed rate
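A minimal sketch of a fixed window counter (per-client counters keyed by window index; a real implementation would also evict old windows):

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (client_id, window index) -> count

    def allow(self, client_id: str) -> bool:
        # Identify the current window by flooring the timestamp.
        window_index = int(time.time()) // self.window
        key = (client_id, window_index)
        if self.counters[key] < self.limit:
            self.counters[key] += 1
            return True
        return False
```

Note how the counter resets at every window boundary: a client can spend its full quota at the end of one window and again at the start of the next, which is exactly the edge-burst problem described above.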
sliding window log algorithm
- when a request arrives, its arrival time is stored in a log (a hash map keyed by client)
- the logged timestamps are kept sorted
- a request is allowed or rejected depending on the size of the log within the window and the arrival times
example
- the 1st request comes at 1:00; the time window is marked from 1:00 to 2:00
- the 2nd request comes at 1:20; the window counter is below the threshold, so the request is accepted
- the 3rd request comes at 1:45; it exceeds the window threshold, so the request is rejected
- the 4th request comes at 2:25; timestamps older than one hour are dropped, keeping one entry (1:45) from the last window, and the updated window is 1:25 to 2:25
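A minimal sketch of the sliding window log (one timestamp deque per client; this version only logs accepted requests, which is one of several possible variants):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(deque)  # client_id -> sorted request timestamps

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        log = self.logs[client_id]
        # Drop timestamps that have slid out of the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)  # appending keeps the log sorted by arrival time
            return True
        return False
```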
Race Condition Handling
We can use the sorted set provided by Redis.
- each user has a sorted set associated with them; the member and the score are identical, both equal to the time the action was attempted
- when the user attempts to perform an action, we first drop all elements of the set that occurred before the current interval
- then fetch all remaining elements of the set
- add the current timestamp, which can be done using ZADD
- set a TTL equal to the rate-limiting interval to save space
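A minimal sketch of these steps with redis-py (the key naming scheme is illustrative, and ZCARD is used to count the remaining elements rather than fetching them all, since only the size matters for the decision). Because the commands run in a MULTI/EXEC pipeline, they execute atomically, which is what defeats the get-then-set race:

```python
import time
import redis

r = redis.Redis()

def allow_request(user_id: str, limit: int, interval: float) -> bool:
    key = f"rate:{user_id}"  # hypothetical key naming scheme
    now = time.time()
    pipe = r.pipeline()  # transactional (MULTI/EXEC) by default
    pipe.zremrangebyscore(key, 0, now - interval)  # drop entries older than the interval
    pipe.zadd(key, {str(now): now})  # member and score are both the timestamp
    pipe.zcard(key)                  # count what is left in the window
    pipe.expire(key, int(interval))  # TTL so idle keys free their memory
    _, _, count, _ = pipe.execute()
    return count <= limit
```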
Retrospect
- this component was asked about in one of my interviews; I got stuck on how to maintain the window counter efficiently for each client, which reminded me that after building a high-level understanding, I still need to dive into the details of each end
- a rate limiter is not as easy as this article describes; in particular, this article doesn't cover how a rate limiter works in a distributed setting