New in ASP.NET Core 7: Rate Limiting Middleware

ASP.NET Core 7 introduces a new middleware for managing rate limits. What does it do? How do you set it up? In this installment of the series, we introduce you to the fundamentals of rate limiting.
What Is Rate Limiting?
Rate limiting is a method of controlling the flow of requests for a resource.
The method is based on the definition of a metric, usually a numeric value, which represents the upper or lower permitted limit for a resource. Beyond this limit, the system will be unable to guarantee request processing. In some situations, adding time to the equation helps define a rate-limiting window.
If this limit is reached, the system responds with an alternative, ranging from simply refusing the request to queuing it for deferred processing.
Why Set up Rate Limiting?
The concept of rate limiting is often misunderstood as a reduction in service quality when, in fact, the opposite is true.
A rate limiter improves the security of our applications. It acts as a flow controller that prevents malicious or fraudulent use of our systems. This makes us less vulnerable to certain attack types, such as denial-of-service (DoS), distributed denial-of-service (DDoS), brute force attacks, and illegal web scraping.
A rate limiter protects against significant changes in operating costs. For example, a rate limiter is highly recommended for applications that use an auto-scaling model because it protects against any changes in workload that are excessive or unnecessary.
In addition to security and cost optimization aspects, a rate limiter can be used in so-called “As a Service” applications to control how often specific resources can be accessed based on the pricing plans set by the service provider.
Rate Limiting in ASP.NET Core 7
Microsoft introduced a new System.Threading.RateLimiting package with the .NET 7 release. It supports rate limiting by implementing the four most common algorithmic models:
- Token bucket
- Concurrency
- Fixed window limiter
- Sliding window limiter
In addition to these four models, there is also a new middleware for adding rate limiters to ASP.NET Core applications.
This new middleware is distributed via the Microsoft.AspNetCore.RateLimiting package.
The two new packages are available in the latest version of the .NET 7 software development kit (SDK), which you can download here: https://dotnet.microsoft.com/en-us/download/dotnet/7.0.
The rest of this post will review the various aspects of this new announcement.
Configuring “Rate Limiting” in an ASP.NET Core Application
Before we get into the code, let’s define the problem framework.
We want to expose an API route that will return weather forecasts.
For the reasons we mentioned at the beginning of the article (safety, cost control, etc.), we’ll use a rate limiter to meet the following requirements:
- Requirement 1: Our API must allow a maximum of five requests per connected device every ten seconds.
- Requirement 2: If the limit is exceeded, our API should return HTTP code 429 with the following message: “you have reached the maximum permitted number of requests for your IP address ({@ip_address}).”
Adding the Rate Limiting Middleware to Our Application
The technique is the same as for all ASP.NET Core middleware:
- Use the “services.AddRateLimiter()” extension method to add the required services and configuration to the Inversion of Control (IoC) container.
- Add the middleware to our ASP.NET Core pipeline with the “app.UseRateLimiter()” extension method. The order of addition is critical to avoid interfering with the operation of other middleware, such as cache management or the authentication system.
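The two steps above can be sketched as a minimal Program.cs. This is only an illustration: it assumes a .NET 7 ASP.NET Core project with implicit usings enabled, and the “/weatherforecast” endpoint with its return value is a placeholder rather than the article's original code.

```csharp
// Program.cs in a .NET 7 ASP.NET Core project (implicit usings assumed).
var builder = WebApplication.CreateBuilder(args);

// Step 1: register the rate limiting services in the IoC container.
builder.Services.AddRateLimiter(options =>
{
    // Limiters and rejection behavior are configured here.
});

var app = builder.Build();

// Step 2: add the middleware to the pipeline. Its position relative to
// other middleware (authentication, caching, etc.) decides which
// requests it sees and when.
app.UseRateLimiter();

// Placeholder endpoint standing in for the weather forecast route.
app.MapGet("/weatherforecast", () => new[] { "Sunny", "Cloudy", "Rainy" });

app.Run();
```

With an empty configuration callback no limiter is active yet, so every request still passes through.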
Selecting and Configuring the Rate Limiter
We’ll use a FixedWindowRateLimiter to meet the first requirement (we’ll talk more about this limiter later in the article):
- PermitLimit: the maximum number of requests allowed per window, five in this case.
- Window: the duration of a limit cycle, ten seconds in this case. After each cycle, the number of available permits is reset to the PermitLimit value, which authorizes five new requests.
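One way to express this configuration is sketched below. It assumes the limiter is registered as the global limiter and that the client IP address serves as the partition key (in line with Requirement 2). The “unknown” fallback key is an illustrative choice, not something taken from the original article.

```csharp
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // One fixed window limiter per client IP address.
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
    {
        // Assumption: the remote IP identifies the "connected device".
        var clientIp = httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";

        return RateLimitPartition.GetFixedWindowLimiter(clientIp, _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 5,                   // five requests...
            Window = TimeSpan.FromSeconds(10), // ...every ten seconds
            QueueLimit = 0                     // reject instead of queuing
        });
    });
});

var app = builder.Build();
app.UseRateLimiter();
app.MapGet("/weatherforecast", () => new[] { "Sunny", "Cloudy", "Rainy" }); // placeholder endpoint
app.Run();
```

Behind a reverse proxy the remote IP is usually the proxy's address, so the partition key would need to come from elsewhere in that setup.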
Changing the Limit Reached Return Message
If the limit is reached, the middleware will reject the requests and send the request initiator an error message with the code “503 – service unavailable.”
This error code indicates that our API cannot handle the request. This could lead to a misinterpretation of the message by some service users, as it is not a question of service unavailability but rather that the number of authorized requests has been exceeded.
In this case, we’ll use the “RejectionStatusCode” option to change the return code to one that is more appropriate: the HTTP code “429 – too many requests.”
To change the return message, we will use the “OnRejected” function, which lets us use “HttpContext” to specify a more detailed response.
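A sketch of both options together might look as follows. The limiter setup repeats the assumptions made for Requirement 1 (global limiter, client IP as partition key), and the message text follows Requirement 2.

```csharp
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Replace the default 503 with 429 Too Many Requests.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Write a more detailed response body when a request is rejected.
    options.OnRejected = async (context, cancellationToken) =>
    {
        var ip = context.HttpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        await context.HttpContext.Response.WriteAsync(
            $"You have reached the maximum permitted number of requests for your IP address ({ip}).",
            cancellationToken);
    };

    // Same limiter as for Requirement 1 (assumed setup).
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
        RateLimitPartition.GetFixedWindowLimiter(
            httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 5,
                Window = TimeSpan.FromSeconds(10),
                QueueLimit = 0
            }));
});

var app = builder.Build();
app.UseRateLimiter();
app.MapGet("/weatherforecast", () => new[] { "Sunny", "Cloudy", "Rainy" }); // placeholder endpoint
app.Run();
```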
Partitions and Rate Limiter Variations
In the previous example, we used “HttpContext” to create a partition key when creating the rate limiter.
Partition keys let us apply different limiters, or differently configured instances of the same limiter, to different groups of requests, which covers a variety of scenarios.
To illustrate what this means in practice, let’s add some new rules for our API.
Requirement 3: In the next few days, we will be offering a subscription package with the following terms:
- Free package: users of this package will only be able to make five requests per minute
- Standard package: users of this package will have 60 requests per minute
- Premium package: users with a Premium package will not be subject to any limits
We can meet this new rule with three variations of the same limiter, one per package, selected through the partition key.
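The sketch below shows one possible shape for such a configuration. How the subscription package is determined is not specified in this article, so the “X-Subscription-Plan” request header used here is purely hypothetical. A real application would more likely read the plan from the authenticated user.

```csharp
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
    {
        // Hypothetical: the plan is read from a request header.
        var plan = httpContext.Request.Headers["X-Subscription-Plan"].ToString().ToLowerInvariant();
        var clientIp = httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";

        return plan switch
        {
            // Premium package: no limit at all.
            "premium" => RateLimitPartition.GetNoLimiter($"premium:{clientIp}"),

            // Standard package: 60 requests per minute.
            "standard" => RateLimitPartition.GetFixedWindowLimiter($"standard:{clientIp}",
                _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 60,
                    Window = TimeSpan.FromMinutes(1)
                }),

            // Free package (and anything unrecognized): 5 requests per minute.
            _ => RateLimitPartition.GetFixedWindowLimiter($"free:{clientIp}",
                _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit = 5,
                    Window = TimeSpan.FromMinutes(1)
                })
        };
    });
});

var app = builder.Build();
app.UseRateLimiter();
app.MapGet("/weatherforecast", () => new[] { "Sunny", "Cloudy", "Rainy" }); // placeholder endpoint
app.Run();
```

Prefixing the partition key with the plan name keeps a client's counters separate if they change package.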
Application Models for Rate Limits
In the examples we’ve shown so far, we have only used one type of limiter, Fixed Window. What about the other three algorithms?
In this section, we’ll revisit the unique aspects of each model.
Token Bucket Limiter Model
The Token Bucket Limiter model is based on a token bucket system, as the name suggests. For a request to be processed, a valid token must be available. Once a token has been assigned to a request, it is consumed. The total capacity of the bucket determines how many tokens are allowed.
The system must check the bucket for each request to see if any tokens are left. If there are, a token will be assigned to that request, and the request will be processed. If there are no tokens left, the request will be rejected.
Tokens are added back to the bucket at fixed, predetermined intervals. However, and this is what makes this model unique, the total number of available tokens can never exceed the bucket capacity. Any tokens added beyond that capacity are lost.
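To make the model concrete, here is a small console sketch that uses the TokenBucketRateLimiter from System.Threading.RateLimiting directly, outside ASP.NET Core. The capacity of three tokens and the ten-second replenishment period are arbitrary values picked for illustration.

```csharp
using System.Threading.RateLimiting;

// Bucket with a capacity of 3 tokens; 1 token is added every 10 seconds.
using var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 3,                                 // bucket capacity
    TokensPerPeriod = 1,                            // tokens added per period
    ReplenishmentPeriod = TimeSpan.FromSeconds(10), // replenishment interval
    QueueLimit = 0,                                 // reject instead of queuing
    AutoReplenishment = true
});

for (var i = 1; i <= 4; i++)
{
    // Each request tries to take one token from the bucket.
    using RateLimitLease lease = limiter.AttemptAcquire(1);
    var outcome = lease.IsAcquired ? "processed" : "rejected";
    Console.WriteLine($"Request {i}: {outcome}");
}
```

The bucket starts full, so the first three requests should be processed. The fourth should be rejected because the loop finishes well before the first replenishment.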
The Fixed Window Limit/Fixed Window Counter Model
The fixed-window rate limiter model is based on a two-variable limit: a counter for allowed requests and a time interval called the limiter window. The window is regarded as a fixed and indivisible entity in this model.
The counter will go down by one unit for each request received during the time interval. If the counter reaches zero before the end of the time window, any subsequent requests will be rejected.
The counter will be reset to its original value at the start of each new window cycle. New requests will then be allowed again.
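The counter and its reset can be observed with a short console sketch built on FixedWindowRateLimiter. The limit of two requests and the two-second window are arbitrary values, kept small so that the reset happens quickly.

```csharp
using System.Threading.RateLimiting;

// Two requests allowed per 2-second window.
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 2,
    Window = TimeSpan.FromSeconds(2),
    QueueLimit = 0,
    AutoReplenishment = true
});

// Tries to take one permit and reports the outcome.
void Send(string label)
{
    using RateLimitLease lease = limiter.AttemptAcquire(1);
    var outcome = lease.IsAcquired ? "processed" : "rejected";
    Console.WriteLine($"{label}: {outcome}");
}

Send("Request 1");
Send("Request 2");
Send("Request 3"); // the counter is exhausted for this window

// Wait longer than one window so that the counter is reset.
await Task.Delay(TimeSpan.FromSeconds(3));

Send("Request 4");
```

Requests 1 and 2 should be processed and request 3 rejected. After the pause, a new window has started, so request 4 should be accepted again.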
Sliding Window Limit/Sliding Window Counter Model
The sliding window limiter differs from the fixed window limiter in that it does not limit requests based on an indivisible unit of time. Instead, the limiter window is split into multiple equal-length parts.
For example, if we have a 20-minute window with two segments in each window, we get two 10-minute segments.
Any new requests will be assigned to the current segment. The number of requests accepted in that segment is deducted from the total number of requests allowed for the window, and those permits will not be reassigned for the time being.
At the end of each segment interval, the limiter window slides one segment unit to the right. This frees up a segment on the left, and the number of requests assigned to that segment will be reallocated.
The sliding direction (left, right) is given so you can build an image to help you understand how this model works.
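As an illustration, the console sketch below configures a SlidingWindowRateLimiter. The values (40 requests per 30-second window, split into three 10-second segments) are assumptions chosen for the example.

```csharp
using System.Threading.RateLimiting;

// 40 requests per 30-second window, split into three 10-second segments.
using var limiter = new SlidingWindowRateLimiter(new SlidingWindowRateLimiterOptions
{
    PermitLimit = 40,
    Window = TimeSpan.FromSeconds(30),
    SegmentsPerWindow = 3,
    QueueLimit = 0,
    AutoReplenishment = true
});

// Ten requests arrive in the current segment: 30 permits remain.
using RateLimitLease burst = limiter.AttemptAcquire(10);
Console.WriteLine($"Burst of 10 accepted: {burst.IsAcquired}");

// A burst of 31 exceeds what is left in the window.
using RateLimitLease tooLarge = limiter.AttemptAcquire(31);
Console.WriteLine($"Burst of 31 accepted: {tooLarge.IsAcquired}");

// A burst of 30 still fits.
using RateLimitLease fits = limiter.AttemptAcquire(30);
Console.WriteLine($"Burst of 30 accepted: {fits.IsAcquired}");
```

The ten permits taken by the first burst should only become available again once the segment they were assigned to slides out of the window.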
Here is a simulation to help you see the steps and how the different metrics change:
Legend:
- Interval/segment: represents the window of the current “S” segment.
- Available: this is the number of requests allowed for the current segment “S.”
- Accepted: shows how many requests were received in the “S” segment window.
- Reallocated: this is the number of requests authorized again after segment “S-2” is released.
- Carried forward: this is the number of requests authorized for the new segment “S+1” at the end of the “S” segment period.

Simulation:
At time T=0, the current segment is S2.

| Time/segment | Available (S) | Accepted (S) | Reallocated (S-2) | Carried forward (S+1) |
| --- | --- | --- | --- | --- |
| 0/S1 | 40 | 0 | 0 | 0 |
| 0/S2 | 40 | 10 | 0 | 30 |
| 10/S3 | 30 | 5 | 0 | 25 |
| 20/S4 | 25 | 10 | 10 | 25 |
| 30/S5 | 25 | 20 | 5 | 10 |
| 30/S6 | 10 | 0 | 10 | 20 |
The Concurrency Limit Model
The goal of this model is to limit the number of requests a resource can process simultaneously.
For each request, one unit is deducted from the total number of authorized requests, and this number is locked while the request is being processed. Once the limit is reached, any future requests will be rejected.
When a request in progress finishes and its lock is released, the counter is increased by one.
I find this model the easiest to understand but the least predictable.
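The lock-and-release behavior described above can be sketched with the ConcurrencyLimiter class in a console application. The limit of two simultaneous requests is an arbitrary value, and holding a lease stands in for a request that is still being processed.

```csharp
using System.Threading.RateLimiting;

// At most two requests may be in progress at the same time.
using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,
    QueueLimit = 0, // reject instead of queuing
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

// Two requests start: each holds (locks) one permit while it runs.
RateLimitLease first = limiter.AttemptAcquire();
RateLimitLease second = limiter.AttemptAcquire();

// A third simultaneous request finds no permit left.
RateLimitLease third = limiter.AttemptAcquire();
Console.WriteLine($"Third request accepted: {third.IsAcquired}");   // False

// The first request finishes: disposing its lease releases the permit.
first.Dispose();

// A new request can now be processed.
RateLimitLease fourth = limiter.AttemptAcquire();
Console.WriteLine($"Fourth request accepted: {fourth.IsAcquired}"); // True

second.Dispose();
third.Dispose();
fourth.Dispose();
```

Unlike the time-based limiters, permits come back only when a lease is disposed. How quickly capacity frees up therefore depends on how long each request takes.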
Testing the Rate Limiter
Before we wrap up, it’s important to remember that deploying a rate limiter within a production environment will inevitably have consequences. These may be positive or negative. For this reason, you must make sure your rate limiters are functioning correctly by conducting extensive testing before releasing them into production.
You can perform load tests with tools such as JMeter and Azure Load Testing to check that the results produced by your rate limiters meet your needs.
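Before setting up a full load test, a quick smoke test can already show whether the limiter rejects requests as expected. The console sketch below is not a substitute for JMeter or Azure Load Testing. It sends a handful of sequential requests and counts the status codes it receives. The base address and the “/weatherforecast” route are assumptions to be replaced with your own values.

```csharp
using System.Net;

// Assumed local address of the API under test.
using var client = new HttpClient { BaseAddress = new Uri("http://localhost:5000") };

var counts = new Dictionary<HttpStatusCode, int>();

// Send 8 requests in quick succession, more than the 5 allowed per window.
for (var i = 0; i < 8; i++)
{
    using var response = await client.GetAsync("/weatherforecast");
    counts[response.StatusCode] = counts.GetValueOrDefault(response.StatusCode) + 1;
}

// Print how many responses were received for each status code.
foreach (var (status, count) in counts)
{
    Console.WriteLine($"{(int)status} {status}: {count}");
}
```

With the limiter from the first scenario in place, some of the later requests should come back with status 429, provided all eight arrive within the same ten-second window.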
Here is the result that JMeter gave us for our first scenario:
More About Rate Limiting
This post covered some of the basic features of the new Rate Limiting middleware. There are still many other features to discover, such as policy management, authentication management, reverse proxy support, etc.
You can find further details and other useful information in this announcement: Rate Limiting for .NET.
Want to learn more about the new features in .NET 7? Here are the rest of the posts in this series: