Rate limiting is essential for most production applications. At a minimum, it prevents floods of traffic from crashing services, and it mitigates an app’s vulnerability to a variety of attacks. Adding and effectively configuring rate limiting is no simple task though. There are many rate limiting tools and strategies to choose from, and it’s difficult to find resources that concretely explain how to use rate limiting to improve an app’s security and resilience.
This is the first article in a series that will bring together the theory and practice of rate limiting, using contextualized real-world examples. In each article, I’ll demonstrate how to add different rate limiters to a Hono app, and discuss the technical and business requirements they fulfill.
I’ll begin with hono-rate-limiter, a low-lift solution that addresses many common rate limiting needs. While fairly simple to use, it will help illustrate key rate limiting concepts that will remain relevant throughout the series:
- Appropriately identifying clients
- Configuring the allowed request rate
- Performantly tracking client requests
- Communicating limits and handling rejected requests
In the next article, I’ll do a deeper dive into rate limiting algorithms and design, using the Upstash rate limiter to create custom rate limiting middleware. Finally, I’ll walk through how to roll your own rate limiting solution, shifting focus from high-level strategies and patterns to more specific implementation concerns.
First though, I’ll briefly introduce the “whats” and “whys” of rate limiting to provide some context for the following discussion of hono-rate-limiter implementations. This article won’t be especially technical, but it will help to have a basic familiarity with Hono apps, as well as the business and engineering requirements that affect backend development.
What is rate limiting anyway?
Simply put, rate limiters control how often a resource can be accessed, and by whom. To accomplish this, most limiters use some kind of identifier (e.g., user ID or client IP) to track requests from a given source, and an algorithm to determine if and when requests from each source should be allowed.
The specifics vary by algorithm and implementation, but most limiters are defined by:
- A limit on how many requests can be made within a defined window of time
- The cost of each request counted against the limit
- How often a client’s limit is refilled or reset, allowing additional requests
Client consumption is measured in tokens (or as a count of request timestamps), typically persisted in some kind of database. Each time the client makes a request, the rate limiting algorithm is used to determine whether the number of recent requests exceeds the limit. Excessive requests are usually rejected, with a Retry-After header specifying when the client can try again.
Why rate limit at all?
Rate limiting refers to a broad variety of policies and implementations with equally-diverse goals. It is most commonly deployed to manage service usage and capacity, but it also plays a vital role in auth workflows. Naturally, where and how this limiting is implemented depends greatly on the abuse vectors it guards against.
Resource Exhaustion
An app’s ability to process requests is essentially finite. If too many requests come in at once, an unprotected system could crash, or auto-scale to the tune of tens of thousands of dollars.
Floods of traffic may reflect an innocent spike in engagement—thousands of users flocking to buy tickets—or a Denial of Service (DoS) attack intended to block legitimate requests or bring down the server. In either case, rate limiting is used to cap processed requests at a manageable (and affordable) level.
Resource Exploitation
Even if an app can process all the traffic it’s receiving, it may not be serving clients equally or fairly. Especially in the case of paid services—where users expect a defined level of access—it’s vital to constrain how often clients can access resources.
Without account-based rate limits, a minority of users can hog the app’s capacity (e.g., through high-volume scraping), causing other users’ requests to lag or fail. Users can also exceed the consumption rate they’re paying for, leading to a “hidden” loss of revenue.
Note that “capacity” doesn’t refer only to how many requests a service can handle. Both malicious and innocent request patterns can over-consume other app resources, like the availability of merchandise on a digital marketplace (aka Inventory Denial).
Account Hijacking
Not all vulnerabilities are a matter of capacity. Rate limiting in auth workflows is primarily deployed against Brute Force and Credential Stuffing attacks. Brute force attacks attempt to gain access to a system by repeatedly guessing a user’s password, while credential stuffing blindly plugs known credentials into different services, seeking instances of their reuse.
Hono Rate Limiter
The easiest way to mitigate these risks in Hono apps is with hono-rate-limiter. Built to satisfy basic rate limiting needs in the Hono ecosystem, hono-rate-limiter is a port of the popular express-rate-limit library. While much of the API and core logic are the same, the library leverages Hono’s type system to share limiter results with downstream logic. It also includes Cloudflare-specific tooling to support development with the workerd runtime.
Like many Hono tools, hono-rate-limiter exports middleware that can be added app-wide, or to a single route or handler. This flexibility is especially important in the context of rate limiting, where multiple limiters may be layered to address different requirements. A data endpoint, for example, might need one rate limit to prevent resource exhaustion and another to enforce account-specific limits by subscription tier.
Window + Limit
As their name suggests, rate limiters constrain the number of requests allowed within a given period. This is typically expressed as a max or limit of requests, paired with a time interval that represents how often a client’s available requests are either reset or refilled. These are the primary levers used to configure and tune a rate limiter’s behavior, and each rate limiting algorithm leverages them differently to balance performance and accuracy.
Calculating client request rates
Fixed Window algorithms, like the one used by hono-rate-limiter, divide time into fixed-length windows (measured in milliseconds) and maintain a counter tracking the number of requests made in each. Setting the limit to 100 and windowMs to 60,000, for example, allows 100 requests every minute. Requests that exceed the window limit are rejected until windowMs has elapsed, at which point the counter is reset.
import { Hono } from 'hono';
import { rateLimiter } from 'hono-rate-limiter';

const app = new Hono()
  .use(
    rateLimiter({
      windowMs: 60 * 1000,
      limit: 100,
      // ...
    }),
  );
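To make the mechanics concrete, here is a minimal in-memory sketch of a Fixed Window counter. It is not hono-rate-limiter’s implementation: the FixedWindowCounter class and its hit method are names invented for this illustration, and the sketch assumes each client’s window starts at their first request.

```typescript
// Hypothetical in-memory Fixed Window counter (illustration only).
type WindowRecord = { count: number; resetAt: number };
type HitResult = { allowed: boolean; remaining: number; resetAt: number };

class FixedWindowCounter {
  private readonly records = new Map<string, WindowRecord>();
  private readonly windowMs: number;
  private readonly limit: number;

  constructor(windowMs: number, limit: number) {
    this.windowMs = windowMs;
    this.limit = limit;
  }

  // Count one request for `key` at time `now` (milliseconds).
  hit(key: string, now: number = Date.now()): HitResult {
    let record = this.records.get(key);

    // Start a fresh window if none exists or the current one has expired.
    if (!record || now >= record.resetAt) {
      record = { count: 0, resetAt: now + this.windowMs };
      this.records.set(key, record);
    }

    record.count += 1;

    return {
      allowed: record.count <= this.limit,
      remaining: Math.max(0, this.limit - record.count),
      resetAt: record.resetAt,
    };
  }
}
```

With windowMs set to 60,000 and limit set to 100, the 101st call to hit for the same key within a minute returns allowed: false, and the first call after resetAt starts a new window.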
Fixed Window limitations
The Fixed Window algorithm is the simplest and most performant, but it has a few key drawbacks. Since it uses fixed windows in time, the algorithm is too rigid to support occasional traffic bursts, even when they’re within the allowed long-term average.
It is also vulnerable to abuse at the window transition. By maxing out requests just before the current window expires, and just after the next begins, malicious clients can momentarily double the limit for a burst of requests.
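The boundary problem is easy to reproduce in a small simulation. The sketch below assumes clock-aligned windows and uses hypothetical helper names (allow, burst); it only illustrates the arithmetic, not any particular library’s behavior.

```typescript
// Hypothetical clock-aligned Fixed Window check, used only to show the boundary burst.
const WINDOW_MS = 60_000;
const LIMIT = 100;

// Window index -> number of requests counted in that window.
const counts = new Map<number, number>();

function allow(now: number): boolean {
  const windowIndex = Math.floor(now / WINDOW_MS);
  const count = (counts.get(windowIndex) ?? 0) + 1;
  counts.set(windowIndex, count);
  return count <= LIMIT;
}

// Send `n` requests at the same instant and return how many were allowed.
function burst(n: number, at: number): number {
  let allowed = 0;
  for (let i = 0; i < n; i++) {
    if (allow(at)) allowed += 1;
  }
  return allowed;
}

const beforeBoundary = burst(100, 59_900); // last 100 ms of the first window
const afterBoundary = burst(100, 60_000); // first instant of the second window

console.log(beforeBoundary + afterBoundary); // 200 requests allowed within about 100 ms
```

Both bursts stay within their own window’s limit, yet together they let through twice the configured rate in a fraction of a second.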
In the next article, I’ll introduce more flexible and accurate options. Their increased efficacy comes at the cost of logical complexity and performance though, so for many use-cases the Fixed Window algorithm’s strengths outweigh its weaknesses.
Calibrating rate limits
The appropriate configuration for a rate limiter depends largely on the problem it’s meant to solve. Each use-case has distinct priorities and constraints, and effective rate limits are the product of balancing one against the other.
Limiters enforcing tiered API access balance resource costs against subscription prices, while those protecting auth routes prioritize security while allowing for some user error. Likewise, limiters preventing resource exhaustion should reflect average consumption relative to capacity.
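One way to keep these trade-offs visible is to define each policy as data. The sketch below is only an illustration: the tier names, the numbers, and the policyFor helper are all hypothetical, and real values should come from pricing and capacity metrics.

```typescript
// Hypothetical tiers and numbers, for illustration only.
type Tier = "free" | "pro" | "enterprise";

type RateLimitPolicy = { windowMs: number; limit: number };

const POLICIES: Record<Tier, RateLimitPolicy> = {
  free: { windowMs: 60_000, limit: 60 },
  pro: { windowMs: 60_000, limit: 600 },
  enterprise: { windowMs: 60_000, limit: 6_000 },
};

// A stricter policy for auth routes: room for a few typos, little else.
const LOGIN_POLICY: RateLimitPolicy = { windowMs: 15 * 60_000, limit: 5 };

function policyFor(tier: Tier | undefined): RateLimitPolicy {
  // Unknown or missing tiers fall back to the most conservative policy.
  return POLICIES[tier ?? "free"];
}
```

Keeping policies in one place like this makes it easier to review and tune them as usage data comes in.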
When first adding a rate limiter, it’s best to choose a more conservative limit in order to ensure that all users have fair access and that service isn’t disrupted. This initial configuration can then be tuned over time to better reflect actual request patterns.
Sophisticated rate limiters may use dynamic limits to account for fluctuations throughout the day, or to handle traffic spikes related to planned events like a promotion or content release. In most cases though, using server metrics to regularly tune static rate limits is good enough.
Uniquely identifying clients
Both the standard and Cloudflare-specific middleware require a keyGenerator, which takes Hono Context as an argument and must return a string key. This key is used to distinguish between records in the limiter store, so for most use-cases it should be unique to each client.
While many rate limiting examples use the client IP for convenience, this isn’t recommended for production apps. IP addresses aren’t guaranteed to be unique, and bad actors can easily spoof IPs or obscure them with a VPN.
A user or account ID is the best option for most use-cases, since their uniqueness makes it easier to apply rate limits fairly and accurately. This is essential for premium APIs and other subscription services, which promise customers a defined level of access for each payment tier.
When to rate limit by IP
Rate limiters protecting pre-login or unprotected routes are a notable exception. Uniquely identifying clients is more difficult in these cases, but also less relevant to the limiters’ role, which is focused on security and capacity management more so than regulating individual client usage.
In auth workflows, rate limiters make attempts to gain unauthorized access to the system logistically challenging or prohibitively expensive. Since users only need to submit their credentials once, any IPs that exceed a reasonable rate limit are suspicious, and may even be blocked after repeated violations.
Managing service capacity is a more generalized requirement, but is likewise decoupled from individual client usage. Rate limiting can be used to prevent apps from over-consuming downstream APIs, including third-party services. It can also ensure that resource use doesn’t exceed server capacity, or unexpectedly trigger expensive auto-scaling.
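For pre-login routes, a key can be derived from proxy headers. The following is a rough sketch rather than a recommended implementation: ipKey and HeaderGetter are hypothetical names, and which headers can be trusted depends entirely on the proxy or platform the app runs behind.

```typescript
// Hypothetical helper: derive a limiter key from proxy headers on a pre-login route.
// `HeaderGetter` mirrors the shape of Hono's `c.req.header(name)`.
type HeaderGetter = (name: string) => string | undefined;

function ipKey(getHeader: HeaderGetter, routeId: string): string {
  // `CF-Connecting-IP` is set by Cloudflare; `X-Forwarded-For` by many proxies.
  // Only trust these headers when the app actually sits behind that proxy,
  // since clients can otherwise set them to arbitrary values.
  const forwarded = getHeader("x-forwarded-for")?.split(",")[0]?.trim();
  const ip = getHeader("cf-connecting-ip") || forwarded || "unknown";

  // Prefix with the route so that different routes don't share a counter.
  return `${routeId}:${ip}`;
}
```

In a keyGenerator, this could be called as ipKey((name) => c.req.header(name), "login").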
Safeguarding user privacy
Since rate limiting databases maintain a record of API usage by key, it’s also vital to consider user privacy when limiting by IP, or other personally-identifiable data. This also includes identifiers like geolocation and device fingerprints, which will be covered in greater detail later in the series.
Storing personally-identifiable data makes users vulnerable to doxxing, targeted cyber attacks, and government overreach. It can also make apps liable to laws like the EU’s GDPR or California’s CCPA, which are designed to protect users by regulating the collection and storage of personal data. Fortunately, there are a few simple steps we can take to protect users and their privacy:
- When collecting or storing identifying data is necessary, clearly communicate to users what data is used and why.
- Obscure identifying data like IPs with one-way hashes (e.g., HMAC), and regularly clear stale records from the limiter database.
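As a minimal sketch of the hashing step, assuming a Node.js-compatible runtime, an IP can be passed through HMAC-SHA256 before it is used as a limiter key. The hashIp helper and its secret parameter are hypothetical; in practice the secret would come from configuration and be kept out of source control.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: replace a raw IP with a keyed one-way hash so that
// the limiter store never holds the address itself.
function hashIp(ip: string, secret: string): string {
  return createHmac("sha256", secret).update(ip).digest("hex");
}
```

Because the same IP and secret always produce the same digest, the limiter can still group requests by client without storing the address.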
As with every engineering problem, ethical rate limiting is a matter of compromise. We can’t safeguard both the application and its users without dealing with at least some identifying data. This shouldn’t be seen as an excuse to cut corners though, but rather as a challenge to better understand rate limiting tools and how to deploy them responsibly.
Generating keys from request data
Rate limiting on protected API routes should take place downstream from auth middleware. This avoids wasting resources limiting an unauthorized request, and ensures that the user or account ID is available when hono-rate-limiter is called.
import { Hono } from 'hono';
import { rateLimiter } from 'hono-rate-limiter';

type AppEnv = {
  Variables: {
    accountId: string;
  };
};

const app = new Hono()
  .use(authMiddleware)
  .use(
    rateLimiter<AppEnv>({
      // ...
      keyGenerator: (c) => {
        return c.var.accountId;
      },
    }),
  );
Accessing the ID type-safely in the keyGenerator is a two-step process:
- First, the ID must be set in Context. Some Hono auth middleware do this internally, but others, like the built-in Bearer Auth, do not. In these cases, an additional layer of custom middleware is necessary.
- Then, the variable must be added to a custom AppEnv type, passed to the rateLimiter middleware as a generic parameter. This type will be applied to the keyGenerator’s Context argument, making the value available via c.var or c.get.
Note that this approach is truly type-safe only when the set invocation and AppEnv type are coupled. While this isn’t entirely possible due to Hono’s optimistic approach to type-safety, using factory helpers like createHandlers or createApp can help keep types in sync with the implementation.
Persisting request records
To keep track of client requests, rate limiters rely on some kind of mid-term persistence. Records don’t need to be maintained forever, but a stateless rate limiter would have no way to know how often a client was hitting the backend, so it would have to naively allow (or reject) every request.
Why not use local memory?
Many rate limiting examples or bare-bones implementations simply track client requests locally, often using a Map stored in memory. While this can be helpful during development, or for small apps running in a persisted environment, it isn’t a good solution at scale.
Local memory implementations break down in edge environments, where memory isn’t persisted long-term, and in distributed systems, where multiple app instances must be kept in sync. A local solution also means that the limiter and application logic share memory and compute, making scaling more difficult.
Connecting to remote databases
Remote in-memory databases are the most common solution, as they minimize latency in the rate limit layer. Since some algorithms require multiple database calls, atomicity is another important consideration when choosing a storage solution.
Atomic operations are completed as a single unit, even if they have multiple steps. This prevents race conditions that could result in conflicting limit results. I’ll discuss this topic more in the next article on rate limiting algorithms.
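The difference is easier to see in a toy example. The sketch below uses a hypothetical in-memory FakeStore as a stand-in for a remote database, and compares a two-step read-then-write with a single-step increment (analogous to the Redis INCR command) under 50 concurrent requests.

```typescript
// Hypothetical in-memory stand-in for a remote store. Every call is async,
// as it would be over a network.
class FakeStore {
  private value = 0;

  async get(): Promise<number> {
    return this.value;
  }

  async set(next: number): Promise<void> {
    this.value = next;
  }

  // Single-step increment, analogous to Redis `INCR`.
  async incr(): Promise<number> {
    this.value += 1;
    return this.value;
  }
}

// Read-modify-write in two steps: concurrent callers can read the same value
// and overwrite each other's updates.
async function naiveIncrement(store: FakeStore): Promise<void> {
  const current = await store.get();
  await store.set(current + 1);
}

async function demo(): Promise<{ naive: number; atomic: number }> {
  const naiveStore = new FakeStore();
  await Promise.all(Array.from({ length: 50 }, () => naiveIncrement(naiveStore)));

  const atomicStore = new FakeStore();
  await Promise.all(Array.from({ length: 50 }, () => atomicStore.incr()));

  return { naive: await naiveStore.get(), atomic: await atomicStore.get() };
}
```

Running demo() leaves the naive counter well short of 50, because overlapping requests overwrite each other, while the atomic counter records all 50. In a rate limiter, those lost updates translate into requests that slip past the limit.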
Redis is a popular choice, both because it is extremely performant, and because it makes it easy to run multi-step operations atomically. Other key-value stores like Memcached or Cloudflare KV can also be used, though they are geared towards caching and don’t offer robust support for atomic operations.
Application architecture is arguably the most important factor though. Modern development patterns like distributed applications and edge functions introduce challenges and constraints that affect not just where data is stored, but also how it’s managed.
Distributed limiting
While a centralized store makes rate limit records available across requests and app instances, it introduces a request-processing bottleneck. A flood of requests from instances around the world can bog down the database, causing even legitimate requests to lag or time out.
In distributed applications, centralized stores can also introduce significant latency. After reaching the handler, each request must first be diverted to the limiter—possibly in a different region—before returning to the handling server for processing.
In the era of edge functions (and runtimes), it’s no surprise that distributed rate limiters are increasingly popular. This strategy involves colocating a limiter data store with each app instance, thereby substantially reducing latency and avoiding a single point of failure.
Of course, distributed rate limiters bring their own set of challenges. As with any distributed database, updates must be synchronized across instances. This ensures that users can’t bypass the limit simply by changing their “location” with a VPN. Special care must also be taken to mitigate race conditions, especially when using non-unique identifiers like IPs.
Data storage with hono-rate-limiter
At the heart of hono-rate-limiter are its stores. These are adapters that implement the Fixed Window algorithm and communicate with the persistence layer. The library includes a Redis store compatible with most Redis clients, as well as stores for Cloudflare KV databases and Durable Objects. It can also be paired with a half-dozen third-party stores, and developers can build their own to connect with a different database or use an alternative algorithm.
Setting up a store is pretty straightforward, though implementations and configuration options will depend on the store and database selected. In the simplest case, the initialized database client is passed to the store constructor. For more information on specific integrations, please refer to the hono-rate-limiter documentation.
import { RedisStore } from "@hono-rate-limiter/redis";
import { createClient } from "redis";

const client = createClient({
  // redis[s]://[[username][:password]@][host][:port][/db-number]
  url: "redis://alice:foobared@awesome.redis.server:6380",
});

app.use(
  rateLimiter({
    store: new RedisStore({ client }),
  }),
);
Note that database options aren’t necessarily compatible with every architecture. Standard Redis clients connect over TCP, which edge runtimes like Cloudflare Workers have traditionally not supported, so connecting to a standard Redis database may require a custom HTTP client. To use Redis with a Worker, an edge-compatible implementation like Upstash Redis is recommended. Connectivity issues aside, this will also make it easier to keep database instances (and rate limits) in sync.
Using the Cloudflare Rate Limit binding
An adapter for Cloudflare’s Rate Limit binding is also available as a standalone middleware. In this case, a store isn’t required, since the binding handles the algorithm and database connection internally. This also results in some important configuration and implementation differences.
The Cloudflare Rate Limit binding requires a window of either 10 or 60 seconds, restricting configurability. Limits are local to each location the Worker runs in, so a client could bypass limits by directing requests to different Worker instances.
import { cloudflareRateLimiter } from "@hono-rate-limiter/cloudflare";

type AppType = {
  Bindings: {
    // `RateLimit` type available globally through @cloudflare/workers-types
    RATE_LIMITER: RateLimit;
  };
};

app.use(
  cloudflareRateLimiter<AppType>({
    rateLimitBinding: (c) => c.env.RATE_LIMITER,
    keyGenerator: generateClientKey,
  }),
);
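For context, the binding itself exposes a single limit({ key }) method that resolves to an object with a success flag. The sketch below shows roughly what a check against it looks like. It is not the adapter’s actual source: checkLimit is a hypothetical helper, and RateLimitBinding is a structural stand-in for the global RateLimit type.

```typescript
// Structural type for the part of the binding used here. In a Worker, the
// global `RateLimit` type from @cloudflare/workers-types plays this role.
type RateLimitBinding = {
  limit(options: { key: string }): Promise<{ success: boolean }>;
};

// Hypothetical helper: returns the status code to respond with.
async function checkLimit(
  binding: RateLimitBinding,
  key: string,
): Promise<200 | 429> {
  // The binding tracks the window and counter itself. The caller only
  // supplies the key and reads the outcome.
  const { success } = await binding.limit({ key });
  return success ? 200 : 429;
}
```

Since the binding reports only success or failure, details like the remaining request count aren’t available to the caller through this method.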
Since the Cloudflare Rate Limit API doesn’t require a hono-rate-limiter store, it’s unclear which rate limit algorithm it uses. Based on this (somewhat dated) article though, it seems safe to assume it’s a Sliding Window Counter. In the next article, I’ll cover the differences between it and the Fixed Window in detail. For now it’s enough to know that the Sliding Window takes the same arguments but offers better accuracy and burst support, making it a better candidate for an enterprise product.
Implementation differences
Though the stores use the same algorithm, there is at least one significant difference in their implementation. When using hono-rate-limiter with Redis, the windowMs option is used to automatically delete windows when they expire, improving limiter performance. This behavior isn’t available when storing rate limit data with Cloudflare products like KV Stores or the Rate Limit binding though. Instead, the rate limiting clients exported by @hono-rate-limiter/cloudflare check whether the tracked window has expired on each request, and manually reset the counter when relevant.
Communicating rate limits
To improve user experience, and to help clients avoid accidentally hitting rate limits, it’s helpful to share rate limit and usage data with clients. In the simplest case, servers return a 429 error when a rate limit is exceeded. This isn’t especially helpful though, regardless of whether the client is an internal front-end, or an end-user. For this reason, both success and error responses often include rate limit information in their headers.
Rate limiters protecting auth logic from attacks are a notable exception. In these cases it is recommended to share as little as possible. Any details about the limit or when it resets make it easier to bypass limits, automate attacks, or avoid detection.
Rate limit headers
Though the standard format for rate limit headers has changed over time, the information shared has remained largely consistent:
- Policy: The service’s request limit and window length in seconds.
- Limit: How many requests the client is allowed within a given window.
- Remaining: The number of allowed requests remaining (limit - consumed).
- Reset: The number of seconds until the limit is reset.
By default, hono-rate-limiter uses the Draft 6 standard, setting each value in its own RateLimit-* header, and including only non-zero Reset values. It also supports the Draft 7 standard, which combines all fields except for Policy into a single RateLimit header, and always includes a Reset value. In both cases, a Retry-After header is set on error responses to let clients know when they can try again.
import { Hono } from 'hono';
import { rateLimiter } from 'hono-rate-limiter';

const app = new Hono()
  .use(
    rateLimiter({
      // ...
      standardHeaders: "draft-7", // or "draft-6"
    }),
  );
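To show how the two formats differ on the wire, here is a small sketch that builds both sets of headers from the same values. The function and type names are hypothetical, and the sketch always includes Reset, so it doesn’t reproduce every detail of the library’s Draft 6 behavior.

```typescript
// Hypothetical shape for the values a limiter reports about one client.
type RateLimitInfo = {
  limit: number; // requests allowed per window
  remaining: number; // requests left in the current window
  resetSeconds: number; // seconds until the window resets
  windowSeconds: number; // window length
};

// Draft 6: one header per field.
function draft6Headers(info: RateLimitInfo): Record<string, string> {
  return {
    "RateLimit-Policy": `${info.limit};w=${info.windowSeconds}`,
    "RateLimit-Limit": String(info.limit),
    "RateLimit-Remaining": String(info.remaining),
    "RateLimit-Reset": String(info.resetSeconds),
  };
}

// Draft 7: everything except the policy is combined into a single header.
function draft7Headers(info: RateLimitInfo): Record<string, string> {
  return {
    "RateLimit-Policy": `${info.limit};w=${info.windowSeconds}`,
    RateLimit: `limit=${info.limit}, remaining=${info.remaining}, reset=${info.resetSeconds}`,
  };
}
```

A client that parses these headers can slow down as remaining approaches zero instead of waiting for a 429.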
I’m not aware of any technical motivations for using one or the other, though Draft 7 may be preferable as it’s the most recent (available) standard. Since hono-rate-limiter’s implementation of the two standards is essentially the same, the choice may come down to personal preference for the ergonomics of one or the other.
Customizing rate limiters for specific use-cases
Adding rate limiting to a Hono app with hono-rate-limiter is pretty simple, and it doesn’t involve vendor lock-in or a prescriptive API. While it may not be sufficient for all use-cases, it’s a solid and easily-extensible starting point. Its greatest limitation may be its use of the Fixed Window algorithm, but this can be addressed with a custom store.
I hope that this has been a helpful introduction to rate limiting in the Hono ecosystem. In the next article I’ll cover rate limiting algorithms in greater depth, address their trade-offs, and explain how to extend hono-rate-limiter with a custom store. If there’s something I’ve missed, or that you’d like me to cover in greater depth, feel free to open an issue in the blog repo!