Cooling Down Hot Partitions in Azure Cosmos DB
Jun 08, 2023 • 4 Minute Read
When working with Azure Cosmos DB, one of the first design decisions you will make is how to partition your data. Partitioning allows you to chunk the data within individual containers, creating distinct subsets called logical partitions. Using partitioning in this way enables Cosmos DB to make good on its promise of “limitless horizontal scalability.”
So let’s dive in and find out a bit more about what makes partitions both cool and hot.
Even Distribution is Cool
Logical partitions are defined by a key attribute in the data, such as a customer identifier, a warehouse location, or a product category. These are appropriately called partition keys. Selecting a good partition key isn’t always easy though! The idea is to select a partition key that will naturally spread both stored data and request traffic evenly across the underlying physical partitions.
For example, if you have hundreds of thousands of customers in a single region, but 90 percent of them are located in the city of Springfield, then setting a partition key of “city” would result in an uneven distribution. Every Springfield customer would land in the same logical partition, which would then carry most of the traffic and most of the storage. That is what is referred to as a “hot” partition.
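To make the Springfield example concrete, here is a minimal sketch that measures how evenly a candidate partition key would spread a sample of items. The `partition_share` helper, the `customerId` property, and the second city are hypothetical names made up for illustration, and item counts are only a rough stand-in for real storage and traffic.

```python
from collections import Counter


def partition_share(items, key):
    """Return (key value, share of items) pairs, largest share first."""
    counts = Counter(item[key] for item in items)
    total = sum(counts.values())
    return [(value, count / total) for value, count in counts.most_common()]


# Hypothetical sample data: 90 of 100 customers live in Springfield.
customers = [
    {"customerId": f"c{i}", "city": "Springfield" if i < 90 else "Shelbyville"}
    for i in range(100)
]

by_city = partition_share(customers, "city")
by_customer = partition_share(customers, "customerId")

# The largest logical partition under each candidate key.
print("city:", by_city[0])
print("customerId:", by_customer[0])
```

Partitioning by city puts 90 percent of the sample into one logical partition, while partitioning by customer identifier leaves each partition with a 1 percent share. Running a check like this against a representative export of your own data is one way to compare candidate keys before committing to one.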
…Hot Partitions, Not So Cool
Now, sometimes “hot” can be good: hot tamales, hot dogs, and hot days by the swimming pool. Hot partitions aren’t quite as fun. These regularly exceed the throughput made available to them, which results in throttled requests (returned as HTTP 429 responses) and failed operations. Some hot partitions can also put storage pressure on physical resources that, while scalable to a point, do have practical limitations. A single logical partition, for example, can hold at most 20 GB of data.
Microsoft provides several ways to set and tune throughput on your Cosmos DB databases and containers. Throughput configuration and careful partitioning choices can often avoid hot partitions, and you can learn all about throughput, partition key design and other hot topics in my course, DP-420: Designing and Implementing Cloud-Native Applications Using Microsoft Azure Cosmos DB. However, there are times when even a good partition key, combined with careful throughput configuration, is not enough to stay cool.
Hierarchical Partitioning Provides Some Shade
Fortunately, Microsoft recently introduced two new capabilities that can help. The first is hierarchical partitioning, which allows you to define a partition key with up to three levels.
For example, suppose you work for a hotel chain, where many of the franchise owners have more than one location. It might make sense to choose the FranchiseID property as a key, but if that turned out to still result in hot partitions, you can now partition first by FranchiseID and then by HotelID. And if you’re still in hot water, try one more level, such as a building number or floor number.
This is similar to the existing synthetic key partitioning capability in Cosmos DB, but with far less complexity for design, development, and implementation. This feature is still in preview, is not required for the DP-420 exam and, therefore, is not yet included in my certification course. But you can read more about it here: Hierarchical partition keys in Azure Cosmos DB (preview) | Microsoft Learn
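The difference between the two approaches can be sketched in a few lines. This is an illustration under stated assumptions, not the official SDK usage: `synthetic_key` and `hierarchical_key_definition` are hypothetical helpers, the sample reservation is made up, and the `MultiHash` / version 2 definition reflects my understanding of how a hierarchical key is declared on a container, so verify it against the Microsoft Learn article before relying on it.

```python
def synthetic_key(item, properties, separator="-"):
    """Synthetic key: the application concatenates property values itself
    and must store the result in an extra property on every item."""
    return separator.join(str(item[name]) for name in properties)


def hierarchical_key_definition(paths):
    """Hierarchical key: the container lists up to three property paths
    and the existing item properties are used as they are.
    Assumed shape of the partition key definition; check the docs."""
    if not 1 <= len(paths) <= 3:
        raise ValueError("a hierarchical partition key has one to three levels")
    return {"paths": list(paths), "kind": "MultiHash", "version": 2}


# Hypothetical reservation item for the hotel chain example.
reservation = {"id": "r-1001", "FranchiseID": "F42", "HotelID": "H7", "Floor": 3}

# Synthetic approach: compute and persist a combined value.
reservation_with_key = {
    **reservation,
    "partitionKey": synthetic_key(reservation, ["FranchiseID", "HotelID"]),
}

# Hierarchical approach: declare the levels once, on the container.
definition = hierarchical_key_definition(["/FranchiseID", "/HotelID"])

print(reservation_with_key["partitionKey"])
print(definition)
```

With the synthetic key, every writer has to build the combined value correctly on every item. With the hierarchical definition, the item keeps its natural shape and the levels are declared once.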
Targeted Throughput when You Need a Firehose
The second preview feature, which I am even more excited about, is the ability to redistribute throughput across physical partitions.
By default, Azure Cosmos DB divides up provisioned throughput equally across all physical partitions. That means that while hot partitions are choking on their own smoke, data in low-traffic partitions are sitting in the air conditioning and probably wishing they could turn it down a bit.
For these scenarios, you now have the ability to redistribute your provisioned throughput across physical partitions, without having to raise overall throughput just to satisfy the hottest partition. It’s a little like having zoned thermostats. Suppose you have 10,000 RU/s provisioned. You can decide to assign the majority of it, 6,000 RU/s, to the hot partition and the rest to the cooler rooms of your Cosmos house.
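The arithmetic behind the thermostat analogy can be sketched as follows. This only models the numbers and does not call the preview feature itself. The count of five physical partitions is an assumption added for illustration, as are the `equal_split` and `redistribute` helper names.

```python
def equal_split(total_rus, partition_count):
    """Default behavior: every physical partition gets the same share."""
    return [total_rus / partition_count] * partition_count


def redistribute(total_rus, partition_count, hot_index, hot_rus):
    """Give one partition a fixed amount and split the rest evenly
    across the remaining partitions. The total stays the same."""
    if partition_count < 2:
        raise ValueError("need at least two partitions to redistribute")
    if not 0 <= hot_index < partition_count:
        raise ValueError("hot_index is out of range")
    if not 0 <= hot_rus <= total_rus:
        raise ValueError("hot_rus must be between 0 and total_rus")
    remainder = (total_rus - hot_rus) / (partition_count - 1)
    return [
        float(hot_rus) if index == hot_index else remainder
        for index in range(partition_count)
    ]


# Assumption: the container currently has five physical partitions.
before = equal_split(10_000, 5)
after = redistribute(10_000, 5, hot_index=0, hot_rus=6_000)

print(before)  # [2000.0, 2000.0, 2000.0, 2000.0, 2000.0]
print(after)   # [6000.0, 1000.0, 1000.0, 1000.0, 1000.0]
```

Under these assumptions the hot partition goes from 2,000 RU/s to 6,000 RU/s while the total stays at 10,000 RU/s, so nothing extra is provisioned. The real feature has its own per-partition limits, which are covered in the documentation.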
To learn more about how to sign up for this preview feature, along with current limitations and caveats, visit Redistribute throughput across partitions (preview) in Azure Cosmos DB | Microsoft Learn.
In the meantime, another unrelated life tip: If pool parties are but a dream for you, and it feels like Rome is burning down all around, whip out the marshmallows and pull up a chair. This too shall pass.