- Lab
- A Cloud Guru
Optimize Query Performance in Cosmos DB for NoSQL
In this lab, you will be presented with a short series of real-world scenarios that describe performance issues in a Cosmos DB implementation. For each scenario, you will be provided with a list of optimization ideas to research and consider. Some of the suggested optimizations include adjusting throughput on a container, customizing or adjusting indexing policies, Time to Live (TTL) settings, and consistency levels. After researching and selecting the optimization idea most likely to introduce improvement, you will use the Azure portal to apply the solution. No coding experience is required, but familiarity working with Azure Cosmos DB for NoSQL databases, and navigating the Azure portal will provide you with a headstart. However, novices who are willing to do a little research using the links provided, along with the lab guide and solution videos, should also be able to complete the lab successfully.
Path Info
Table of Contents
-
Challenge
Scenario 1: Optimize for Inconsistent Request Traffic
Your first consultation is with a team the works with the ADT (admissions, discharges, and transfers) data for a hospital administrative application. ADT messages arrive in the
Messages
container on theADT
database. The team is plagued by 429 errors and clear evidence of throttling during high-traffic times that coincide with elective surgery check-ins early in the morning, and late evening admissions in emergency rooms. Team members have tried setting the container throughput to 10,000 RU/s to prevent throttling during high-traffic periods, but more than half the time, the container is consuming only about 4,000 RU/s. The team lead knows they are underutilizing the higher throughput setting and wasting a lot of money, but she is not sure how to address it. Given the optimization ideas below, what do you recommend? You may want to navigate to the Cosmos DB account, database, and container to check current settings before you settle on any solutions.After selecting one or more optimization ideas, navigate to the appropriate resource(s) and apply your solution(s).
Optimization Ideas
- Modify throughput on a container
- Modify the indexing policy on a container
- Modify TTL on a container
- Modify the consistency level on the account
-
Challenge
Scenario 2: Optimize Common Read Queries
Your second meeting is with a data engineer, who supports the data analytics team. Users are complaining of extremely slow performance for nearly all query executions. The analytics team works with data on the
Analytics
database, in a container calledPrimaryCareProviders
.The data engineer has identified the fact that nearly all of the queries are identical, except that the users return different properties in different forms, depending on the report or visualization they are populating. Specifically, the
WHERE
clause is very common:SELECT c.last_name ,c.gender ,c.age ,c.city ,c.pcp_id FROM c WHERE c.pcp_id = [some value]
The engineer has also identified a few queries that fail entirely if they include this
ORDER BY
clause:ORDER BY c.age
What can you suggest? You may want to navigate to the Cosmos DB account, database, and container to check current settings before you settle on any solutions.
After selecting one or more optimization ideas, navigate to the appropriate resource(s) and apply your solution(s).
Optimization Ideas
- Modify throughput on a container
- Modify the indexing policy on a container
- Modify TTL on a container
- Modify the consistency level on the account
-
Challenge
Scenario 3: Reduce Storage Costs
Your third consultation is with the Internet of Things (IoT) team, who manage hospital sensors, which stream device messages to a container on Cosmos DB for NoSQL. The messages are first stored in a container called
DeviceIntake
, and then an Azure Function with a trigger on theDeviceIntake
change feed sends the data along to Event Hubs. An Azure Stream Analytics job pulls from Event Hubs to transform and augment the data before sending it along to another Cosmos DB container calledDeviceMessages
. Both containers are on a database calledIoT_Team
. The architecture is working well, which is somewhat surprising, given that the team spun up both containers with just the normal default settings and have made no adjustments to those defaults.No clients read from the
DeviceIntake
container; once an audit process runs every night to ensure that all raw messages inDeviceIntake
have safely arrived inDeviceMessages
, no other read operations are performed on that container. All read operations are executed againstDeviceMessages
, and the typical access pattern involves point reads; SQL queries are rare.There are no reported performance issues, but upper management has seem some cost management reports from the IT manager, and they are very concerned by how rapidly the storage costs are racking up. Given that the cost is high enough to get the attention of upper management, the team is eager to hear your ideas for quickly addressing storage costs.
You may want to navigate to the Cosmos DB account, database, and containers to check current settings before you settle on any solutions.
After selecting one or more optimization ideas, navigate to the appropriate resource(s) and apply your solution(s).
Optimization Ideas
- Modify throughput on a container
- Modify the indexing policy on a container
- Modify TTL on a container
- Modify the consistency level on the account
What's a lab?
Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
Provided environment for hands-on practice
We will provide the credentials and environment necessary for you to practice right within your browser.
Guided walkthrough
Follow along with the author’s guided walkthrough and build something new in your provided environment!
Did you know?
On average, you retain 75% more of your learning if you get time for practice.