Use Azure Resource Manager (ARM) to Deploy Azure Log Alerts
In this guide, you will learn how to write log queries to detect issues in your environment and deploy alert rules to notify your team of failures.
Oct 27, 2020 • 9 Minute Read
Introduction
Azure Monitor captures telemetry, diagnostics, activity, and metric data about your Azure resources. This data can be very useful in determining the health of your Azure environment. However, it is only useful when the data is surfaced to administrators and users. You want to be notified so you can fix any service outages or critical bugs in your apps as soon as possible.
In this guide, you will learn how to write log queries to detect issues in your environment and deploy alert rules to notify your team of failures. This guide will use Azure Resource Manager (ARM) templates.
Azure Resource Manager Templates
ARM templates are a set of JSON files you can create to define different aspects of your Azure infrastructure, including action rules, alert rules, and other resources. They are useful since you can easily replicate rules and make widespread changes by changing a few items in the JSON configuration.
A common scenario is to replicate Azure resources by creating a staging environment for testing and a more stable production environment. With ARM templates, you can define your infrastructure once and easily keep your staging and production environments synchronized.
Action Groups
One of the prerequisites of defining a log-based alert rule is creating an action group. An action group is a collection of zero or more of the following actions, which are triggered when an Azure alert fires:
- Sending an email or an SMS to a user
- Triggering an automation runbook
- Triggering an Azure function
- Triggering a logic app
- Triggering a webhook
- Triggering an IT service management tool such as ServiceNow
The webhook trigger provides the most flexible integration with third-party systems. You can use it to integrate with services like Slack and Opsgenie. If your Azure environment is complex and uptime is critical, then using the webhook action and integrating with a tool like Opsgenie will give you the most flexibility. Azure Monitor can manage querying of telemetry and activity data, while Opsgenie allows you to create more sophisticated workflows such as on-call schedules, escalations, and notifications.
To define an action group that alerts Opsgenie using ARM, create a file named azuredeploy.json, as demonstrated below:
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "actionGroupName": {
      "type": "string"
    },
    "actionGroupShortName": {
      "type": "string"
    },
    "webhookUrl": {
      "type": "string"
    }
  },
  "resources": [
    {
      "name": "[parameters('actionGroupName')]",
      "type": "microsoft.insights/actionGroups",
      "apiVersion": "2019-06-01",
      "location": "Global",
      "tags": {},
      "properties": {
        "groupShortName": "[parameters('actionGroupShortName')]",
        "enabled": true,
        "webhookReceivers": [
          {
            "name": "opsgenie",
            "serviceUri": "[parameters('webhookUrl')]",
            "useCommonAlertSchema": true
          }
        ]
      }
    }
  ]
}
The ARM template above consists of two main sections:
- resources: The action group that has an Opsgenie webhook. Note that resources is an array, so you can define more resources in the same template.
- parameters: Allows you to customize your template. This is useful for creating the same resource types with different names. You may want to add environment prefixes to your resources, such as staging- and prod-.
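As a rough sketch of the environment-prefix idea, the template could take an extra parameter and build the resource name from it. The environment parameter and the prefixedActionGroupName variable below are hypothetical names introduced for illustration; they are not part of the original template. Everything else mirrors the action group defined above.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "environment": {
      "type": "string",
      "allowedValues": ["staging", "prod"],
      "metadata": {
        "description": "Hypothetical parameter added for illustration; used as a name prefix."
      }
    },
    "actionGroupName": {
      "type": "string"
    },
    "actionGroupShortName": {
      "type": "string"
    },
    "webhookUrl": {
      "type": "string"
    }
  },
  "variables": {
    "prefixedActionGroupName": "[concat(parameters('environment'), '-', parameters('actionGroupName'))]"
  },
  "resources": [
    {
      "name": "[variables('prefixedActionGroupName')]",
      "type": "microsoft.insights/actionGroups",
      "apiVersion": "2019-06-01",
      "location": "Global",
      "properties": {
        "groupShortName": "[parameters('actionGroupShortName')]",
        "enabled": true,
        "webhookReceivers": [
          {
            "name": "opsgenie",
            "serviceUri": "[parameters('webhookUrl')]",
            "useCommonAlertSchema": true
          }
        ]
      }
    }
  ]
}
```

With this variant, deploying with environment set to staging and actionGroupName set to demo-ag-group would produce an action group named staging-demo-ag-group. The short name is left unprefixed here because action group short names are limited to 12 characters.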
To deploy this ARM template you can use the Azure CLI:
rg=rg-alerts-pl
az group create -n $rg -l australiaeast
az deployment group create -g $rg --template-file azuredeploy.json
These commands use the bash shell to create a resource group and deploy the action group into that resource group. When you run these commands, you will be prompted to supply the values for the parameters. You can obtain the webhookUrl from Opsgenie's Azure integration page.
Instead of answering the prompts, you can supply the values in a parameters file, as described in Microsoft's documentation.
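A minimal parameters file for the template above might look like the following sketch. The file name azuredeploy.parameters.json is only a common convention and an assumption here, and the values are placeholders: replace the webhookUrl value with the URL from Opsgenie's Azure integration page.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "actionGroupName": {
      "value": "demo-ag-group"
    },
    "actionGroupShortName": {
      "value": "demo-ag"
    },
    "webhookUrl": {
      "value": "<webhook URL from Opsgenie's Azure integration page>"
    }
  }
}
```

You would then pass the file to the deployment command with the --parameters option, for example --parameters @azuredeploy.parameters.json, and the CLI should no longer prompt for those values.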
Log-based Queries
Now that you have set up integration with Opsgenie via an action group, you can start creating a log-based query. To do this, create an Application Insights instance and then define a scheduled query rule against it.
rg=rg-alerts-pl
az extension add --name application-insights
az monitor app-insights create -a demo-insights-pl -l australiaeast -g $rg
Then, extend the ARM template file you created earlier with a scheduledqueryrules resource and a few extra parameters to customize it.
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "actionGroupName": {
      "type": "string",
      "defaultValue": "demo-ag-group"
    },
    "actionGroupShortName": {
      "type": "string",
      "defaultValue": "aaa"
    },
    "webhookUrl": {
      "type": "string",
      "defaultValue": "https://api.opsgenie.com/v1/json/azure?apiKey=xxxxxxxxxxxxxxxxxxxxxxxxx"
    },
    "alertRules": {
      "type": "array",
      "defaultValue": [
        {
          "alertRuleName": "demo-ar-exceptions",
          "alertDescription": "new exception occurred in demo-insights-pl",
          "resourceName": "demo-insights-pl",
          "resourceGroup": "rg-alerts-pl",
          "query": "exceptions | union (traces | where (severityLevel > 2))"
        }
      ]
    }
  },
  "resources": [
    {
      "name": "[parameters('actionGroupName')]",
      "type": "microsoft.insights/actionGroups",
      "apiVersion": "2019-06-01",
      "location": "Global",
      "tags": {},
      "properties": {
        "groupShortName": "[parameters('actionGroupShortName')]",
        "enabled": true,
        "webhookReceivers": [
          {
            "name": "opsgenie",
            "serviceUri": "[parameters('webhookUrl')]",
            "useCommonAlertSchema": true
          }
        ]
      }
    },
    {
      "name": "[parameters('alertRules')[copyIndex()].alertRuleName]",
      "type": "microsoft.insights/scheduledqueryrules",
      "location": "[resourceGroup().location]",
      "apiVersion": "2018-04-16",
      "dependsOn": [
        "[resourceId('microsoft.insights/actionGroups', parameters('actionGroupName'))]"
      ],
      "tags": {},
      "copy": {
        "name": "alertscopy",
        "count": "[length(parameters('alertRules'))]"
      },
      "properties": {
        "description": "[parameters('alertRules')[copyIndex()].alertDescription]",
        "enabled": "true",
        "source": {
          "query": "[parameters('alertRules')[copyIndex()].query]",
          "authorizedResources": [],
          "dataSourceId": "[resourceId(parameters('alertRules')[copyIndex()].resourceGroup, 'microsoft.insights/components', parameters('alertRules')[copyIndex()].resourceName)]",
          "queryType": "ResultCount"
        },
        "schedule": {
          "frequencyInMinutes": 5,
          "timeWindowInMinutes": 5
        },
        "action": {
          "severity": "3",
          "aznsAction": {
            "actionGroup": [
              "[resourceId('microsoft.insights/actionGroups', parameters('actionGroupName'))]"
            ]
          },
          "throttlingInMin": 60,
          "throttleConsecutiveWindowCount": 0,
          "trigger": {
            "thresholdOperator": "GreaterThan",
            "threshold": 1
          },
          "odata.type": "Microsoft.WindowsAzure.Management.Monitoring.Alerts.Models.Microsoft.AppInsights.Nexus.DataContracts.Resources.ScheduledQueryRules.AlertingAction"
        }
      }
    }
  ]
}
There are a few interesting aspects to this code:
- You added the alertRules parameter, which is an array. This allows you to create multiple alert rules that are connected to the same action group.
- The scheduledqueryrules resource uses a copy block and the copyIndex() function to create one resource per entry in the alertRules array. This makes it easy to add, modify, or remove alert rules.
- Different aspects of the alert rules are configurable. For example, you can configure the alert to only trigger if more than five exceptions have been observed or configure an alert to only trigger once every 24 hours to prevent spamming your alerting system.
- defaultValue has been added to the parameters for brevity. You may prefer to omit this and keep the parameters in a separate file.
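To illustrate how the alertRules array scales, here is a sketch of a parameters file that overrides the default with two rules. The second rule, its name, its description, and its query are hypothetical examples rather than part of the original guide; the query assumes the Application Insights requests table with a resultCode column.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "alertRules": {
      "value": [
        {
          "alertRuleName": "demo-ar-exceptions",
          "alertDescription": "new exception occurred in demo-insights-pl",
          "resourceName": "demo-insights-pl",
          "resourceGroup": "rg-alerts-pl",
          "query": "exceptions | union (traces | where (severityLevel > 2))"
        },
        {
          "alertRuleName": "demo-ar-server-errors",
          "alertDescription": "server errors detected in demo-insights-pl",
          "resourceName": "demo-insights-pl",
          "resourceGroup": "rg-alerts-pl",
          "query": "requests | where toint(resultCode) >= 500"
        }
      ]
    }
  }
}
```

Because the copy count is taken from length(parameters('alertRules')), deploying with this file should create two scheduled query rules that share the same action group. The same pattern could be extended with per-rule fields, such as a hypothetical threshold property read in the template as parameters('alertRules')[copyIndex()].threshold.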
Conclusion
Alerting is a powerful tool for keeping up to date on status changes in your system. To learn more about how to set up and configure alerts with your Opsgenie integration, read Best Practices for Incident Management on Slack with OpsGenie.