Implementing Distributed Tracing with Azure's Application Insights

Distributed systems can become easier to debug and diagnose using tracing with Microsoft Azure’s Application Insights.

Sep 24, 2020 • 5 Minute Read

Introduction

As software engineers, we need to understand distributed systems in order to enhance and maintain them and to deliver business value. Implementing tracing makes these systems easier to debug and diagnose.

This guide demonstrates how to implement tracing using Microsoft Azure’s Application Insights. Here, tracing refers to writing events from related systems such that an end-to-end transaction log can be reconstructed. You will learn how to create child operations as part of a larger operation, which is useful for viewing all logs in context. A basic understanding of installing and using Application Insights is assumed.

Understanding the Problem

Distributed systems present challenges that make diagnosing issues harder:

Several systems participate in serving a single user request.
A concurrent system interleaves trace events from different requests.

Monitoring the system becomes more difficult. Finding all the logs associated with one request is a nightmare.

Distributed Tracing in Application Insights

Application Insights can solve this problem by allowing you to construct a hierarchy of events. These events can occur within the same application process or across several application processes.

To describe an event that occurs in the same process as a child operation, call StartOperation and give it a name. The code snippet below uses RenderEmail as an example of an in-process child operation.

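public string RenderEmail(string emailType, EmailViewModel emailViewModel)
{
  // Start a child operation; it is stopped and sent when `operation` is disposed at the end of the method.
  using var operation = _telemetryClient.StartOperation<DependencyTelemetry>($"Render email {emailType}");
  operation.Telemetry.Type = "Email";
  operation.Telemetry.Properties["EmailType"] = emailType;
  // Render the view; the time spent here is recorded as the operation's duration.
  return _razorViewToStringRenderer.RenderViewToString(emailViewModel);
}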

Note that the above code example uses using declarations, which were introduced in C# 8.0. When this method is called inside an ASP.NET Core API request, Application Insights records the operation as a child of that request, which builds the hierarchy.
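The parent/child relationship described above can be observed without Application Insights at all, using only System.Diagnostics.Activity, the class this guide relies on for correlation. The following is a minimal sketch, not the SDK's implementation; the class name InProcessHierarchyDemo and the operation names are made up for illustration.

```csharp
using System;
using System.Diagnostics;

public static class InProcessHierarchyDemo
{
    // Starts a parent activity and a nested child, then returns both
    // so their relationship can be inspected. Names are illustrative only.
    public static (Activity Parent, Activity Child) Run()
    {
        var parent = new Activity("Handle request").Start();

        // No explicit parent is set, so Start() attaches the child to Activity.Current.
        var child = new Activity("Render email").Start();

        child.Stop();   // Activity.Current reverts to the parent
        parent.Stop();
        return (parent, child);
    }

    public static void Main()
    {
        var (parent, child) = Run();
        Console.WriteLine($"Parent Id:      {parent.Id}");
        Console.WriteLine($"Child ParentId: {child.ParentId}");
        Console.WriteLine($"Shared root:    {parent.RootId == child.RootId}");
    }
}
```

The child's ParentId matches the parent's Id, and both share a root Id. That is the information needed to group the two events into one hierarchy.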

Traces can also be correlated across process boundaries. If two systems collaborate in a publisher/subscriber relationship, their relationship can be inferred. This is illustrated below.

System A receives an HTTP POST request.
System A then places a message on a queue.
System B is a worker that processes the message.

Note: When using Azure Service Bus with Azure App Service, this hierarchy is created automatically if both apps have Application Insights configured.

To achieve this in code, the message producer (System A) needs to write the Id of the current System.Diagnostics.Activity (Activity.Current.Id) somewhere in the message. On the message consumer side (System B), the following code needs to be added.

public async Task Process(T message, CancellationToken cancellationToken)
{
  // Link this operation to the producer by using the Id carried in the message as the parent.
  var activity = new Activity("Process message").SetParentId(message.ParentId);
  using var operation = _telemetryClient.StartOperation<RequestTelemetry>(activity);
  try
  {
    await _inner.Process(message, cancellationToken);
    // Reaching this line means the inner handler completed without throwing.
    operation.Telemetry.Success = true;
  }
  catch (Exception)
  {
    operation.Telemetry.Success = false;
    throw;
  }
}

The above describes how a hierarchy of trace events can be created using System.Diagnostics.Activity and the StartOperation method. This helps to find all trace events involved in the context of a single user request.
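To make the producer side concrete, here is a minimal sketch of the round trip using only System.Diagnostics.Activity. The QueueMessage envelope, the TracePropagationDemo class, and the operation names are hypothetical; only the ParentId property mirrors the consumer code in this guide. A real system would serialize the envelope onto a queue rather than pass it in memory.

```csharp
using System;
using System.Diagnostics;

// Hypothetical message envelope; ParentId carries the producer's activity Id.
public sealed class QueueMessage
{
    public string ParentId { get; set; }
    public string Body { get; set; }
}

public static class TracePropagationDemo
{
    // Producer side (System A): stamp the current activity's Id onto the outgoing message.
    public static QueueMessage CreateMessage(string body) =>
        new QueueMessage { ParentId = Activity.Current?.Id, Body = body };

    // Consumer side (System B): restore the parent link before starting work.
    public static Activity StartConsumerActivity(QueueMessage message)
    {
        var activity = new Activity("Process message");
        if (!string.IsNullOrEmpty(message.ParentId))
        {
            activity.SetParentId(message.ParentId);
        }
        return activity.Start();
    }

    public static void Main()
    {
        // Simulate System A handling a request and publishing a message.
        var producer = new Activity("Handle HTTP POST").Start();
        var message = CreateMessage("hello");
        producer.Stop();

        // Simulate System B picking the message up later.
        var consumer = StartConsumerActivity(message);
        Console.WriteLine($"Consumer parent is producer: {consumer.ParentId == producer.Id}");
        Console.WriteLine($"Same root: {consumer.RootId == producer.RootId}");
        consumer.Stop();
    }
}
```

Guarding against a missing ParentId is a design choice in this sketch: a message published outside any activity then starts a fresh trace on the consumer instead of passing a null parent to SetParentId.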

Use Cases

With a hierarchy of traces configured in your app, many debugging scenarios become much simpler. Some examples include:

A distributed system has intermittent performance issues. The distributed traces show timing information that can be examined to pinpoint the issue. N+1 database requests or a particularly slow API become apparent in the timings.
A daily batch job is executed, and you want to know how it is performing. With operations logged in the way described above, it is easy to see how long jobs take to run and how many messages are processed each day.
A request involves multiple resources. With the distributed trace, you can see all the child operations and understand how the request was fulfilled.

Conclusion

Diagnosing distributed systems can be much simpler with these techniques. If you would like to learn more, you can read about the technical details in How Application Insights Correlates Telemetry.