.NET Distributed Tracing 2: Instrumenting with OpenTelemetry

This post is part 2 of a series on .NET distributed tracing and OpenTelemetry, showing how to use the built-in support for OpenTelemetry in modern .NET to instrument your distributed application for tracing and logging, for a (brilliant) connected future.

.NET Distributed Tracing 1: Modern distributed tracing
.NET Distributed Tracing 2: Instrumenting with OpenTelemetry
.NET Distributed Tracing 3: OpenTelemetry Collector and Azure Monitor

We have already seen in part 1 how distributed tracing is supported in .NET via W3C Trace Context propagation, with automatic (or mostly automatic) support across HttpClient calls and messaging.
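
As a quick reminder of what is being propagated, the W3C traceparent header carries a version, a trace ID, a parent span ID, and trace flags. The following is a minimal sketch only, assuming .NET 5 or later and using the example header value from the W3C Trace Context specification; it parses the header with ActivityContext.TryParse from System.Diagnostics.

```csharp
using System;
using System.Diagnostics;

// Example value from the W3C Trace Context specification:
// version - trace-id (32 hex) - parent-id (16 hex) - trace-flags
const string traceParent = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01";

// Parse the header into the structure .NET uses to link spans across processes.
bool parsed = ActivityContext.TryParse(traceParent, null, out ActivityContext context);

string traceId = context.TraceId.ToHexString();
string spanId = context.SpanId.ToHexString();
bool sampled = context.TraceFlags.HasFlag(ActivityTraceFlags.Recorded);

Console.WriteLine($"Parsed:  {parsed}");
Console.WriteLine($"TraceId: {traceId}");
Console.WriteLine($"SpanId:  {spanId}");
Console.WriteLine($"Sampled: {sampled}");
```

In normal use you would not parse this yourself, as HttpClient and ASP.NET read and write the header for you.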

We will now go further than logging and look at tracing. Tracing looks at the different units of work (spans) done during an operation (trace), how they are connected, and the timings of the different components. This is an important tool for investigating performance issues in distributed systems.

An example distributed trace timeline, across multiple components, viewed in Jaeger, one of many supported tools:

As well as looking at individual traces, timings can be aggregated across the system to find the slowest areas and identify anomalies.

What is OpenTelemetry?


OpenTelemetry is an industry movement that arose from the W3C Trace Identifier standardisation and the combination of two open source projects, OpenCensus and OpenTracing.

It provides an open source, vendor neutral framework for observability, supporting traces, metrics, and logs, and with automatic instrumentation provided out of the box.

OpenTelemetry is strongly supported by industry: it has been quickly implemented by many vendors in the instrumentation space, and it supports many programming languages. Many platforms have made it (or are making it) the default for interoperability.

It is rare for a new standard to achieve such rapid acceptance, and I have been impressed by how fast OpenTelemetry has been adopted.

OpenTelemetry support in .NET

OpenTelemetry is supported in the core of .NET, with System.Diagnostics.Activity updated to be the .NET implementation of the OpenTelemetry span.

System.Diagnostics.Activity supports the same operations and internal structure as a span, although it has kept its existing name and interface because it is already widely used in .NET code. It supports W3C Trace Context identifiers and propagation, and OpenTelemetry tags, baggage, and events.
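
To make the mapping concrete, here is a small sketch, assuming .NET 5 or later (where W3C identifiers are the default format), that uses Activity directly to show tags, baggage, events, and the parent and child relationship. The operation names, tag, and baggage values are made up for illustration.

```csharp
using System;
using System.Diagnostics;

// Start a parent activity (span). Starting it makes it Activity.Current.
var parent = new Activity("ProcessOrder");
parent.SetTag("order.id", 42);                     // tag (OpenTelemetry attribute)
parent.AddBaggage("tenant", "demo");               // baggage is visible to children
parent.Start();
parent.AddEvent(new ActivityEvent("validated"));   // timestamped event

// An activity started now picks up the current one as its parent.
var child = new Activity("SaveOrder").Start();

bool sameTrace = child.TraceId == parent.TraceId;
bool linked = child.ParentSpanId == parent.SpanId;
string? tenant = child.GetBaggageItem("tenant");

Console.WriteLine($"W3C id:      {child.Id}");
Console.WriteLine($"Same trace:  {sameTrace}");
Console.WriteLine($"Parent link: {linked}");
Console.WriteLine($"Baggage:     {tenant}");

child.Stop();
parent.Stop();
```

The child shares the trace ID of its parent and records the parent's span ID, which is the information a tool such as Jaeger uses to draw the timeline.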

Internal systems, such as HttpClient, ASP.NET, and Azure Service Bus support activity (span) creation, types, and attributes.

There are plug-in libraries that will automatically (no additional code) instrument Entity Framework, SQL Server, and other Microsoft components.

Many third party libraries also now either support OpenTelemetry directly (e.g. MassTransit), or have plug-in instrumentation available (e.g. PostgreSQL).

Distributed tracing has also started being added to Azure, with one of the first services being IoT Hub distributed tracing for device-to-cloud messages, allowing you to trace IoT messages end-to-end.

Adding OpenTelemetry to your .NET project

One of the benefits of OpenTelemetry is the automatic instrumentation, so there is not a lot to do except reference the libraries and then set up the configuration.

Importantly, you don't need to make any changes to existing code to take advantage of the tracing, and if you are using the standard LoggerMessage / ILogger<T> interface for logging, then that also works.

These examples are also available on GitHub: https://github.com/sgryphon/dotnet-distributed-tracing-examples

Existing .NET code

Our example system consists of a web app, with a browser interface, that calls a back end service over HTTP, which then accesses a PostgreSQL database via Entity Framework. The web app also sends a message using MassTransit over a RabbitMQ message bus to a worker application.

The full complex tracing example is available in GitHub, and uses Docker engine to run RabbitMQ, PostgreSQL, and the Adminer interface for PostgreSQL.

Importantly, the application code is standard .NET with nothing specific to OpenTelemetry: it uses the standard LoggerMessage / ILogger<T> interfaces and HttpClient calls. In these examples the LoggerMessage pattern is used for high performance logging.

[HttpGet]
public async Task<string> Get(System.Threading.CancellationToken cancellationToken)
{
    Log.Warning.WebAppForecastRequestForwarded(_logger, null);
    var result = await _httpClient.GetStringAsync("https://localhost:44301/WeatherForecast", cancellationToken);
    await _publishEndpoint.Publish<Demo.WeatherMessage>(new { Note = "Demo Message" }, cancellationToken);
    return result;
}
...
public static readonly Action<ILogger, Exception?> WebAppForecastRequestForwarded =
    LoggerMessage.Define(LogLevel.Warning,
        new EventId(4001, nameof(WebAppForecastRequestForwarded)),
        "TRACING DEMO: WebApp API weather forecast request forwarded");

The example uses the MassTransit library to call to RabbitMQ for messaging, but there is nothing specific to OpenTelemetry in any of the messaging configuration or handling code.

public async Task Consume(ConsumeContext<WeatherMessage> context)
{
    Log.Warning.WorkerMessageReceived(_logger, context.Message.Note, null);
    await Task.Delay(TimeSpan.FromMilliseconds(200), context.CancellationToken);
}
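
The log call in the consumer passes the message note as a parameter, so its LoggerMessage definition needs a typed argument. A sketch of how such a delegate might be defined is below; the class name WorkerLog, the event ID 4002, and the message text are placeholders for this sketch rather than values taken from the sample repository.

```csharp
using System;
using Microsoft.Extensions.Logging;

public static class WorkerLog
{
    // Event ID and message text are placeholders chosen for this sketch.
    // The {Note} placeholder becomes a structured property on the log entry.
    public static readonly Action<ILogger, string, Exception?> WorkerMessageReceived =
        LoggerMessage.Define<string>(LogLevel.Warning,
            new EventId(4002, nameof(WorkerMessageReceived)),
            "TRACING DEMO: Worker message received: {Note}");
}
```

It would be called in the same way as the consumer above, for example WorkerLog.WorkerMessageReceived(_logger, context.Message.Note, null).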

The back end service similarly has a straightforward implementation of Entity Framework calling to PostgreSQL:

public IEnumerable<WeatherForecast> Get()
{
    _weatherContext.WeatherServiceRequests.Add(new WeatherServiceRequest() {Note = "Demo Note"});
    _weatherContext.SaveChanges();
    Log.Warning.ServiceForecastRequest(_logger, null);
    ...
}

Instructions are provided on GitHub to create the sample application from scratch, or you can use the premade version.

Configuring OpenTelemetry

To use OpenTelemetry no change is required to any of the functional code of the application. We simply need to reference the OpenTelemetry libraries in the host, and then configure OpenTelemetry in the application startup builder, defining the resource, adding automatic instrumentation libraries, and setting exporters.

This example adds instrumentation for AspNetCore, HttpClient, and PostgreSQL. MassTransit already has built-in support for OpenTelemetry. The Jaeger exporter is used. Note that some of the libraries are still in pre-release.

dotnet add Demo.WebApp package OpenTelemetry.Extensions.Hosting --prerelease
dotnet add Demo.WebApp package OpenTelemetry.Instrumentation.AspNetCore --prerelease
dotnet add Demo.WebApp package OpenTelemetry.Instrumentation.Http --prerelease
dotnet add Demo.WebApp package OpenTelemetry.Exporter.Jaeger

dotnet add Demo.Service package OpenTelemetry.Extensions.Hosting --prerelease
dotnet add Demo.Service package OpenTelemetry.Instrumentation.AspNetCore --prerelease
dotnet add Demo.Service package Npgsql.OpenTelemetry
dotnet add Demo.Service package OpenTelemetry.Exporter.Jaeger

dotnet add Demo.Worker package OpenTelemetry.Extensions.Hosting --prerelease
dotnet add Demo.Worker package OpenTelemetry.Exporter.Jaeger

To configure OpenTelemetry we first need to define the resource. In OpenTelemetry a resource represents an entity that is producing telemetry, such as a service, Kubernetes pod, device, etc. A resource has a number of properties such as name, version, and the OpenTelemetry library being used.

The OpenTelemetry specification defines resource semantic conventions for standard names of attributes.

In the code below we add the default attributes (name, version - taken from the semantic AssemblyInformationalVersion, and library), along with additional attributes, following the convention naming standards, for the host, operating system, and environment.

var entryAssembly = System.Reflection.Assembly.GetEntryAssembly();
var entryAssemblyName = entryAssembly?.GetName();
var versionAttribute = entryAssembly?.GetCustomAttributes(false)
    .OfType<System.Reflection.AssemblyInformationalVersionAttribute>()
    .FirstOrDefault();
var serviceName = entryAssemblyName?.Name;
var serviceVersion = versionAttribute?.InformationalVersion ?? entryAssemblyName?.Version?.ToString();
var attributes = new Dictionary<string, object>
{
    ["host.name"] = Environment.MachineName,
    ["os.description"] = System.Runtime.InteropServices.RuntimeInformation.OSDescription,
    ["deployment.environment"] = builder.Environment.EnvironmentName.ToLowerInvariant()
};
var resourceBuilder = ResourceBuilder.CreateDefault()
    .AddService(serviceName, serviceVersion: serviceVersion)
    .AddTelemetrySdk()
    .AddAttributes(attributes);

During application build, we then add OpenTelemetry services and configure them with the resource, the automatic instrumentation, additional sources (in this case MassTransit), and the exporter we want.

Note: The code below registers all the instrumentation we are using; in practice each component may need different instrumentation, e.g. only the back end service uses PostgreSQL.

builder.Services.AddOpenTelemetryTracing(configure =>
{
    configure
        .SetResourceBuilder(resourceBuilder)
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddNpgsql()
        .AddSource("MassTransit")
        .AddJaegerExporter();
});

And that is it. We don't need to change any application code.
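
The AddSource("MassTransit") registration matters because an ActivitySource only creates activities when something has subscribed to it by name. The sketch below shows that behaviour using only System.Diagnostics. The source name Demo.Custom is hypothetical, and the ActivityListener stands in for the subscription that, as far as I understand, the OpenTelemetry SDK sets up for each registered source name.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Demo.Custom");   // hypothetical source name

// With no listener subscribed, StartActivity returns null and nothing is traced.
Activity? before = source.StartActivity("Work");
Console.WriteLine($"Before listener: {(before is null ? "null" : "created")}");

// Subscribe by name, in the same spirit as AddSource("Demo.Custom").
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Custom",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    // Stands in for an exporter: called when each activity stops.
    ActivityStopped = a => Console.WriteLine($"Stopped: {a.DisplayName}")
};
ActivitySource.AddActivityListener(listener);

Activity? after = source.StartActivity("Work");
Console.WriteLine($"After listener: {(after is null ? "null" : "created")}");
after?.Stop();
```

This is also why library code normally uses the null-conditional operator on the result of StartActivity: when nobody is listening, the call costs almost nothing.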

Tracing results in Jaeger

When the project is run, you can see the full trace path of requests in a local Jaeger instance, which can also be run via Docker.

Start the dependencies (e.g. via Docker Compose: docker compose -p demo up -d), then run the application and make a few requests in the web interface. Then browse to the Jaeger interface to see the results at http://localhost:16686/

Select the service Demo.WebApp and search, which will show you a graph of all traces that service is involved in, along with some key details such as the number of spans and number of errors.

You can click into a trace (see the screenshot at the top of the article) to see the full detail and timings for a single trace as it moves through the distributed application.

Tracing is useful for identifying the dependencies between components, and for investigating performance issues to see where the bottlenecks are.

Displaying the system architecture

The trace relationships between components can also be used to generate a system architecture diagram, useful to understand which components call each other, and how frequently.

Being able to diagram the actual runtime dependencies in a complex distributed application is valuable in trying to understand the application behaviour.
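
The idea behind such a diagram can be sketched in a few lines: wherever a span's parent belongs to a different service, there is a call from one component to the other. The Span record and the sample data below are simplified and hypothetical, and this is an illustration of the principle rather than the algorithm Jaeger itself uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified span records; real spans carry many more fields.
var spans = new List<Span>
{
    new("a1", null, "Demo.WebApp"),    // incoming browser request
    new("b1", "a1", "Demo.WebApp"),    // outgoing HTTP call
    new("c1", "b1", "Demo.Service"),   // request handled by the back end
    new("d1", "a1", "Demo.WebApp"),    // message publish
    new("e1", "d1", "Demo.Worker"),    // message consume
};

var byId = spans.ToDictionary(s => s.SpanId);

// An edge exists wherever a span's parent belongs to a different service.
var edges = spans
    .Where(s => s.ParentId is not null && byId.ContainsKey(s.ParentId))
    .Select(s => (Caller: byId[s.ParentId!].Service, Callee: s.Service))
    .Where(e => e.Caller != e.Callee)
    .GroupBy(e => e)
    .Select(g => (g.Key.Caller, g.Key.Callee, Calls: g.Count()))
    .ToList();

foreach (var (caller, callee, calls) in edges)
    Console.WriteLine($"{caller} -> {callee} (calls: {calls})");

record Span(string SpanId, string? ParentId, string Service);
```

With the sample data this yields two edges, from the web app to the back end service and from the web app to the worker, each with a call count that indicates how frequently the dependency is used.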

Next steps

Have a look at the example application, and see how easy it is to add OpenTelemetry support to your existing distributed applications, and the amount of diagnostic information that is available.

You can configure either OTLP (OpenTelemetry Protocol) or custom exporters for many instrumentation providers. You can also set up an OpenTelemetry Collector to forward logs and traces to a destination such as Azure Monitor.

The OpenTelemetry Protocol, and the OpenTelemetry Collector, including how to connect to Azure Monitor, are covered in the next article in this series: .NET Distributed Tracing 3: OpenTelemetry Collector and Azure Monitor

Thumbnail picture from: https://pixabay.com/illustrations/pulse-trace-healthcare-medicine-163708/