tracing-datadog-apm uses the tracing crate to produce application traces in a format usable by Datadog's APM service.
From the tracing README:
tracing is a framework for instrumenting Rust programs to collect structured, event-based diagnostic information. tracing is maintained by the Tokio project, but does not require the tokio runtime to be used.
By using the tracing crate we are able to produce structured trace information in asynchronous and in multi-threaded environments, which are exactly the kinds of environments typically being monitored with Datadog APM.
Add this line to your Cargo.toml:
tracing-datadog-apm = "0.0.1"
This provides an implementation of tracing::Subscriber that picks up trace data, as well as a Datadog client that passes the trace data to the Datadog agent running on the machine.
If you want APM instrumentation out of the box, this crate aims to provide a few common integrations. Each one can be enabled as a feature. Right now the crate provides:
actix-web middleware
tracing-datadog-apm = { version = "0.0.1", features = ["actix-web"] }
First, make sure your application also has a Datadog agent running in the background. See more info here.
Then, in your code, create a Datadog client. If you're using the latest version of the Datadog agent, and the agent is running on the same machine as your application, you can use .create_default() to make a client with the correct values.
let datadog_client = tracing_datadog_apm::datadog_client::Client::create_default();
If your Datadog agent is running on another machine or on a different port than the default 8126, you can provide your own config by using .create_with_config() and passing in an instance of ClientConfig. ClientConfig lets you chain method calls to set the fields you want.
let datadog_client = tracing_datadog_apm::datadog_client::Client::create_with_config(
    tracing_datadog_apm::datadog_client::ClientConfig::new()
        .datadog_agent_host("foo")
        .datadog_agent_port(1234),
);
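If the agent address differs between environments, one option is to resolve the host and port from environment variables before building the ClientConfig. The following is a minimal, self-contained sketch of that idea. The variable names DD_AGENT_HOST and DD_TRACE_AGENT_PORT are an assumption borrowed from Datadog's own tracer conventions (nothing above says this crate reads them itself), and resolve_agent_endpoint and agent_endpoint_from_env are hypothetical helpers, not part of the crate.

```rust
use std::env;

/// Default trace port of the Datadog agent, as mentioned above.
const DEFAULT_AGENT_PORT: u16 = 8126;

/// Hypothetical helper: picks the agent host and port from optional raw
/// values, falling back to localhost and the default port when a value is
/// missing, empty, or not a valid port number.
fn resolve_agent_endpoint(host: Option<String>, port: Option<String>) -> (String, u16) {
    let host = host
        .filter(|h| !h.trim().is_empty())
        .unwrap_or_else(|| "localhost".to_string());
    let port = port
        .and_then(|p| p.trim().parse::<u16>().ok())
        .unwrap_or(DEFAULT_AGENT_PORT);
    (host, port)
}

/// Reads the (assumed) environment variables and resolves the endpoint.
fn agent_endpoint_from_env() -> (String, u16) {
    resolve_agent_endpoint(
        env::var("DD_AGENT_HOST").ok(),
        env::var("DD_TRACE_AGENT_PORT").ok(),
    )
}
```

The resulting pair could then be handed to .datadog_agent_host() and .datadog_agent_port(); the exact argument types those methods accept are not shown here, so a conversion may be needed.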
Next, create a Datadog tracing Subscriber. This will take ownership of the Datadog client created in the previous step. It will also take a TracingSubscriberDatadogConfig, which maps span names to service names and span types.
The span name is the name of the span according to the tracing library.
The service name and span type are properties of the span when it gets into Datadog (notice that both tracing and Datadog have a notion of a span).
use tracing_datadog_apm::subscriber::{TracingSubscriberDatadog, TracingSubscriberDatadogConfig};
// SpanName, ServiceName, and SpanType must also be in scope; import them
// from wherever the crate exports them.

let datadog_tracing_subscriber = TracingSubscriberDatadog::new(
    datadog_client,
    TracingSubscriberDatadogConfig::new()
        .add_mapping(
            SpanName("http.request"),
            (ServiceName("my-service-rest"), SpanType::Web),
        )
        .add_mapping(
            SpanName("database.query"),
            (ServiceName("my-service-database"), SpanType::Db),
        )
        .add_mapping(
            SpanName("redis.request"),
            (ServiceName("my-service-cache"), SpanType::Cache),
        )
        .add_mapping(
            SpanName("custom.function"),
            (ServiceName("my-service-function"), SpanType::Custom),
        ),
);
There are 4 available SpanTypes defined by Datadog: Web, Db, Cache, and Custom.
Note that each service name provided will show up as a separate service in the APM dashboard, and any time a trace from one service touches multiple services, each service will show up in the latency breakdown graph with a percentage of time spent in that service.
Example: See the % of Time Spent by Service graph in the lower right.
You can also map multiple span names and span types to the same service name. In that case that service will show up in APM, but it will have a dropdown that allows you to select the spans you want to see displayed.
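To make the mapping semantics concrete, here is a small self-contained model of it. This is only a sketch: SpanType below is a local stand-in rather than the crate's type, and MappingSketch, lookup, services, and example_mapping are hypothetical names used for illustration. It shows several span names sharing one service name, and a span name with no mapping yielding nothing (such spans are ignored by the subscriber).

```rust
use std::collections::HashMap;

// Local stand-in for illustration only; the real crate defines its own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SpanType {
    Web,
    Db,
    Cache,
    Custom,
}

/// Hypothetical model of the span-name -> (service name, span type) mapping.
#[derive(Default)]
struct MappingSketch {
    by_span_name: HashMap<String, (String, SpanType)>,
}

impl MappingSketch {
    /// Chainable, mirroring the builder style used in the config above.
    fn add_mapping(mut self, span_name: &str, service: &str, span_type: SpanType) -> Self {
        self.by_span_name
            .insert(span_name.to_string(), (service.to_string(), span_type));
        self
    }

    /// Returns None for span names that were never mapped.
    fn lookup(&self, span_name: &str) -> Option<(&str, SpanType)> {
        self.by_span_name
            .get(span_name)
            .map(|(service, span_type)| (service.as_str(), *span_type))
    }

    /// Distinct service names, i.e. what one would expect to see listed
    /// as separate services in the APM dashboard.
    fn services(&self) -> Vec<&str> {
        let mut names: Vec<&str> = self
            .by_span_name
            .values()
            .map(|(service, _)| service.as_str())
            .collect();
        names.sort_unstable();
        names.dedup();
        names
    }
}

/// Three span names share "my-service"; one maps to a separate cache service.
fn example_mapping() -> MappingSketch {
    MappingSketch::default()
        .add_mapping("http.request", "my-service", SpanType::Web)
        .add_mapping("database.query", "my-service", SpanType::Db)
        .add_mapping("custom.function", "my-service", SpanType::Custom)
        .add_mapping("redis.request", "my-service-cache", SpanType::Cache)
}
```

With this mapping, four span names collapse into two services, which corresponds to the dropdown behaviour described above for a service that owns several span names.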
In your application, you will need to be using the tracing crate. You should set the Datadog Subscriber as the global subscriber (in future iterations it will also be available as a Layer so it can be combined with other Subscribers).
tracing::subscriber::set_global_default(datadog_tracing_subscriber)
    .expect("Setting tracing default failed");
The easiest way to instrument things in an async, multi-threaded environment is to use the tracing instrument attribute macro.
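#[instrument(
    name = "call.other_service",
    skip_all,
    fields(
        resource = tracing::field::Empty,
        http_method = tracing::field::Empty,
        http_url = tracing::field::Empty,
        http_status_code = tracing::field::Empty,
        error_msg = tracing::field::Empty
    )
)]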
async fn call_other_service(user_uuid: uuid::Uuid) -> bool {
    if let Ok(host) = std::env::var("OTHER_SERVICE_HOST") {
        // define the URL and method for our REST request
        let url = format!("{}/users/{}", host, user_uuid.to_hyphenated());
        let method = "GET";
        // Record some data in the current span about this request:
        // (the current span being a `call.other_service` span created
        // when we enter this function, thanks to that `instrument` tag)
        //
        // - `resource` is telling us what specifically is being hit in this span.
        //   It will show up (along with the other `resource` types) on the APM
        //   dashboard for spans of the type `call.other_service`
        //
        // - `http_method` and `http_url` are just some metadata we're collecting
        //
        let current_span = tracing::Span::current();
        current_span.record("resource", &*format!("{} {}", method, "Other Service"));
        current_span.record("http_method", &*method);
        current_span.record("http_url", &*url);
        // Make the REST request (this stuff is arbitrary, I'm just using
        // reqwest as an example)
        let client = reqwest::Client::new();
        match client.get(&url).send().await {
            Ok(resp) => {
                // Record the response status code as some more metadata.
                // The name must match a field declared in `instrument`.
                let current_span = tracing::Span::current();
                current_span.record("http_status_code", resp.status().as_str());
                resp.status().is_success()
            }
            Err(e) => {
                // Alternately, if the request fails, record the error message
                let err_msg = format!("Could not call other service with error: {}", e);
                let current_span = tracing::Span::current();
                current_span.record("error_msg", &*err_msg);
                log::error!("{}", err_msg);
                false
            }
        }
    } else {
        // Record an error message if we failed getting the env var
        let err_msg = "Failed to fetch OTHER_SERVICE_HOST env var";
        let current_span = tracing::Span::current();
        current_span.record("error_msg", err_msg);
        log::error!("{}", err_msg);
        false
    }
}
To use instrument properly, pick a name for the span. This name maps to a service and span type, as defined in the TracingSubscriberDatadogConfig; if it's not set in the config, the span will be ignored. Then skip the function args (they will be passed along by default, unless you skip them). Finally, define the fields you would actually like to pass along.
The acceptable field names are:
trace_id - the id of the current trace - normally does not need to be passed explicitly
parent_id - the id of the parent span - normally does not need to be passed explicitly
resource - resource name within the given span
start - the start time in nanos from the Unix epoch - normally doesn't need to be passed explicitly
http_method - metadata for http requests (in or out)
http_url - metadata for http requests (in or out)
http_status_code - metadata for http requests (in or out)
error_type - type of the error that occurred (a string)
error_msg - accompanying error message
error_stack - the whole error stack if you have it as a string
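The start field rarely needs to be set by hand, but if you do have to supply it, the value is simply the number of nanoseconds since the Unix epoch. A minimal sketch using only the standard library follows; unix_nanos is a hypothetical helper, and the integer type the crate expects for this field is not stated here, so a conversion (for example to a narrower integer) may be required.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Nanoseconds elapsed between the Unix epoch and `time`.
/// Returns None if `time` lies before the epoch.
fn unix_nanos(time: SystemTime) -> Option<u128> {
    time.duration_since(UNIX_EPOCH)
        .ok()
        .map(|elapsed| elapsed.as_nanos())
}
```

Calling unix_nanos(SystemTime::now()) at the point where the span begins gives a value suitable as a start timestamp.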
At bare minimum, all spans should have a resource. For Web spans this is easy: what's the resource for the REST request? For a Db span it is usually the SQL query but with placeholder values, e.g. SELECT $1 FROM table WHERE id = $2;. Cache spans will also often have a query that can serve as the resource. Custom spans can do whatever they'd like.
Web spans will also typically include the http and error parameters. The other span types can make use of the error parameters if they need them.
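For Web spans, a resource tends to be most useful when it groups similar requests together, in the same way the placeholder SQL does for Db spans. One possible approach, sketched below, is to replace path segments that look like ids with a placeholder. This is a heuristic of my own choosing rather than anything the crate or Datadog prescribes, and web_resource is a hypothetical helper.

```rust
/// Hypothetical helper: builds a resource name such as "GET /users/?".
///
/// A path segment is treated as dynamic when it contains at least one digit
/// and consists only of hex digits and hyphens (numeric ids, UUIDs). This is
/// a rough heuristic and will misclassify some segments, e.g. "abc1".
fn web_resource(method: &str, path: &str) -> String {
    let normalized: Vec<&str> = path
        .split('/')
        .map(|segment| {
            let is_dynamic = !segment.is_empty()
                && segment.chars().any(|c| c.is_ascii_digit())
                && segment.chars().all(|c| c.is_ascii_hexdigit() || c == '-');
            if is_dynamic {
                "?"
            } else {
                segment
            }
        })
        .collect();
    format!("{} {}", method, normalized.join("/"))
}
```

The returned string can then be recorded on the current span under the resource field, as in the call_other_service example.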
For more information on spans, check out these docs, and for the instrument attribute macro, look here.
For more information on Datadog's treatment of traces and spans, refer to their documentation on tracing.