Opinionated set of monitors and SLOs for your Datadog trace metrics.
It creates a service time monitor and an error rate monitor per resource name for the given operation. The monitors have default thresholds, which you can override.
It also creates two SLOs: one covering all of the service time monitors and one covering all of the error rate monitors. Note that an SLO currently supports at most 20 monitors, so if you have more than 20 resources you will need to use this module more than once: ceil(num_resources / 20) times, each with a different subset of at most 20 resource names.
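One way to handle more than 20 resources is to split the list in the calling configuration and instantiate the module once per group. The following is a minimal sketch, not a tested recipe: the `source` path, the `all_resource_names` variable, the group keys, and all tag values are placeholders, and it assumes the module can be used with `for_each` (it must not declare its own provider configuration). Whether monitor and SLO names stay unique across instances depends on how the module names them, which this README does not state.

```hcl
# Hypothetical caller-side variable; not an input of this module.
variable "all_resource_names" {
  description = "Every resource name to monitor; may contain more than 20 entries"
  type        = list(string)
}

locals {
  # chunklist splits the list into consecutive groups of at most 20 names,
  # producing ceil(length / 20) groups. Each group gets a stable string key
  # so it can be used with for_each.
  resource_name_groups = {
    for idx, names in chunklist(var.all_resource_names, 20) :
    format("group-%02d", idx) => names
  }
}

module "trace_monitors" {
  source   = "./modules/trace-monitors" # placeholder path (assumption)
  for_each = local.resource_name_groups

  env            = "prod"                  # example value
  service        = "my-service"            # example value
  operation      = "rack.request"          # example value
  resource_names = each.value              # at most 20 names per instance
  notify         = "@slack-my-team-alerts" # example value
  team           = "my-team"               # example value
}
```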
There are also monitors on the SLO error budgets, so if the error budget is exhausted you can get an alert. SLOs span 3 time horizons (7 days, 30 days, 90 days) and there are separate monitors for each time horizon.

| Name | Version |
|---|---|
| terraform | >= 1.0.0 |

| Name | Version |
|---|---|
| datadog | n/a |

No modules.


| Name | Type |
|---|---|
| datadog_monitor.error_rate_slo_monitors | resource |
| datadog_monitor.high_error_rate_monitors | resource |
| datadog_monitor.high_service_time_monitors | resource |
| datadog_monitor.service_time_slo_monitors | resource |
| datadog_service_level_objective.error_rate_slo | resource |
| datadog_service_level_objective.service_time_slo | resource |


| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| critical_error_rate | Threshold error rate we want to alert at | number | 0.005 | no |
| critical_service_time | Threshold service time (amount of time request takes on server) we want to alert at in seconds | number | 0.5 | no |
| env | Environment your traces are tagged with | string | n/a | yes |
| notify | Notification handle for alerts. Must be of the form @pagerduty-{service} or @slack-{channel} etc. depending on the integration you're using | string | n/a | yes |
| operation | Trace metric the queries will look at. Called 'operation' in the APM dashboard | string | n/a | yes |
| resource_names | The resources you want to monitor by name. Check APM dashboard to see what your service has | list(string) | n/a | yes |
| service | Service name your traces are tagged with | string | n/a | yes |
| team | Team that owns these monitors | string | n/a | yes |

No outputs.
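
To illustrate how the inputs listed above fit together, here is a minimal sketch of a single invocation. The `source` path, the module label, and every value (service, operation, resource names, notification handle) are hypothetical placeholders; substitute what your APM dashboard shows for your service.

```hcl
module "checkout_trace_monitors" {
  source = "./modules/trace-monitors" # placeholder path (assumption)

  # Required inputs
  env       = "prod"
  service   = "checkout"
  operation = "rack.request"
  team      = "checkout"
  notify    = "@slack-checkout-alerts"

  # One service time monitor and one error rate monitor are created per entry.
  resource_names = [
    "GET /cart",
    "POST /orders",
  ]

  # Optional overrides; the defaults are 0.005 and 0.5 (seconds) respectively.
  critical_error_rate   = 0.01
  critical_service_time = 1
}
```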