Refactor metrics #32

lackstein · 2024-01-12T18:49:36Z

Metrics have been refactored so that they are namespaced under the fixed prefix `faktory.` rather than a configurable prefix. Per-queue metrics have been replaced with a single metric that carries a `queue` label.

The existing metrics are still emitted alongside the new ones so that we can migrate our Datadog monitors and dashboards without downtime.
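
As a rough illustration of the migration approach described above, the sketch below reports one queue depth under both naming schemes. The `gaugeSender` interface, the `recorder` fake, the `emitQueueSize` helper, and the legacy name layout (`<prefix>.enqueued.<queue>`) are all assumptions made for this example; they are not taken from this PR's code.

```go
package main

import "fmt"

// gaugeSender is a hypothetical stand-in for the single StatsD client
// method this sketch needs; it is not the real client interface.
type gaugeSender interface {
	Gauge(name string, value float64, tags []string, rate float64) error
}

// sample is one recorded gauge submission.
type sample struct {
	name  string
	value float64
	tags  []string
}

// recorder is a fake client that remembers every submission.
type recorder struct{ sent []sample }

func (r *recorder) Gauge(name string, value float64, tags []string, rate float64) error {
	r.sent = append(r.sent, sample{name: name, value: value, tags: tags})
	return nil
}

// emitQueueSize reports the same queue depth twice: once under an assumed
// legacy per-queue name, and once under the new fixed name with a queue tag.
func emitQueueSize(c gaugeSender, legacyPrefix, queue string, size float64) error {
	// Assumed legacy layout: the queue name is part of the metric name.
	if err := c.Gauge(legacyPrefix+".enqueued."+queue, size, nil, 1); err != nil {
		return err
	}
	// New layout: one metric name, with the queue carried as a tag.
	return c.Gauge("faktory.jobs.enqueued", size, []string{"queue:" + queue}, 1)
}

func main() {
	r := &recorder{}
	if err := emitQueueSize(r, "myapp", "default", 12); err != nil {
		fmt.Println("unable to submit metric:", err)
		return
	}
	for _, s := range r.sent {
		fmt.Println(s.name, s.value, s.tags)
	}
}
```

Emitting both names for a while lets dashboards move to the tagged metric one at a time, after which the legacy call can be dropped.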

New metrics are:

| Name | Type | Description |
| --- | --- | --- |
| `faktory.ops.connections` | Gauge | Faktory client network connections |
| `faktory.jobs.working` | Gauge | Current number of jobs being processed |
| `faktory.jobs.scheduled` | Gauge | Current number of scheduled jobs |
| `faktory.jobs.retries` | Gauge | Current number of jobs to be retried |
| `faktory.jobs.dead` | Gauge | Current number of dead jobs |
| `faktory.jobs.enqueued{queue}` | Gauge | Number of jobs in {queue} |
| `faktory.jobs.latency{queue}` | Gauge | The time between now and when the oldest queued job was enqueued |
| `faktory.jobs.pushed{queue, jobtype}` | Counter | Total number of jobs pushed |
| `faktory.jobs.fetched{queue, jobtype}` | Counter | Total number of jobs fetched |
| `faktory.jobs.processed{queue, jobtype, status, dead}` | Histogram | Timing for jobs that have been ACKed or FAILed. `status` is one of `success` or `fail`. `dead` is a boolean present for failed jobs. |
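
To make the label set on `faktory.jobs.processed` concrete, here is a minimal sketch of how the tags could be assembled. The helper name `processedTags`, the `key:value` string form, and the `true`/`false` rendering of `dead` are assumptions for illustration; the PR text only states which labels exist and that `dead` is present for failed jobs.

```go
package main

import (
	"fmt"
	"strconv"
)

// processedTags builds "key:value" tags for faktory.jobs.processed.
// It is a hypothetical helper, not code from this PR.
func processedTags(queue, jobtype string, success, dead bool) []string {
	tags := []string{"queue:" + queue, "jobtype:" + jobtype}
	if success {
		// Successful jobs carry no dead tag.
		return append(tags, "status:success")
	}
	// Failed jobs also report whether the job was moved to the dead set.
	return append(tags, "status:fail", "dead:"+strconv.FormatBool(dead))
}

func main() {
	fmt.Println(processedTags("default", "SendEmail", true, false))
	fmt.Println(processedTags("default", "SendEmail", false, true))
}
```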

elldritch

Small questions on exact metric types chosen. Generally LGTM.

metrics/task.go

elldritch · 2024-01-18T23:55:22Z

metrics/middleware.go


 		if mh.Reservation() != nil {
 			if err := m.StatsDClient().Timing(m.PrefixMetricName("succeeded.time"), time.Duration(time.Since(mh.Reservation().ReservedAt())), tags, 1); err != nil {
 				util.Warnf("unable to submit metric: %v", err)
 			}
+			if err := m.StatsDClient().Timing("faktory.jobs.processed", time.Duration(time.Since(mh.Reservation().ReservedAt())), tags, 1); err != nil {


Should this one be a Gauge?

I don't think so. If it were a Gauge then we'd only have metrics about how long the most recently completed task ran for. I'd rather have a histogram so that we can see the distribution of runtimes for each task type / queue.

Add new metrics

b380cff

lackstein requested a review from elldritch January 12, 2024 18:49

elldritch approved these changes Jan 18, 2024

Add histogram of queue latency

ab9ce97

lackstein merged commit 5eb2e6d into main Feb 6, 2024
2 checks passed

lackstein deleted the nl/new-metrics branch February 6, 2024 19:23

lackstein Jan 23, 2024
