Refactor metrics #32

Merged
merged 2 commits into from
Feb 6, 2024

Conversation

lackstein
Member

PLAT-189 PLAT-463 PLAT-482

Metrics are now namespaced under the fixed faktory. prefix rather than a configurable one. Per-queue metrics have been replaced with a single metric that carries a queue label.

The existing metrics are still emitted alongside the new ones so that we can migrate our Datadog monitors and dashboards without downtime.

New metrics are:

| Name | Type | Description |
| --- | --- | --- |
| faktory.ops.connections | Gauge | Faktory client network connections |
| faktory.jobs.working | Gauge | Current number of jobs being processed |
| faktory.jobs.scheduled | Gauge | Current number of scheduled jobs |
| faktory.jobs.retries | Gauge | Current number of jobs to be retried |
| faktory.jobs.dead | Gauge | Current number of dead jobs |
| faktory.jobs.enqueued{queue} | Gauge | Number of jobs in {queue} |
| faktory.jobs.latency{queue} | Gauge | The time between now and when the oldest queued job was enqueued |
| faktory.jobs.pushed{queue, jobtype} | Counter | Total number of jobs pushed |
| faktory.jobs.fetched{queue, jobtype} | Counter | Total number of jobs fetched |
| faktory.jobs.processed{queue, jobtype, status, dead} | Histogram | Timing for jobs that have been ACKed or FAILed. status is one of success or fail. dead is a boolean present for failed jobs. |
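
To illustrate the move from per-queue metric names to a single metric with a queue label, here is a minimal sketch that renders a gauge in the DogStatsD text format (`<name>:<value>|g|#<tags>`). The helpers `gaugeDatagram` and `enqueuedDatagram` are hypothetical names for illustration only and are not taken from this PR, which submits metrics through a StatsD client rather than building datagrams by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// gaugeDatagram renders a gauge in the DogStatsD text format:
//   <name>:<value>|g|#<tag>,<tag>
// Hypothetical helper for illustration; not part of this PR.
func gaugeDatagram(name string, value float64, tags []string) string {
	line := fmt.Sprintf("%s:%g|g", name, value)
	if len(tags) > 0 {
		line += "|#" + strings.Join(tags, ",")
	}
	return line
}

// enqueuedDatagram reports queue depth under one fixed metric name and
// identifies the queue with a tag instead of embedding it in the name.
func enqueuedDatagram(queue string, depth int) string {
	return gaugeDatagram("faktory.jobs.enqueued", float64(depth), []string{"queue:" + queue})
}
```

Because the metric name stays constant, a dashboard can group or filter by the queue tag without a separate metric per queue.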

@lackstein lackstein requested a review from elldritch January 12, 2024 18:49
@elldritch elldritch left a comment

Small questions on exact metric types chosen. Generally LGTM.


if mh.Reservation() != nil {
	if err := m.StatsDClient().Timing(m.PrefixMetricName("succeeded.time"), time.Duration(time.Since(mh.Reservation().ReservedAt())), tags, 1); err != nil {
		util.Warnf("unable to submit metric: %v", err)
	}
	if err := m.StatsDClient().Timing("faktory.jobs.processed", time.Duration(time.Since(mh.Reservation().ReservedAt())), tags, 1); err != nil {
Should this one be a Gauge?

Member Author

I don't think so. If it were a Gauge then we'd only have metrics about how long the most recently completed task ran for. I'd rather have a histogram so that we can see the distribution of runtimes for each task type / queue.
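
The difference can be sketched with a small in-memory model. This is an illustration under assumptions, not how the Datadog agent is implemented: `lastValue` stands in for a gauge, which keeps only the most recent sample in a flush interval, and `summarize` stands in for the kind of aggregates a histogram exposes. Both function names and the `summary` type are hypothetical.

```go
package main

import "sort"

// lastValue mimics a gauge: only the most recent sample survives.
func lastValue(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	return samples[len(samples)-1]
}

// summary holds a few distribution statistics, similar in spirit to
// what a histogram metric reports (hypothetical type for illustration).
type summary struct {
	Count  int
	Min    float64
	Median float64
	Max    float64
}

// summarize mimics a histogram: every sample contributes to the result.
func summarize(samples []float64) summary {
	if len(samples) == 0 {
		return summary{}
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	n := len(sorted)
	median := sorted[n/2]
	if n%2 == 0 {
		median = (sorted[n/2-1] + sorted[n/2]) / 2
	}
	return summary{Count: n, Min: sorted[0], Median: median, Max: sorted[n-1]}
}
```

For job runtimes of 120, 95, 4000, and 110 ms, the gauge-style view reports only 110, while the histogram-style view still shows the 4000 ms outlier as the maximum.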

@lackstein lackstein merged commit 5eb2e6d into main Feb 6, 2024
2 checks passed
@lackstein lackstein deleted the nl/new-metrics branch February 6, 2024 19:23