Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add a prometheus-compatible metrics endpoint format to pebble check URL #118

Open
mthaddon opened this issue May 5, 2022 · 13 comments
Open
Labels
Feature A feature request Needs Design Needs a design or spec

Comments

@mthaddon
Copy link
Contributor

mthaddon commented May 5, 2022

As documented in https://juju.is/docs/sdk/pebble#heading--check-health-endpoint-and-probes:

"""
As of Juju version 2.9.26, Pebble includes an HTTP /v1/health endpoint that allows a user to query the health of configured checks, optionally filtered by check level with the query string ?level=<level>. This endpoint returns an HTTP 200 status if the checks are healthy, HTTP 502 otherwise.
"""

It would be great if there was an option to expose this in a format that prometheus could ingest. If so, we could easily integrate charms with COS by configuring them to hit this pebble endpoint and ingest the data exposed there.

@benhoyt
Copy link
Contributor

benhoyt commented May 6, 2022

@mthaddon I don't know much (anything :-) about Prometheus endpoint formats: can you briefly describe what it would look like for this endpoint to be Prometheus compatible?

@jnsgruk
Copy link
Member

jnsgruk commented May 6, 2022

This feels like something @simskij might have opinions on...

@simskij
Copy link
Member

simskij commented May 6, 2022

This would be really useful. I've not had a look at what is already there yet, so I'll refrain from having an opinion on the current implementation for now.

However, the Prometheus Exposition Format is a very basic and extremely stable specification. In its simplest form, exposed metric consists of a metric name, optionally a set of labels and then a numeric value. The parts are separated eith a single space and entries are separated with a newline.

The spec for the exposition format is available here.

A basic one could look like:

pebble_status{instance="something", check="some-check"} 1

Where 1 is a boolean value

@benhoyt
Copy link
Contributor

benhoyt commented May 9, 2022

Okay, so we'd expose an HTTP endpoint, /v1/metrics or similar, with some metrics about health checks (and potentially other Pebble things) in that format. One consideration is security: we intentionally reduced the /v1/health endpoint down to just the "healthy: true" result to avoid exposing any additional information over this unauthenticated HTTP endpoint. Specifically, we wanted to avoid exposing the details of all the health checks. Any thoughts about whether that's a real issue?

@simskij
Copy link
Member

simskij commented May 9, 2022

Any thoughts about whether that's a real issue?

Yeah, this would open up for potential fingerprinting of the services. This could be solved by adding basic auth to the metrics endpoint and allowing the charm to access these credentials somehow. We currently lack the support for scrape job authentication in the Prometheus charm (or at least it hasn't been thoroughly tested), but that would be fairly trivial to support.

One considering is security: we intentionally reduced the /v1/health endpoint down to just the "healthy: true"

I wouldn't alter it. Adding /v1/metrics sounds like a better game plan in that case as it closer matches the data it would expose.

@shipperizer
Copy link

guys have you got any updates on this issue?
any help that can be given on this?

our use case is that we wanna monitor the status of our app(s) across restarts

Happy to lend a hand on it

@benhoyt
Copy link
Contributor

benhoyt commented Jul 11, 2023

I think given that Pebble is used across a bunch of projects now, we'd need a spec for this (and auth sounds like it might take a bit of thinking about). I like the idea, but don't have bandwidth right this cycle to spec and implement this. @shipperizer Would you be willing to create a spec for it? Proposal for /v1/metrics HTTP endpoint, what's it returns, discussion of auth, and so on.

@shipperizer
Copy link

@benhoyt can give it a crack, hopefully

a couple of considerations are:

  • basic auth is what prometheus supports at best afaik (please point me somewhere else if wrong) so can't do more than that if aim is to be scraped
  • as MVP a "process_down_period" (or similar) where we track slots when app is down is all I can think of (and need at the same time)

if those points above are good enough I will cut some time to work on it next pulse

@benhoyt
Copy link
Contributor

benhoyt commented Jul 11, 2023

If basic auth is what Prometheus supports, then yeah, we'll need to use that (the tricky design thing will be how to configure/get credentials).

I'd be thinking something more along the lines of pebble_service{name="srv1"} 1 and pebble_check{name="chk1"} 0 would be more like it. But I'm not very familiar with how these types of things are done in Prometheus metrics endpoints normally. @simskij will have ideas (he's posted one above too).

Given the design considerations above, I still think we need a formal spec and spec review meeting with a few invested players (Simon, Gustavo, me at least). So probably don't start on implementation until you have a spec semi-approved.

@shipperizer
Copy link

@benhoyt agreed, will try to get the specs in the next pulse

@shipperizer
Copy link

specs shared internally

@benhoyt
Copy link
Contributor

benhoyt commented Mar 13, 2024

@shipperizer I forget -- where did we get to with this?

@benhoyt benhoyt added Feature A feature request Needs Design Needs a design or spec labels Mar 13, 2024
@shipperizer
Copy link

@shipperizer anywhere unfortunately, i lost track of it, spec is somewhere in my drive, can dig it out and we can have another chat about it

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Feature A feature request Needs Design Needs a design or spec
Projects
None yet
Development

No branches or pull requests

5 participants