[feature request] auto choose retention policies based on timestamp when querying #2625
Comments
A time range alone is not sufficient to identify which retention policy is desired. There is nothing to prevent two series with identical measurement name and tag sets from existing in separate retention policies with overlapping time ranges. Therefore it is not possible for the system to know which series is intended if the retention policy is not provided. A workaround for now is to keep all data for a given dashboard in the same retention policy. It does require maintaining multiple dashboards. |
@beckettsean, from our POV this is certainly a fairly important feature, and I don't think your workaround really works. Let me give an example of the problem we have: we capture some data every second. Let's say IO blocks out (which is in telegraf). You need data at 1 second granularity for some types of troubleshooting, but in most cases on a graph it would be crazy to worry about 1 second data.

Let's imagine a common case: a dashboard showing all metrics per server. It might default to showing the last hour (3600 data points/server). Per day that is 86400, per month >2.5 million points. Per hour and perhaps per day will just work, but nobody in their right mind would attempt to keep metrics at 1 second granularity over a year and then graph them. While InfluxDB can downsample them, it's going to have to pull a crazy number of metrics from disk for that query (and, in the real world, it would likely be >1 server on a graph; we also have plans to store data in some cases at a small number of microseconds delta). We also have a basic disk space problem: we are already capturing many hundreds of GB of 1s and 10s metrics per day.

The sane pattern is to keep 1 second data for 24 hours, 1 minute for a week, 5 minute for a month and once an hour for a year (or something similar). This is how just about every other system (Graphite, Ganglia, etc.) handles it. We can sort of do this with a continuous query in InfluxDB that copies the downsampled data to a new database (although we have to delete the 1 second data manually).

The problem is that now we have a Grafana problem: from a single graph we can only query either the downsampled or the original data. This means that when a user looks at a 1 hour graph (1 second granularity) and then zooms out to see the last month, we have to change the database, which Grafana does not support. There are two ways to approach this:
My personal preference would be (3), but I suspect that's not an architectural starter (although if you would be willing to accept that as an option, we might be able to find somebody to work on it and send you a PR). This leaves us with (1) or (2). This ticket strikes me as asking for (1). Do you think it's best to attack this via this ticket, or to track an issue more like (2) (for the InfluxDB project)? |
A different way of solving this is to build a proxy between Grafana and Influx (we need this anyway to check user access). Parse out the GROUP BY, measurement name and aggregate of the incoming request at the proxy and apply a rule to change the measurement name (prepend a retention or a custom string fitting your data structure). Send this query to Influx instead of the original. I think this is the most practical solution at the moment. |
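For illustration, the rewrite rule described above could be sketched roughly as follows. This is not the attached proxy, only a minimal sketch: the `ROLLUPS` table, the `rewrite_query` helper and the `<measurement>.<interval>.<aggregate>` naming are assumptions, and it only handles simple queries with a quoted measurement name and a single aggregate.

```python
import re

# Hypothetical rollup series, coarsest first: (resolution in seconds, name suffix).
ROLLUPS = [(3600, "1h"), (300, "5m"), (60, "1m")]

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

GROUP_BY_RE = re.compile(r"group\s+by\s+.*?time\((\d+)([smhdw])\)", re.IGNORECASE)
SELECT_RE = re.compile(r"select\s+(\w+)\(", re.IGNORECASE)
FROM_RE = re.compile(r'(from\s+)"([^"]+)"', re.IGNORECASE)


def rewrite_query(query):
    """Point a query at a pre-aggregated series when its GROUP BY interval allows it."""
    group = GROUP_BY_RE.search(query)
    agg = SELECT_RE.search(query)
    if not group or not agg:
        return query  # no time grouping or no aggregate: pass through unchanged

    interval = int(group.group(1)) * UNIT_SECONDS[group.group(2)]
    aggregate = agg.group(1).lower()

    # Use the coarsest rollup that is still at least as fine as the requested interval.
    for seconds, suffix in ROLLUPS:
        if interval >= seconds:
            return FROM_RE.sub(
                lambda m: f'{m.group(1)}"{m.group(2)}.{suffix}.{aggregate}"',
                query,
                count=1,
            )
    return query  # finer than every rollup: query the raw series


original = 'SELECT max("value") FROM "metric" WHERE time > now() - 30d GROUP BY time(10m)'
print(rewrite_query(original))
```

A query grouped by `time(1s)` is finer than every rollup and is passed through untouched, so zoomed-in graphs still hit the raw data.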
@PaulKuiper, funnily enough that's exactly how we plan to achieve this (we also have the ACL problem). Have you already worked on this? We may build this and open source it... or use somebody else's if it's already out there. |
Attached is a Python file (in txt format, else I could not upload it), which you can use as a simple proxy between Grafana and Influx. It can greatly increase zoom speed. It assumes that the following continuous queries are present for the measurement called "metric": metric.1s.max Point your "data source" to port 3004 (or whatever you choose) instead of port 8086 in Grafana. |
+1 |
+1 |
+1 !! |
I'll update it for InfluxDB 0.10 sometime this month |
+1 |
👍 |
I've done some work on @PaulKuiper's proxy to make it work with 0.10 and put it here: |
With version 0.10 it's normal to have several values in each measurement.
Any ideas on how to handle the case where there are several values? I was thinking of some batch processing with Kapacitor which obtains all values for each measurement and creates the appropriate CQs. |
I have made a small script to autogenerate RPs and CQs: https://gist.github.com/f4b6f5c8f6c2a51c3f60 |
@adrianlzt in CQs each tag or field must be explicitly named. It is possible to use So, to downsample multiple fields, the CQ would look something like this:
The |
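As a rough sketch of the point about naming every field explicitly, a script could generate one CQ per measurement from a list of field names. The `build_cq` helper and the database, retention policy, measurement and field names below are hypothetical; the statement follows the general `CREATE CONTINUOUS QUERY ... BEGIN ... END` form of InfluxQL and should be checked against the InfluxDB version in use.

```python
def build_cq(name, database, target_rp, measurement, fields, interval, aggregate="mean"):
    """Build a CQ that downsamples every listed field into another retention policy."""
    # Each field is aggregated and aliased back to its own name.
    selects = ", ".join(f'{aggregate}("{field}") AS "{field}"' for field in fields)
    return (
        f'CREATE CONTINUOUS QUERY "{name}" ON "{database}" BEGIN '
        f'SELECT {selects} '
        f'INTO "{database}"."{target_rp}"."{measurement}" '
        f'FROM "{measurement}" GROUP BY time({interval}), * END'
    )


# Hypothetical names, for illustration only.
print(build_cq("cpu_1m", "telegraf", "one_week", "cpu", ["usage_user", "usage_system"], "1m"))
```

The `GROUP BY time(...), *` clause keeps all tags on the downsampled points, so only the fields need to be enumerated.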
Will this be fixed in upcoming versions? |
Follow #5750, which is the |
+1 for this issue. Without good data downsampling / roll-up, moving from whisper is likely to present problems for Graphite graphs over long time periods. Ideally it needs a way to set a default policy around rollup and retention for all metrics of a specific type. |
+1 Coming from the old RRD world, this is obviously a big shift in mentality. I fully agree with #2625 (comment), option 3. I have never used Graphite, but I have read about it several times and I liked how you can set different retention policies per metric if desired, and otherwise you get the default downsampling automatically. The InfluxDB approach is awkward and makes it hard to maintain, in my beginner's opinion. I believe users want a time series database that is efficient, fast and requires low maintenance. Managing RPs and CQs with complex InfluxQL queries isn't obvious... My comments may be irrelevant; I am still learning and reading the mailing list and GitHub issues to figure out how I can set up downsampling. Currently, I am starting to wonder why I waited so long for InfluxDB instead of just using Dixon's Graphite. Still, InfluxDB is fast to spin up and play with using the tutorial, but then it gets more complicated when you really want to do something with it. |
Looking through old issues and I found this one. It seems related to #6910. |
Agreed. I think this one can basically be closed as a dup of #6910 |
I'm going to close this in favor of #6910. If a new issue gets created for this, it will be mentioned in that issue. |
Currently, select uses the default retention policy if none is specified. It would be better if select automatically chose among the retention policies for the same series based on timestamp when the select statement does not specify a retention policy. This would simplify dashboard tools. This is what Graphite already does.
If we have to change retention policy when we want older data, it is tedious because we have to edit the dashboard definition.
When we select from a series, we do not care what retention policies it has, we just want the datapoints.
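The requested behaviour can be sketched as a lookup from the age of the oldest requested timestamp to a retention policy. The policy names and the `choose_policy` helper are hypothetical, the durations borrow the 24 hours / week / month / year tiers mentioned in the comments, and the sketch assumes the same series exists in every policy, which, as noted in the thread, InfluxDB itself cannot assume.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policies, finest resolution first: (name, how long data is kept).
POLICIES = [
    ("raw_1s", timedelta(hours=24)),
    ("rollup_1m", timedelta(days=7)),
    ("rollup_5m", timedelta(days=30)),
    ("rollup_1h", timedelta(days=365)),
]


def choose_policy(query_start, now=None):
    """Return the finest policy that still holds data as old as query_start."""
    now = now or datetime.now(timezone.utc)
    age = now - query_start
    for name, duration in POLICIES:
        if age <= duration:
            return name
    # Older than every policy: fall back to the one with the longest retention.
    return POLICIES[-1][0]


now = datetime(2016, 3, 4, tzinfo=timezone.utc)
print(choose_policy(now - timedelta(hours=1), now))
print(choose_policy(now - timedelta(days=20), now))
```

A dashboard tool or proxy could then prefix the measurement with the returned policy name instead of requiring the dashboard definition to be edited.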