Recording rule and adhoc query produce different (floating point) result #2951
Comments
Queries run on the second; recording rules can happen at any millisecond within the second. |
The data has been scraped regularly, but all values and dimensions have been constant over the last hour. All data comes from a single target. Can it still cause trouble if the rule evaluation happens concurrently with the scraping/ingesting? Or what are you implying with the second/millisecond precision? |
Hmm, you haven't demonstrated that the value hasn't changed. Try "changes" rather than "increase". |
Same result for changes. There are no changes in the data source
but several changes in the aggregate
|
Is the count() consistent? |
Yes, the count is consistent. I managed to track this down further. In the following examples I am in the query explorer on the UI and hit enter a couple of times in very short succession. As expected, the aggregated result does not change as it all happens within one rule evaluation interval:
After the time for a rule evaluation has passed, I get a slightly different result even though the underlying data has no changes:
If I do this for the non-aggregated ones I get different results on each query evaluation:
Sorting the result set beforehand gives a consistent result though, as we will make the same floating point error every time:
In total, this would explain why we get a different rule evaluation result every time, and thus lots of changes in the resulting time series. I believe it is fine that Prometheus makes these slight mistakes. I will have to correct my incoming data instead. |
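For illustration, the effect described above can be reproduced outside Prometheus. This is only a minimal Python sketch, not Prometheus code: `naive_sum` is a hypothetical helper that adds values left to right, and the sample values are made up rather than taken from the metric in this issue. It shows that the same values summed in a different order can give a different float, and that sorting first makes the result repeatable.

```python
from itertools import permutations


def naive_sum(values):
    """Add floats left to right, the way a simple aggregation loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# Made-up sample values; assume these are the constant series being aggregated.
values = [0.1, 0.2, 0.3]

forward = naive_sum(values)             # 0.6000000000000001
backward = naive_sum(reversed(values))  # 0.6

# Same inputs, different order, different result.
print(forward, backward, forward == backward)

# Sorting first fixes the order, so every permutation yields the same sum
# (the rounding error is still there, it is just the same one each time).
sorted_sums = {naive_sum(sorted(p)) for p in permutations(values)}
print(sorted_sums)
```

Sorting does not make the sum more accurate; it only makes the rounding error deterministic.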
This looks like normal floating point inaccuracy. |
Thanks for your help! (I will opt for the mailing list next time. It looked like a bug to me at first which is why I jumped to the tracker) |
@StephanErb In case you were still wondering why the result was stable for ad-hoc queries but not for recording rules: within a single range query, all the individual resolution steps share the same ordering of the underlying time series. The series get attached to the AST of the expression in a particular order during the query preparation phase and are then used in that order at every time step. Rules are individual instant queries that get executed at every rule evaluation cycle, so multiple rule evaluations don't share the same underlying series order. |
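A rough simulation of that difference, as a hedged sketch only: none of this is Prometheus code, the function names are hypothetical, and the varying series order between rule evaluations is modelled by simply walking through all permutations of made-up values. The range query fixes the order once and reuses it at every step, while each rule evaluation sums in its own order.

```python
from itertools import permutations

# Made-up constant sample values, one per underlying series.
SAMPLES = [0.1, 0.2, 0.3]


def aggregate(values):
    """Left-to-right float sum, standing in for a sum() aggregation."""
    total = 0.0
    for v in values:
        total += v
    return total


def range_query(samples, steps):
    """One query: the series order is fixed once and reused at every step."""
    order = list(samples)
    return [aggregate(order) for _ in range(steps)]


def rule_evaluations(samples):
    """Separate instant queries: each evaluation may see a different order.

    Assumption: the order is modelled here by iterating over permutations.
    """
    return [aggregate(order) for order in permutations(samples)]


range_result = range_query(SAMPLES, steps=6)
rule_result = rule_evaluations(SAMPLES)

print(set(range_result))  # a single value: the plotted line is flat
print(set(rule_result))   # more than one value: the recorded series "changes"
```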
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
What did you do?
Define an aggregation rule for an expensive query.
The underlying data looks something like this (simplified label set). Please note the large number of digits.
The metric has lots of dimensions:
The metric has not changed within the last hour:
What did you expect to see?
The underlying metric consists of several very slow moving counters. I therefore expect that the aggregation rule and the adhoc query produce the same results.
What did you see instead? Under which circumstances?
Plotting the adhoc query produces a flat line, whereas plotting the aggregated time series produces a non-linear one.
The actual difference is small but still noticeable.
Looking at the data this probably boils down to rounding errors in floating point math. But why does it differ for the recording rule and the adhoc query?
Environment
System information:
Linux 3.16.0-4-amd64 x86_64
Prometheus version:
prometheus, version 1.7.1 (branch: master, revision: 3afb3ff)
build user: root@0aa1b7fc430d
build date: 20170612-11:44:05
go version: go1.8.3
Prometheus configuration file: