Add min rel accuracy stopping criterion #1744
base: main
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
src/thread_timer.h
Outdated
    // Accumulated time so far (does not contain current slice if running_)
    double real_time_used_ = 0;
    double cpu_time_used_ = 0;
    // Manually set iteration time. User sets this with SetIterationTime(seconds).
    double manual_time_used_ = 0;
    double manual_time_used2_ = 0;
So we need to track every timer as-is and as a sum-of-squares?
Yes, the idea is to look at the time durations as independent identically distributed samples of a random variable. And then use second-order statistics to estimate mean, variance, and accuracy of the estimate of the mean. So for this, we need indeed to keep track of the sum of the squares of the time durations.
src/benchmark_runner.cc
Outdated
    i.seconds >= GetMinTimeToApply() ||  // The elapsed time is large enough.
    ((i.seconds >= GetMinTimeToApply()) && b.use_manual_time() &&
     !IsZero(GetMinRelAccuracy()) &&
     (std::sqrt(i.seconds2 / i.iters - std::pow(i.seconds / i.iters, 2.)) /
          (i.seconds / i.iters) / sqrt(i.iters) <=
      GetMinRelAccuracy())) ||  // The relative accuracy is enough.
Err, i think this needs a bit of refactoring...
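One possible shape for that refactoring is to pull the statistical test out into a named helper. This is only a sketch: `RelAccuracyReached` is a hypothetical name, and the `seconds`/`seconds2`/`iters` fields and `GetMinRelAccuracy()` are taken from the condition quoted above.

```cpp
#include <cmath>

// Hypothetical helper (not part of the PR as posted): decide whether the
// relative standard error of the mean duration has dropped below the
// requested threshold, given running sums of durations and squared durations.
bool RelAccuracyReached(double seconds, double seconds2, double iters,
                        double min_rel_accuracy) {
  if (min_rel_accuracy <= 0.0) return false;  // Criterion disabled.
  const double mean = seconds / iters;
  // Population variance from the two running sums.
  const double variance = seconds2 / iters - mean * mean;
  // Relative standard error of the mean: (stddev / sqrt(n)) / mean.
  const double rel_accuracy = std::sqrt(variance) / mean / std::sqrt(iters);
  return rel_accuracy <= min_rel_accuracy;
}
```

The large boolean expression could then read, e.g., `... || RelAccuracyReached(i.seconds, i.seconds2, i.iters, GetMinRelAccuracy())`, keeping the per-branch comments intact.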
src/benchmark_runner.cc
Outdated
    BM_VLOG(3) << "Next iters: " << next_iters << ", " << multiplier << "\n";
    return next_iters;  // round up before conversion to integer.
    }
I do not understand the changes to this function.
Does this even compile? multiplier is a local variable, and it seems unused now.
Hi @LebedevRI. Thanks for looking at the PR. I must admit that I meant to push it to my own fork, rather than to the Google repository. This is why the PR is in a very early draft stage. But it's here now :) So maybe that's indeed a good opportunity to work on it together.
For this particular change, it's an unintentional leftover from a mistaken edit. I'll delete it.
As you probably noticed, non-manual timers don't produce per-iteration times
Force-pushed from c242f5a to 3bca691
Hi @LebedevRI. In the mean time, I have elaborated the PR more. It is now in a better state for discussion. Questions/issues that @romintomasetti and I are thinking of are:
Many thanks in advance!
I don't have an answer to this question. Following replies assume "yes", but i'm undecided.
Yes, that use of manual time for GPU kernels is indeed very reasonable.
True per-iteration times are a complicated topic.
That's my biggest problem honestly. The existing lack of max-time limit
This feature does not need to store per-iteration times. As I have said, true per-iteration times are a complicated topic, and storing them doubly so.
Also,
I'm not sure I entirely understand what you mean. The condition
seems to be such that even after the changes in this PR, the behavior is that the first repetition sets the number of iterations and the next repetitions then reuse that number. An issue may be that subsequent repetitions don't reach the required relative accuracy for the reused number of iterations. But isn't this issue also present for the current stopping criterion based on time?
Precisely, yes. Presumably that is not what you want if you explicitly request convergence-based run-time.
Sure.
(To be clear, I'm just pointing out that failure mode,
Hi @LebedevRI. In the mean time, I've thought a bit more about it. An answer may be to look at it as two steps: this PR could be a first step focused on introducing a new convergence criterion, and exploring changes to the behavior for multiple repetitions could be a follow-up.

If I do think now about the behavior for multiple repetitions, I'm not sure I would call the issue a failure mode. From a statistical point of view, doing the repetitions using a fixed number of iterations may be quite sensible. An advantage is that basic versions of theoretical results from statistics would give some support to the approach, such as basic versions of the central limit theorem to characterize theoretically the variability of a sum of a fixed number of random variables.

The current behavior for multiple repetitions, kept unchanged for the new convergence criterion, also appears quite clear and easy to understand: the first repetition fixes the number of iterations, via a maximum number of iterations, a minimum time to apply, and/or, now, a minimum relative accuracy. Subsequent repetitions then reuse that fixed number of iterations.

Perhaps trying to write a piece of documentation about the new convergence criterion could be a good next step. Those would be my two cents, coming more from a background in statistics.
Force-pushed from 6947d6c to d3990ce
- Clean up the initial commit
- Further cleaning of initial commit
- Add test
- Improvements to comments thanks to review
- Reformat thanks to clang-format
- Static cast to avoid conversion warning
Force-pushed from d3990ce to 35285be
This is work in progress to try to add a statistical stopping criterion. While the existing statistics features seek to characterise the variability between repetitions of the benchmark, the statistical stopping criterion in this PR seeks to ensure that within a single repetition, the estimate of the mean duration has become sufficiently statistically accurate.
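In symbols (notation mine, matching the condition quoted earlier in the review): with per-iteration durations \(x_1, \dots, x_n\) treated as i.i.d. samples within one repetition, the criterion stops once the relative standard error of the mean falls below the requested threshold:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2, \qquad
\frac{\hat{\sigma}}{\bar{x}\,\sqrt{n}} \le \text{min\_rel\_accuracy}.
```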
This is a draft.