[JSON] Benchmark IDs, Names, Grouping Support: Better Tooling #616
Conversation
Some test cases for the edge / adversarial cases:
✅ Build benchmark 1286 completed (commit 9a8871dc30 by @baameow)
Nice.
At least it can be read :)

The test coverage is bad. None of the tests actually demonstrate how `id` / `family_id` behave.

And yeah, even though it is of less importance given `family_id`, the `base_name` looks just wrong.
```diff
@@ -142,6 +158,11 @@ BENCHMARK(BM_Counters_Threads)->ThreadRange(1, 8);
 ADD_CASES(TC_ConsoleOut, {{"^BM_Counters_Threads/threads:%int %console_report "
                            "bar=%hrfloat foo=%hrfloat$"}});
 ADD_CASES(TC_JSONOut, {{"\"name\": \"BM_Counters_Threads/threads:%int\",$"},
+                       {"\"base_name\": \"BM_Counters_Threads\",$", MR_Next},
```
This is still bothering me. It just looks completely wrong.
The full name is `{base_name}/threads:{threads}/{repetitions}`. This PR provides access to the parts, and lets you re-assemble and group them arbitrarily to your liking. We provide ID numbers for instances, so you can pick out statistics, and family ID numbers, so you can differentiate between benchmarks which are -- for whatever reason -- registered with the same name.
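For illustration, a single run under this scheme might carry fields along these lines (a sketch only; all values are invented):

```json
{
  "name": "BM_Counters_Threads/threads:8",
  "base_name": "BM_Counters_Threads",
  "threads": 8,
  "repetitions": 1,
  "id": 12,
  "family": 3
}
```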
In the output example you have:

> …

this doesn't look right. Also, if the ID only serves to group foo:repeats/3 and foo:repeats/3_mean, but you want to also know when it's a statistical output vs a BM run, you could likely add a field for whether it's statistical. Maybe a "stats" field that would be "mean" or "median" that could be queried. I'm still missing why unique IDs are necessary when we have any number of names that can be generated, which at least will be consistent across runs.
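A sketch of that suggestion (field names and values invented, using the names as written in the comment):

```json
[
  { "name": "foo:repeats/3" },
  { "name": "foo:repeats/3_mean", "stats": "mean" }
]
```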
✅ Build benchmark 1288 completed (commit b121917ed9 by @baameow)
So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the …
I'm still working on this, just as an FYI. I've just been terribly busy! I guess I'm not very good at rebasing / merges (or PGP and commit signing). Sorry about that; it looked okay on my local box but I guess I have some more work to do!
@dominichamon I added the name "stats" with the name of the statistic (including for RMS and BigO). That mostly solves the problem, since there's a distinct way to tell runs apart now. I still think "family" is useful as an ID, since I would want to group things without having to cobble together the name. There are also cases where someone can register things of the exact same name. For example, below is a collection of degenerately named and used benchmarks. This is what the output looks like. With the name "stats", we can maybe get rid of …

```c++
// Includes added so the example is self-contained:
#include <algorithm>  // std::find
#include <cstdint>    // int64_t
#include <cstdlib>    // std::rand
#include <vector>

#include <benchmark/benchmark.h>
static void adversarial(benchmark::State& state) {
int x = 0;
for (auto _ : state) {
x += 2;
benchmark::DoNotOptimize(x);
}
}
// Registered twice under the exact same name, on purpose.
BENCHMARK(adversarial);
BENCHMARK(adversarial);
static void common_prefix(benchmark::State& state) {
for (auto _ : state) {
}
}
static void common_prefix_mean(benchmark::State& state) {
for (auto _ : state) {
}
}
static void using_the_underscore_poorly_(benchmark::State& state) {
for (auto _ : state) {
}
}
// Deliberately confusing names: `common_prefix_mean` looks like an
// aggregate of `common_prefix`.
BENCHMARK(common_prefix)->Repetitions(3);
BENCHMARK(common_prefix_mean)->Repetitions(3);
BENCHMARK(using_the_underscore_poorly_)->Repetitions(3);
void complexity(benchmark::State& state) {
for (auto _ : state) {
for (int i = 0; i < 1024; ++i) {
benchmark::DoNotOptimize(&i);
}
}
state.SetComplexityN(state.range(0));
}
// One function registered three times, each with a different BigO reporter.
BENCHMARK(complexity)->Range(1, 1 << 18)->Complexity(benchmark::o1);
BENCHMARK(complexity)->Range(1, 1 << 18)->Complexity();
BENCHMARK(complexity)->Range(1, 1 << 18)->Complexity([](int64_t) {
  return 1.0;
});
std::vector<int> random_vector(int64_t size) {
std::vector<int> v;
v.reserve(static_cast<int>(size));
for (int i = 0; i < size; ++i) {
v.push_back(static_cast<int>(std::rand() % size));
}
return v;
}
void complexity_rms(benchmark::State& state) {
auto v = random_vector(state.range(0));
const int64_t item_not_in_vector = state.range(0) * 2;
for (auto _ : state) {
benchmark::DoNotOptimize(std::find(v.begin(), v.end(), item_not_in_vector));
}
state.SetComplexityN(state.range(0));
}
// Again three registrations of one function; these produce RMS/BigO aggregates.
BENCHMARK(complexity_rms)
->RangeMultiplier(2)
->Range(1 << 10, 1 << 16)
->Complexity(benchmark::oN);
BENCHMARK(complexity_rms)
->RangeMultiplier(2)
->Range(1 << 10, 1 << 16)
->Complexity([](int64_t n) -> double { return static_cast<double>(n); });
BENCHMARK(complexity_rms)
->RangeMultiplier(2)
->Range(1 << 10, 1 << 16)
->Complexity();
static void many_range_rep(benchmark::State& state) {
for (auto _ : state) {
}
}
// One family registered four ways: plain, repeated, ranged, and ranged+repeated.
BENCHMARK(many_range_rep);
BENCHMARK(many_range_rep)->Repetitions(3);
BENCHMARK(many_range_rep)->Range(8, 8 << 10);
BENCHMARK(many_range_rep)->Range(8, 8 << 10)->Repetitions(3);
```
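With these fields, the two identically named `adversarial` registrations above become distinguishable; roughly like this (IDs invented, other fields omitted):

```json
[
  { "name": "adversarial", "base_name": "adversarial", "family": 0, "id": 0 },
  { "name": "adversarial", "base_name": "adversarial", "family": 1, "id": 1 }
]
```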
PR tips:

…

Else there is a lot of unrelated noise in the PR.
add some tests around looking for backslashes or forward slashes, to handle Linux/Windows differences and the need for escaping
add stats name to tests
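(For context: a backslash in a benchmark name must appear escaped in the JSON output, which is what the test regexes need to account for. A hypothetical name, purely illustrative:)

```json
{ "name": "BM_ReadFile/C:\\temp\\data" }
```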
CLAs look good, thanks!
@LebedevRI Thanks for the tips; I think I managed to clean it up. The googlebot seems to agree; I just need to check over the diffs to make sure they're no longer super messy... There are a few places where clang-format had fun, but it didn't seem to drastically alter the presentation of anything. If something looks wrong, do let me know; otherwise, everything looks to be okay (just need to wait for AppVeyor and Travis CI to report that everything is fine).
❌ Build benchmark 1367 failed (commit 4768af1dbe by @baameow)
Thanks, this is indeed better. It would be best if you could split the …
@LebedevRI Done! #652
✅ Build benchmark 1368 completed (commit 7d7a0aec1d by @baameow)
✅ Build benchmark 1371 completed (commit e023a94ade by @baameow)
Ok, that is great, some progress. Now, how about we try to split this a bit more. I think it would be completely uncontroversial to introduce the `threads` and `repetitions` fields. Can you split it into yet another PR? :)
this is what copy-paste gets me: small errors that derp the build, waugh
✅ Build benchmark 1373 completed (commit ea8b468b39 by @baameow)
…ting to report `stats_group` to be extremely clear
@LebedevRI Sure, if anyone else chimes in and says just the … As a small update, I changed the reporting to report `stats_group`, to be extremely clear.
✅ Build benchmark 1386 completed (commit 08614d7ca6 by @baameow)
✅ Build benchmark 1387 completed (commit e2f4089675 by @baameow)
I had another look, and I still stand with the same opinion :/
So I would still suggest to split this up further...
…#675) This is related to @baameow's work in #616 but is not based on it. Two new fields are tracked, and dumped into JSON:
* If the run is an aggregate, the aggregate's name is stored. It can be RMS, BigO, mean, median, stddev, or any custom stat name.
* The aggregate-name-less run name is additionally stored. I.e. not some name of the benchmark function, but the actual name, just without the 'aggregate name' suffix.

This way one can group/filter all the runs, and filter by the particular aggregate type. I *might* need this for further tooling improvement. Or maybe not. But this is certainly worthwhile for custom tooling.
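The commit text doesn't spell out the JSON key names; assuming keys along the lines of `run_name` and `aggregate_name`, a mean aggregate entry might look like this (values invented):

```json
{
  "name": "BM_example/repeats:3_mean",
  "run_name": "BM_example/repeats:3",
  "aggregate_name": "mean"
}
```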
@LebedevRI Ping pong: sorry for the quiet! I've been way too busy for my own good, but I'm back now and I have quite a bit of free time! I wanted to get up to speed again and help move forward some of the wanted things. Judging from #675, it looks like a `stats_name` (aggregate name) was pulled into a separate PR by you and handled nicely, yay! Should I go ahead and add a new PR for threads and repetition counts, since those are uncontroversial? I can probably get those done by Monday, and then we can figure out if additional identifiers are still worth it!
I won't be pursuing the IDs. While helpful for bringing me to the exact right place for the data samples, I'm sure I can figure out other ways to work around it. If someone else wants to put IDs in the benchmarks and samples, feel free.
This pull request is the same as #599, but is rebased after the formatting changes and only includes the JSON changes, leaving out console and CSV for later (or never).
Five fields are now in place in the JSON data. All tests were updated to look for, inspect, and demand that the new fields be present.

There are 2 new generated IDs aside from what comes from the composed "name" attribute of each benchmark:

`id`: a number tied to each created instance of a benchmark, unique across every single instance of that benchmark. It uniquely ties a single registered benchmark to the statistics generated for that benchmark. It solves the original problem in #567.
"family"
This is a number tied to a single family of benchmarks: for example, a benchmark registered with
BENCHMARK(my_benchmark)->Range(8, 8 << 10);
will generate many, many different instances for the range, but they all conceptually belong to this same "family". Families are useful for knowing each intentionally registered group. It also gets past potential self-hostile users who register benchmarks under the exact same name: they can be differentiated by their family ID.The rest of the fields are a destructing of the typical name:
base_name
is the core name that is registered with the benchmark.threads
is the used thread count for the benchmark.repetitions
is the number of repetitions performed, set at the command line or directly on the benchmark itself.Each one of these allows for optimal and easy grouping, and allows individual runs of a repeating benchmark to have their stats group together with others.
`name` parsing / shenanigans will continue to work as before.
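As a sketch of how `family` groups range-generated instances (IDs and range values invented, other fields omitted):

```json
[
  { "name": "my_benchmark/8",   "base_name": "my_benchmark", "family": 5, "id": 40 },
  { "name": "my_benchmark/64",  "base_name": "my_benchmark", "family": 5, "id": 41 },
  { "name": "my_benchmark/512", "base_name": "my_benchmark", "family": 5, "id": 42 }
]
```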
Console output changes and CSV output changes are left for another Pull Request, potentially as a follow-up to this one.