[JSON] Benchmark IDs, Names, Grouping Support: Better Tooling #616
Conversation
Some test cases for the edge / adversarial cases:
✅ Build benchmark 1286 completed (commit 9a8871dc30 by @baameow)
Nice.
At least it can be read :)

The test coverage is bad. None of the tests actually demonstrate how `id` / `family_id` behave.

And yeah, even though it is of less importance given `family_id`, the `base_name` looks just wrong.
```diff
@@ -142,6 +158,11 @@ BENCHMARK(BM_Counters_Threads)->ThreadRange(1, 8);
 ADD_CASES(TC_ConsoleOut, {{"^BM_Counters_Threads/threads:%int %console_report "
                            "bar=%hrfloat foo=%hrfloat$"}});
 ADD_CASES(TC_JSONOut, {{"\"name\": \"BM_Counters_Threads/threads:%int\",$"},
+                       {"\"base_name\": \"BM_Counters_Threads\",$", MR_Next},
```
This is still bothering me. It just looks completely wrong.
The full name is `{base_name}/threads:{threads}/{repetitions}`. This PR provides access to the parts, and lets you re-assemble and group them arbitrarily to your liking. We provide ID numbers for instances, so you can pick out statistics, and family ID numbers, so you can differentiate between benchmarks which are -- for whatever reason -- registered with the same name.
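For illustration, a single run under this scheme might carry fields along these lines (a sketch only; all values are invented):

```json
{
  "name": "BM_Counters_Threads/threads:8",
  "base_name": "BM_Counters_Threads",
  "threads": 8,
  "repetitions": 1,
  "id": 12,
  "family": 3
}
```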
In the output example you have:

> …

this doesn't look right. Also, if the ID only serves to group foo:repeats/3 and foo:repeats/3_mean, but you want to also know when it's a statistical output vs a BM run, you could likely add a field for whether it's statistical. Maybe a "stats" field that would be "mean" or "median" that could be queried. I'm still missing why unique IDs are necessary when we have any number of names that can be generated, which at least will be consistent across runs.
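A sketch of that suggestion (field names and values invented, using the names as written in the comment):

```json
[
  { "name": "foo:repeats/3" },
  { "name": "foo:repeats/3_mean", "stats": "mean" }
]
```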
✅ Build benchmark 1288 completed (commit b121917ed9 by @baameow)
So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the …
I'm still working on this, just as an FYI. I've just been terribly busy! I guess I'm not very good at rebasing / merges (or PGP and commit signing). Sorry about that; it looked okay on my local box but I guess I have some more work to do!
@dominichamon I added the name "stats" with the name of the statistic (including for RMS and BigO). That mostly solves the problem, since there's a distinct way to tell runs apart now. I still think "family" is useful as an ID, since I would want to group things without having to cobble together the name. There are also cases where someone can register things of the exact same name. For example, below is a collection of degenerately named and used benchmarks. This is what the output looks like. With the name "stats", we can maybe get rid of …

```c++
// Includes added so the example is self-contained:
#include <algorithm>  // std::find
#include <cstdint>    // int64_t
#include <cstdlib>    // std::rand
#include <vector>

#include <benchmark/benchmark.h>
static void adversarial(benchmark::State& state) {
int x = 0;
for (auto _ : state) {
x += 2;
benchmark::DoNotOptimize(x);
}
}
// Registered twice under the exact same name, on purpose.
BENCHMARK(adversarial);
BENCHMARK(adversarial);
static void common_prefix(benchmark::State& state) {
for (auto _ : state) {
}
}
static void common_prefix_mean(benchmark::State& state) {
for (auto _ : state) {
}
}
static void using_the_underscore_poorly_(benchmark::State& state) {
for (auto _ : state) {
}
}
// Deliberately confusing names: `common_prefix_mean` looks like an
// aggregate of `common_prefix`.
BENCHMARK(common_prefix)->Repetitions(3);
BENCHMARK(common_prefix_mean)->Repetitions(3);
BENCHMARK(using_the_underscore_poorly_)->Repetitions(3);
void complexity(benchmark::State& state) {
for (auto _ : state) {
for (int i = 0; i < 1024; ++i) {
benchmark::DoNotOptimize(&i);
}
}
state.SetComplexityN(state.range(0));
}
// One function registered three times, each with a different BigO reporter.
BENCHMARK(complexity)->Range(1, 1 << 18)->Complexity(benchmark::o1);
BENCHMARK(complexity)->Range(1, 1 << 18)->Complexity();
BENCHMARK(complexity)->Range(1, 1 << 18)->Complexity([](int64_t) {
  return 1.0;
});
std::vector<int> random_vector(int64_t size) {
std::vector<int> v;
v.reserve(static_cast<int>(size));
for (int i = 0; i < size; ++i) {
v.push_back(static_cast<int>(std::rand() % size));
}
return v;
}
void complexity_rms(benchmark::State& state) {
auto v = random_vector(state.range(0));
const int64_t item_not_in_vector = state.range(0) * 2;
for (auto _ : state) {
benchmark::DoNotOptimize(std::find(v.begin(), v.end(), item_not_in_vector));
}
state.SetComplexityN(state.range(0));
}
// Again three registrations of one function; these produce RMS/BigO aggregates.
BENCHMARK(complexity_rms)
->RangeMultiplier(2)
->Range(1 << 10, 1 << 16)
->Complexity(benchmark::oN);
BENCHMARK(complexity_rms)
->RangeMultiplier(2)
->Range(1 << 10, 1 << 16)
->Complexity([](int64_t n) -> double { return static_cast<double>(n); });
BENCHMARK(complexity_rms)
->RangeMultiplier(2)
->Range(1 << 10, 1 << 16)
->Complexity();
static void many_range_rep(benchmark::State& state) {
for (auto _ : state) {
}
}
// One family registered four ways: plain, repeated, ranged, and ranged+repeated.
BENCHMARK(many_range_rep);
BENCHMARK(many_range_rep)->Repetitions(3);
BENCHMARK(many_range_rep)->Range(8, 8 << 10);
BENCHMARK(many_range_rep)->Range(8, 8 << 10)->Repetitions(3);
```
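With these fields, the two identically named `adversarial` registrations above become distinguishable; roughly like this (IDs invented, other fields omitted):

```json
[
  { "name": "adversarial", "base_name": "adversarial", "family": 0, "id": 0 },
  { "name": "adversarial", "base_name": "adversarial", "family": 1, "id": 1 }
]
```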
PR tips:

…

Else there is a lot of unrelated noise in the PR.
add some tests around looking for backslashes or forward slashes, to handle Linux/Windows differences and the need for escaping
add stats name to tests
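(For context: a backslash in a benchmark name must appear escaped in the JSON output, which is what the test regexes need to account for. A hypothetical name, purely illustrative:)

```json
{ "name": "BM_ReadFile/C:\\temp\\data" }
```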
CLAs look good, thanks!
@LebedevRI Thanks for the tips; I think I managed to clean it up. The googlebot seems to agree; I just need to check over the diffs to make sure they're no longer super messy... There are a few places where clang-format had fun, but it didn't seem to drastically alter the presentation of anything. If something looks wrong, do let me know; otherwise, everything looks to be okay (just need to wait for AppVeyor and Travis CI to report that everything is fine).
❌ Build benchmark 1367 failed (commit 4768af1dbe by @baameow)
Thanks, this is indeed better. It would be best if you could split the …
@LebedevRI Done! #652
✅ Build benchmark 1368 completed (commit 7d7a0aec1d by @baameow)
✅ Build benchmark 1371 completed (commit e023a94ade by @baameow)
Ok, that is great, some progress. Now, how about we try to split this a bit more. I think it would be completely uncontroversial to introduce the `threads` and `repetitions` fields. Can you split it into yet another PR? :)
this is what copy-paste gets me: small errors that derp the build, waugh
✅ Build benchmark 1373 completed (commit ea8b468b39 by @baameow)
…ting to report `stats_group` to be extremely clear
@LebedevRI Sure, if anyone else chimes in and says just the … As a small update, I changed the reporting to report `stats_group`, to be extremely clear.
✅ Build benchmark 1386 completed (commit 08614d7ca6 by @baameow)
✅ Build benchmark 1387 completed (commit e2f4089675 by @baameow)
I had another look, and I still stand with the same opinion :/
So I would still suggest to split this up further...
…#675) This is related to @baameow's work in #616 but is not based on it. Two new fields are tracked, and dumped into JSON:
* If the run is an aggregate, the aggregate's name is stored. It can be RMS, BigO, mean, median, stddev, or any custom stat name.
* The aggregate-name-less run name is additionally stored. I.e. not some name of the benchmark function, but the actual name, just without the 'aggregate name' suffix.

This way one can group/filter all the runs, and filter by the particular aggregate type. I *might* need this for further tooling improvement. Or maybe not. But this is certainly worthwhile for custom tooling.
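The commit text doesn't spell out the JSON key names; assuming keys along the lines of `run_name` and `aggregate_name`, a mean aggregate entry might look like this (values invented):

```json
{
  "name": "BM_example/repeats:3_mean",
  "run_name": "BM_example/repeats:3",
  "aggregate_name": "mean"
}
```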
@LebedevRI Ping pong: sorry for the quiet! I've been way too busy for my own good, but I'm back now and I have quite a bit of free time! I wanted to get up to speed again and help move forward some of the wanted things. Judging from #675, it looks like a `stats_name` (aggregate name) was pulled into a separate PR by you and handled nicely, yay! Should I go ahead and add a new PR for threads and repetition counts, since those are uncontroversial? I can probably get those done by Monday, and then we can figure out if additional identifiers are still worth it!
I won't be pursuing the IDs. While helpful for bringing me to the exact right place for the data samples, I'm sure I can figure out other ways to work around it. If someone else wants to put IDs in the benchmarks and samples, feel free.
This pull request is the same as #599, but is rebased after the formatting changes and only includes the JSON changes, leaving out console and CSV for later (or never).
Five fields are now in place in the JSON data. All tests were updated to look for, inspect, and demand that the new fields be present.

There are 2 new generated IDs aside from what comes from the composed "name" attribute of each benchmark:

`id`: a number tied to each created instance of a benchmark, unique across every single instance of that benchmark. It uniquely ties a single registered benchmark to the statistics generated for that benchmark. It solves the original problem in #567.
"family"
This is a number tied to a single family of benchmarks: for example, a benchmark registered with
BENCHMARK(my_benchmark)->Range(8, 8 << 10);
will generate many, many different instances for the range, but they all conceptually belong to this same "family". Families are useful for knowing each intentionally registered group. It also gets past potential self-hostile users who register benchmarks under the exact same name: they can be differentiated by their family ID.The rest of the fields are a destructing of the typical name:
base_name
is the core name that is registered with the benchmark.threads
is the used thread count for the benchmark.repetitions
is the number of repetitions performed, set at the command line or directly on the benchmark itself.Each one of these allows for optimal and easy grouping, and allows individual runs of a repeating benchmark to have their stats group together with others.
`name` parsing / shenanigans will continue to work as before.
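As a sketch of how `family` groups range-generated instances (IDs and range values invented, other fields omitted):

```json
[
  { "name": "my_benchmark/8",   "base_name": "my_benchmark", "family": 5, "id": 40 },
  { "name": "my_benchmark/64",  "base_name": "my_benchmark", "family": 5, "id": 41 },
  { "name": "my_benchmark/512", "base_name": "my_benchmark", "family": 5, "id": 42 }
]
```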
Console output changes and CSV output changes are left for another Pull Request, potentially as a follow-up to this one.