
[JSON] Benchmark IDs, Names, Grouping Support: Better Tooling #616

Closed
wants to merge 12 commits into from

Conversation

ghost

@ghost ghost commented Jun 12, 2018

This pull request is the same as #599, but is rebased after the formatting changes and only includes the JSON changes, leaving out console and CSV for later (or never).

Five fields are now in place in the JSON data (a sketch of a single entry follows this list):

  • "base_name" (the name one registers with the benchmark)
  • "threads" (the number of threads for that instance of the benchmark)
  • "repetitions" (the number of repetitions for the benchmark)
  • "id" (the instance id of the benchmark)
  • "family" (the id of the family of benchmarks spawned by a single call + parameters passed to a RegisterBenchmark/BENCHMARK()/etc. call).

All tests were updated to look for, inspect, and demand the new fields be present.

There are two newly generated IDs aside from what comes from the composed "name" attribute of each benchmark.

"id"

This is a number tied to each created instance of a benchmark, and it is unique across every single instance of that benchmark. It uniquely ties a single registered benchmark to the statistics generated for that benchmark. It solves the original problem in #567.

"family"

This is a number tied to a single family of benchmarks: for example, a benchmark registered with BENCHMARK(my_benchmark)->Range(8, 8 << 10); will generate many, many different instances for the range, but they all conceptually belong to this same "family". Families are useful for knowing each intentionally registered group. It also gets past potential self-hostile users who register benchmarks under the exact same name: they can be differentiated by their family ID.
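
For illustration only (the numbers are invented), two instances spawned by that single Range() registration would then share a family but keep distinct ids:

{ "name": "my_benchmark/8",    "base_name": "my_benchmark", "family": 2, "id": 7 },
{ "name": "my_benchmark/8192", "base_name": "my_benchmark", "family": 2, "id": 8 }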

The rest of the fields are a deconstruction of the typical composed name: base_name is the core name that is registered with the benchmark, threads is the thread count used for that run, and repetitions is the number of repetitions performed, set either at the command line or directly on the benchmark itself.

Each of these allows for easy grouping, and lets individual runs of a repeating benchmark have their stats grouped together with the others. Name parsing / shenanigans will continue to work as before.

Console output changes and CSV output changes are left for another Pull Request, potentially as a follow-up to this one.

@coveralls

coveralls commented Jun 13, 2018


Coverage increased (+0.3%) to 87.192% when pulling 69a9e90 on BaaMeow:feature/names_id_json_tooling into af441fc on google:master.

@ghost
Author

ghost commented Jun 13, 2018

Some test cases for the edge / adversarial cases:

#include <benchmark/benchmark.h>

static void adversarial(benchmark::State &state)
{
	for (auto _ : state)
	{
	}
}
BENCHMARK(adversarial);
BENCHMARK(adversarial);

static void many_range_rep(benchmark::State &state)
{
	for (auto _ : state)
	{
	}
}
BENCHMARK(many_range_rep);
BENCHMARK(many_range_rep)->Repetitions(3);
BENCHMARK(many_range_rep)->Range(8, 8 << 10);
BENCHMARK(many_range_rep)->Range(8, 8 << 10)->Repetitions(3);

static void common_prefix(benchmark::State &state)
{
	for (auto _ : state)
	{
	}
}
static void common_prefix_mean(benchmark::State &state)
{
	for (auto _ : state)
	{
	}
}
BENCHMARK(common_prefix);
BENCHMARK(common_prefix_mean);

JSON output from above test-case benchmarks

@AppVeyorBot

Build benchmark 1286 completed (commit 9a8871dc30 by @baameow)

Collaborator

@LebedevRI LebedevRI left a comment

Nice.
At least can be read :)
The test coverage is bad.
None of the tests actually demonstrate how id and family_id behave.
And yeah, even though it is of less importance given family_id, the base_name looks just wrong.

src/complexity.cc
src/json_reporter.cc (outdated)
@@ -142,6 +158,11 @@ BENCHMARK(BM_Counters_Threads)->ThreadRange(1, 8);
ADD_CASES(TC_ConsoleOut, {{"^BM_Counters_Threads/threads:%int %console_report "
"bar=%hrfloat foo=%hrfloat$"}});
ADD_CASES(TC_JSONOut, {{"\"name\": \"BM_Counters_Threads/threads:%int\",$"},
{"\"base_name\": \"BM_Counters_Threads\",$", MR_Next},
Collaborator

This is still bothering me. It just looks completely wrong.

Author

The full name is {base_name}/threads:{thread}/{repetitions}. This PR provides access to the parts, and lets you reassemble and group them arbitrarily to your liking. We provide ID numbers for instances so you can pick out statistics, and family ID numbers so you can differentiate between benchmarks which are -- for whatever reason -- registered with the same name.

@dmah42
Member

dmah42 commented Jun 13, 2018

In the output example you have:

      "repetitions": -3689348814741910324,
      "threads": -3689348814741910324,

This doesn't look right.

Also, if the ID only serves to group foo:repeats/3 and foo:repeats/3_mean, but you also want to know when it's a statistical output vs. a BM run, you could likely add a field for whether it's statistical: maybe a "stats" field that would be "mean" or "median" and could be queried.

I'm still missing why unique IDs are necessary when we have any number of names that can be generated, which at least will be consistent across runs.
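
A minimal sketch of the kind of field being suggested here (the "stats" key and the values shown are assumptions, not anything currently emitted):

{ "name": "foo",      "stats": "" },
{ "name": "foo_mean", "stats": "mean" }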

@AppVeyorBot

Build benchmark 1288 completed (commit b121917ed9 by @baameow)

@googlebot

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@ghost
Author

ghost commented Aug 10, 2018

I'm still working on this, just as an FYI. I've just been terribly busy!

I guess I'm not very good at rebasing / merges (or PGP and commit signing). Sorry about that; it looked okay on my local box but I guess I have some more work to do!

@ghost
Author

ghost commented Aug 15, 2018

@dominichamon I added a "stats" field carrying the name of the statistic (including for RMS and BigO). That mostly solves the problem, since there is now a distinct way to tell runs apart.

I still think "family" is useful as an ID, since I would want to group things without having to cobble together the name. There are also cases where someone can register things under the exact same name. For example, below is a collection of degenerately named and used benchmarks; this is what the output looks like.

With the "stats" field, we can maybe get rid of id. I'm still hesitant to do so, though: you would need to parse multiple fields (base_name, stats, repetitions/iterations/threads/etc.) to make sure you had one unique grouping, especially in the case of degenerate names as illustrated below.

#include <benchmark/benchmark.h>

#include <algorithm>
#include <cstdlib>
#include <vector>

static void adversarial(benchmark::State& state) {
	int x = 0;
	for (auto _ : state) {
		x += 2;
		benchmark::DoNotOptimize(x);
	}
}
BENCHMARK(adversarial);
BENCHMARK(adversarial);

static void common_prefix(benchmark::State& state) {
	for (auto _ : state) {
	}
}
static void common_prefix_mean(benchmark::State& state) {
	for (auto _ : state) {
	}
}
static void using_the_underscore_poorly_(benchmark::State& state) {
	for (auto _ : state) {
	}
}
BENCHMARK(common_prefix)->Repetitions(3);
BENCHMARK(common_prefix_mean)->Repetitions(3);
BENCHMARK(using_the_underscore_poorly_)->Repetitions(3);

void complexity(benchmark::State& state) {
	for (auto _ : state) {
		for (int i = 0; i < 1024; ++i) {
			benchmark::DoNotOptimize(&i);
		}
	}
	state.SetComplexityN(state.range(0));
}
BENCHMARK(complexity)->Range(1, 1 << 18)->Complexity(benchmark::o1);
BENCHMARK(complexity)->Range(1, 1 << 18)->Complexity();
BENCHMARK(complexity)->Range(1, 1 << 18)->Complexity([](int64_t) {
	return 1.0;
});

std::vector<int> random_vector(int64_t size) {
	std::vector<int> v;
	v.reserve(static_cast<int>(size));
	for (int i = 0; i < size; ++i) {
		v.push_back(static_cast<int>(std::rand() % size));
	}
	return v;
}

void complexity_rms(benchmark::State& state) {
	auto v = random_vector(state.range(0));
	const int64_t item_not_in_vector = state.range(0) * 2;
	for (auto _ : state) {
		benchmark::DoNotOptimize(std::find(v.begin(), v.end(), item_not_in_vector));
	}
	state.SetComplexityN(state.range(0));
}
BENCHMARK(complexity_rms)
	->RangeMultiplier(2)
	->Range(1 << 10, 1 << 16)
	->Complexity(benchmark::oN);
BENCHMARK(complexity_rms)
	->RangeMultiplier(2)
	->Range(1 << 10, 1 << 16)
	->Complexity([](int64_t n) -> double { return static_cast<double>(n); });
BENCHMARK(complexity_rms)
	->RangeMultiplier(2)
	->Range(1 << 10, 1 << 16)
	->Complexity();

static void many_range_rep(benchmark::State& state) {
	for (auto _ : state) {
	}
}
BENCHMARK(many_range_rep);
BENCHMARK(many_range_rep)->Repetitions(3);
BENCHMARK(many_range_rep)->Range(8, 8 << 10);
BENCHMARK(many_range_rep)->Range(8, 8 << 10)->Repetitions(3);

@LebedevRI
Copy link
Collaborator

PR tips:

  • Fetch the remote (the google/benchmark repo)
  • Rebase your local git master branch on top of the git master of the google/benchmark repo
  • Push that to your GitHub repo clone
  • Rebase your WIP branch on top of your new git master
  • Force-push that to your GitHub repo clone

Otherwise there is a lot of unrelated noise in the PR.

add some tests around looking for backslashes or forward slashes, to handle Linux/Windows differences and the need for escaping
add stats name to tests
@googlebot

CLAs look good, thanks!

@ghost
Author

ghost commented Aug 16, 2018

@LebedevRI Thanks for the tips; I think I managed to clean it up. The googlebot seems to agree; I just need to check over the diffs to make sure they're no longer super messy...

There are a few places where the clang-formatter had fun, but it didn't seem to drastically alter the presentation of anything. If something looks wrong, do let me know; otherwise, everything looks to be okay (I just need to wait for AppVeyor and Travis CI to report that everything is fine).

@AppVeyorBot

Build benchmark 1367 failed (commit 4768af1dbe by @baameow)

@LebedevRI
Collaborator

I think I managed to clean it up.

Thanks, this is indeed better.

It would be best if you could split the \\\\ change (+ tests!) into a separate PR,
because it can be merged right away, separately from the rest of this.

@ghost ghost mentioned this pull request Aug 16, 2018
@ghost
Author

ghost commented Aug 16, 2018

@LebedevRI Done! #652

@AppVeyorBot

Build benchmark 1368 completed (commit 7d7a0aec1d by @baameow)

@AppVeyorBot

Build benchmark 1371 completed (commit e023a94ade by @baameow)

@LebedevRI
Collaborator

Ok, that is great, some progress. Now, how about we try to split this up a bit more.

I think it would be completely uncontroversial to introduce that stats field.
@dominichamon, @EricWF ?

Can you split it into yet another PR? :)

this is what copy-paste gets me: small errors that derp the build, waugh
@AppVeyorBot

Build benchmark 1373 completed (commit ea8b468b39 by @baameow)

…ting to report `stats_group` to be extremely clear
@ghost
Author

ghost commented Aug 22, 2018

@LebedevRI Sure, if anyone else chimes in and says just the stats part is wanted I will be more than happy to break it off into another PR and leave the ID bits for later.

As a small update, I changed id in the JSON to be the much more descriptive "stats_group" label. It just makes a bit more sense than id.

@AppVeyorBot

Build benchmark 1386 completed (commit 08614d7ca6 by @baameow)

@AppVeyorBot

Build benchmark 1387 completed (commit e2f4089675 by @baameow)

@LebedevRI
Collaborator

I had another look, and I still stand with the same opinion :/
I think this does the right thing, and does move in the right direction.
But. It implements loads of different (but related, yes) things at once:

  • Store existing data in the JSON:
    1. threads
    2. repetitions
    3. aggregate name (statistic, rms, bigo)
      These three are great, and completely uncontroversial on their own. I'd merge them.
  • Add newish data:
    • IDs.
    • Store more name variants.
      Again, both of these most likely make sense.
      But it's really hard to tell (due to the diff's size) whether they do the sane thing or not.
      Also, they really should have separate tests that actually check the names/IDs.

So I would still suggest splitting this up further...

LebedevRI added a commit to LebedevRI/benchmark that referenced this pull request Sep 13, 2018
This is related to BaaMeow's work in
google#616
but is not based on it.

Two new fields are tracked, and dumped into JSON:
* If the run is an aggregate, the aggregate's name is stored.
  It can be RMS, BigO, mean, median, stddev, or any custom stat name.
* The aggregate-name-less run name is additionally stored.
  I.e. not some name of the benchmark function, but the actual
  name, but without the 'aggregate name' suffix.

This way one can group/filter all the runs,
and filter by the particular aggregate type.

I *might* need this for further tooling improvement.
Or maybe not.
But this is certainly worthwhile for custom tooling.
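
As a sketch of what that commit describes (the exact field names run_name and aggregate_name are an assumption here, not quoted from the commit), an aggregate entry would carry roughly:

{
  "name": "BM_foo_mean",
  "run_name": "BM_foo",
  "aggregate_name": "mean"
}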
LebedevRI added a commit that referenced this pull request Sep 13, 2018
…#675)

JBakamovic pushed a commit to JBakamovic/benchmark that referenced this pull request Dec 6, 2018
…google#675)

@ghost
Author

ghost commented Dec 14, 2018

@LebedevRI Ping pong: sorry for the quiet! I've been way too busy for my own good, but I'm back now and I have quite a bit of free time!

I wanted to get up to speed again and help move forward some of the wanted things. Judging from #675, it looks like a stats_name (aggregate name) was pulled into a separate PR by you and handled nicely, yay!

Should I go ahead and add a new PR for threads and repetition counts, since those are uncontroversial? I can probably get those done by Monday, and then we can figure out if additional identifiers are still worth it!

@ghost
Author

ghost commented Dec 23, 2018

With #748 going live, if it's accepted then the only thing that might still be needed is the family ID, for the case of clashing and degenerate benchmark names. I'll make a new pull request with the last of that after #748 is reviewed.

Happy Holidays to everyone! 🎄

@ghost
Author

ghost commented Mar 29, 2019

I won't be pursuing the IDs. While they were helpful for bringing me to exactly the right place for the data samples, I'm sure I can figure out other ways to get around it.

If someone else wants to put IDs in the benchmarks and samples, feel free.

@ghost ghost closed this Mar 29, 2019
This pull request was closed.