Implementation of random interleaving. (#1105)

* Implementation of random interleaving. See http://github.com/google/benchmark/issues/1051 for the feature request. Committer: Hai Huang <haih@google.com> (http://github.com/haih-g). Modified: include/benchmark/benchmark.h, src/benchmark.cc, src/benchmark_api_internal.cc, src/benchmark_api_internal.h, src/benchmark_register.cc, src/benchmark_runner.cc, src/benchmark_runner.h, test/CMakeLists.txt. New files: src/benchmark_adjust_repetitions.cc, src/benchmark_adjust_repetitions.h, test/benchmark_random_interleaving_gtest.cc.
* Fix benchmark_random_interleaving_gtest.cc for fr-1051
* Fix macos build for fr-1051
* Fix macos and windows build for fr-1051.
* Fix benchmark_random_interleaving_test.cc for macos and windows in fr-1051
* Fix int type benchmark_random_interleaving_gtest for macos in fr-1051
* Address dominichamon's comments 03/29 for fr-1051
* Address dominichamon's comment on default min_time / repetitions for fr-1051. Also change sentinel of random_interleaving_repetitions to -1. Hopefully it fixes the failures on Windows.
* Fix windows test failures for fr-1051
* Add license blurb for fr-1051.
* Switch to std::shuffle() for fr-1105.
* Change to 1e-9 in fr-1105
* Fix broken build caused by bad merge for fr-1105.
* Fix build breakage for fr-1051.
* Print out reports as they come in if random interleaving is disabled (fr-1051)
* size_t, int64_t --> int in benchmark_runner for fr-1051.
* Address comments from dominichamon for fr-1051
* benchmark_indices --> size_t to make CI pass: fr-1051
* Fix min_time not initialized issue for fr-1051.
* min_time --> MinTime in fr-1051.
* Add doc for random interleaving for fr-1051

Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
google · May 20, 2021 · commit a6a738c · 1 parent c983c3e

Showing 11 changed files with 772 additions and 85 deletions.
diff --git a/README.md b/README.md
@@ -180,7 +180,7 @@ BENCHMARK_MAIN();
 ```
 
 To run the benchmark, compile and link against the `benchmark` library
-(libbenchmark.a/.so). If you followed the build steps above, this library will 
+(libbenchmark.a/.so). If you followed the build steps above, this library will
 be under the build directory you created.
 
 ```bash
@@ -300,6 +300,8 @@ too (`-lkstat`).
 
 [Setting the Time Unit](#setting-the-time-unit)
 
+[Random Interleaving](docs/random_interleaving.md)
+
 [User-Requested Performance Counters](docs/perf_counters.md)
 
 [Preventing Optimization](#preventing-optimization)
@@ -400,8 +402,8 @@ Write benchmark results to a file with the `--benchmark_out=<filename>` option
 (or set `BENCHMARK_OUT`). Specify the output format with
 `--benchmark_out_format={json|console|csv}` (or set
 `BENCHMARK_OUT_FORMAT={json|console|csv}`). Note that the 'csv' reporter is
-deprecated and the saved `.csv` file 
-[is not parsable](https://github.com/google/benchmark/issues/794) by csv 
+deprecated and the saved `.csv` file
+[is not parsable](https://github.com/google/benchmark/issues/794) by csv
 parsers.
 
 Specifying `--benchmark_out` does not suppress the console output.

diff --git a/docs/random_interleaving.md b/docs/random_interleaving.md
@@ -0,0 +1,26 @@
+<a name="interleaving" />
+
+# Random Interleaving
+
+[Random Interleaving](https://github.com/google/benchmark/issues/1051) is a
+technique to lower run-to-run variance. It breaks the execution of a
+microbenchmark into multiple chunks and randomly interleaves them with chunks
+from other microbenchmarks in the same benchmark test. Data reported in that
+issue shows it lowers run-to-run variance by
+[40%](https://github.com/google/benchmark/issues/1051) on average.
+
+To use, set `--benchmark_enable_random_interleaving=true`.
+
+It is a known issue that random interleaving may increase the benchmark
+execution time if:
+
+1.  A benchmark has costly setup and/or teardown. Random interleaving will run
+    setup and teardown many times and may increase test execution time
+    significantly.
+2.  The time to run a single benchmark iteration is longer than the desired
+    time per repetition (i.e., `benchmark_min_time / benchmark_repetitions`),
+    since every repetition still runs at least one full iteration.
+
+The overhead of random interleaving can be controlled with
+`--benchmark_random_interleaving_max_overhead`. The default value is 0.4,
+meaning the total execution time under random interleaving is limited to 1.4x
+the original total execution time. Set it to `inf` for unlimited overhead.
diff --git a/src/benchmark.cc b/src/benchmark.cc
@@ -33,8 +33,10 @@
 #include <cstdlib>
 #include <fstream>
 #include <iostream>
+#include <limits>
 #include <map>
 #include <memory>
+#include <random>
 #include <string>
 #include <thread>
 #include <utility>
@@ -54,6 +56,18 @@
 #include "thread_manager.h"
 #include "thread_timer.h"
 
+// Each benchmark can be repeated a number of times, and within each
+// *repetition*, we run the user-defined benchmark function a number of
+// *iterations*. The number of repetitions is determined based on flags
+// (--benchmark_repetitions).
+namespace {
+
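+// Default total time budget, in seconds, for all repetitions of a benchmark.
+// When --benchmark_min_time is unset, GetMinTime() splits it evenly across
+// repetitions.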
+constexpr double kDefaultMinTimeTotalSecs = 0.5;
+constexpr int kRandomInterleavingDefaultRepetitions = 12;
+
+}  // namespace
+
 // Print a list of benchmarks. This option overrides all other options.
 DEFINE_bool(benchmark_list_tests, false);
 
@@ -62,16 +76,39 @@ DEFINE_bool(benchmark_list_tests, false);
 // linked into the binary are run.
 DEFINE_string(benchmark_filter, ".");
 
-// Minimum number of seconds we should run benchmark before results are
-// considered significant.  For cpu-time based tests, this is the lower bound
-// on the total cpu time used by all threads that make up the test.  For
-// real-time based tests, this is the lower bound on the elapsed time of the
-// benchmark execution, regardless of number of threads.
-DEFINE_double(benchmark_min_time, 0.5);
+// Do NOT read these flags directly. Use Get*() to read them.
+namespace do_not_read_flag_directly {
+
+// Minimum number of seconds we should run the benchmark per repetition before
+// results are considered significant. For cpu-time based tests, this is the
+// lower bound on the total cpu time used by all threads that make up the test.
+// For real-time based tests, this is the lower bound on the elapsed time of the
+// benchmark execution, regardless of the number of threads. If left unset, the
+// default is kDefaultMinTimeTotalSecs / GetRepetitions(), which equals
+// kDefaultMinTimeTotalSecs when only one repetition is run.
+// Do NOT read this flag directly. Use GetMinTime() to read this flag.
+DEFINE_double(benchmark_min_time, -1.0);
 
 // The number of runs of each benchmark. If greater than 1, the mean and
-// standard deviation of the runs will be reported.
-DEFINE_int32(benchmark_repetitions, 1);
+// standard deviation of the runs will be reported. By default, the number of
+// repetitions is 1 if random interleaving is disabled, and up to
+// kRandomInterleavingDefaultRepetitions if random interleaving is enabled.
+// (Read the documentation for random interleaving to see why it might be less
+// than kRandomInterleavingDefaultRepetitions.)
+// Do NOT read this flag directly. Use GetRepetitions() to access this flag.
+DEFINE_int32(benchmark_repetitions, -1);
+
+}  // namespace do_not_read_flag_directly
+
+// The maximum overhead allowed for random interleaving. A value X means total
+// execution time under random interleaving is limited by
+// (1 + X) * original total execution time. Set to 'inf' to allow infinite
+// overhead.
+DEFINE_double(benchmark_random_interleaving_max_overhead, 0.4);
+
+// If set, enable random interleaving. See
+// http://github.com/google/benchmark/issues/1051 for details.
+DEFINE_bool(benchmark_enable_random_interleaving, false);
 
 // Report the result of each benchmark repetition. When 'true' is specified
 // only the mean, standard deviation, and other statistics are reported for
@@ -122,6 +159,30 @@ DEFINE_kvpairs(benchmark_context, {});
 
 std::map<std::string, std::string>* global_context = nullptr;
 
+// Performance measurements always have some random variance. This factor
+// overestimates the required number of iterations to reduce the probability
+// that the minimum time requirement will not be met.
+const double kSafetyMultiplier = 1.4;
+
+// Wraps --benchmark_min_time and returns valid default values if not supplied.
+double GetMinTime() {
+  const double default_min_time = kDefaultMinTimeTotalSecs / GetRepetitions();
+  const double flag_min_time =
+      do_not_read_flag_directly::FLAGS_benchmark_min_time;
+  return flag_min_time >= 0.0 ? flag_min_time : default_min_time;
+}
+
+// Wraps --benchmark_repetitions and returns a valid default value if not
+// supplied.
+int GetRepetitions() {
+  const int default_repetitions =
+      FLAGS_benchmark_enable_random_interleaving
+          ? kRandomInterleavingDefaultRepetitions
+          : 1;
+  const int flag_repetitions =
+      do_not_read_flag_directly::FLAGS_benchmark_repetitions;
+  return flag_repetitions >= 0 ? flag_repetitions : default_repetitions;
+}
+
 // FIXME: wouldn't LTO mess this up?
 void UseCharPointer(char const volatile*) {}
 
@@ -241,23 +302,57 @@ void State::FinishKeepRunning() {
 namespace internal {
 namespace {
 
+// Flushes streams after invoking reporter methods that write to them. This
+// ensures users get timely updates even when streams are not line-buffered.
+void FlushStreams(BenchmarkReporter* reporter) {
+  if (!reporter) return;
+  std::flush(reporter->GetOutputStream());
+  std::flush(reporter->GetErrorStream());
+}
+
+// Reports in both display and file reporters.
+void Report(BenchmarkReporter* display_reporter,
+            BenchmarkReporter* file_reporter, const RunResults& run_results) {
+  auto report_one = [](BenchmarkReporter* reporter,
+                       bool aggregates_only,
+                       const RunResults& results) {
+    assert(reporter);
+    // If there are no aggregates, do output non-aggregates.
+    aggregates_only &= !results.aggregates_only.empty();
+    if (!aggregates_only)
+      reporter->ReportRuns(results.non_aggregates);
+    if (!results.aggregates_only.empty())
+      reporter->ReportRuns(results.aggregates_only);
+  };
+
+  report_one(display_reporter, run_results.display_report_aggregates_only,
+             run_results);
+  if (file_reporter)
+    report_one(file_reporter, run_results.file_report_aggregates_only,
+               run_results);
+
+  FlushStreams(display_reporter);
+  FlushStreams(file_reporter);
+}
+
 void RunBenchmarks(const std::vector<BenchmarkInstance>& benchmarks,
                    BenchmarkReporter* display_reporter,
                    BenchmarkReporter* file_reporter) {
   // Note the file_reporter can be null.
   CHECK(display_reporter != nullptr);
 
   // Determine the width of the name field using a minimum width of 10.
-  bool might_have_aggregates = FLAGS_benchmark_repetitions > 1;
+  bool might_have_aggregates = GetRepetitions() > 1;
   size_t name_field_width = 10;
   size_t stat_field_width = 0;
   for (const BenchmarkInstance& benchmark : benchmarks) {
     name_field_width =
         std::max<size_t>(name_field_width, benchmark.name().str().size());
     might_have_aggregates |= benchmark.repetitions() > 1;
 
-    for (const auto& Stat : benchmark.statistics())
+    for (const auto& Stat : benchmark.statistics()) {
       stat_field_width = std::max<size_t>(stat_field_width, Stat.name_.size());
+    }
   }
   if (might_have_aggregates) name_field_width += 1 + stat_field_width;
 
@@ -268,45 +363,61 @@ void RunBenchmarks(const std::vector<BenchmarkInstance>& benchmarks,
   // Keep track of running times of all instances of current benchmark
   std::vector<BenchmarkReporter::Run> complexity_reports;
 
-  // We flush streams after invoking reporter methods that write to them. This
-  // ensures users get timely updates even when streams are not line-buffered.
-  auto flushStreams = [](BenchmarkReporter* reporter) {
-    if (!reporter) return;
-    std::flush(reporter->GetOutputStream());
-    std::flush(reporter->GetErrorStream());
-  };
-
   if (display_reporter->ReportContext(context) &&
       (!file_reporter || file_reporter->ReportContext(context))) {
-    flushStreams(display_reporter);
-    flushStreams(file_reporter);
-
-    for (const auto& benchmark : benchmarks) {
-      RunResults run_results = RunBenchmark(benchmark, &complexity_reports);
-
-      auto report = [&run_results](BenchmarkReporter* reporter,
-                                   bool report_aggregates_only) {
-        assert(reporter);
-        // If there are no aggregates, do output non-aggregates.
-        report_aggregates_only &= !run_results.aggregates_only.empty();
-        if (!report_aggregates_only)
-          reporter->ReportRuns(run_results.non_aggregates);
-        if (!run_results.aggregates_only.empty())
-          reporter->ReportRuns(run_results.aggregates_only);
-      };
-
-      report(display_reporter, run_results.display_report_aggregates_only);
-      if (file_reporter)
-        report(file_reporter, run_results.file_report_aggregates_only);
-
-      flushStreams(display_reporter);
-      flushStreams(file_reporter);
+    FlushStreams(display_reporter);
+    FlushStreams(file_reporter);
+
+    // Without random interleaving, benchmarks are executed in the order of:
+    //   A, A, ..., A, B, B, ..., B, C, C, ..., C, ...
+    // That is, repetition is within RunBenchmark(), hence the name
+    // inner_repetitions.
+    // With random interleaving, benchmarks are executed in the order of:
+    //   {Random order of A, B, C, ...}, {Random order of A, B, C, ...}, ...
+    // That is, repetition is outside of RunBenchmark(), hence the name
+    // outer_repetitions.
+    int inner_repetitions =
+        FLAGS_benchmark_enable_random_interleaving ? 1 : GetRepetitions();
+    int outer_repetitions =
+        FLAGS_benchmark_enable_random_interleaving ? GetRepetitions() : 1;
+    std::vector<size_t> benchmark_indices(benchmarks.size());
+    for (size_t i = 0; i < benchmarks.size(); ++i) {
+      benchmark_indices[i] = i;
+    }
+
+    std::random_device rd;
+    std::mt19937 g(rd());
+    // 'run_results_vector' and 'benchmarks' are parallel arrays.
+    std::vector<RunResults> run_results_vector(benchmarks.size());
+    for (int i = 0; i < outer_repetitions; i++) {
+      if (FLAGS_benchmark_enable_random_interleaving) {
+        std::shuffle(benchmark_indices.begin(), benchmark_indices.end(), g);
+      }
+      for (size_t j : benchmark_indices) {
+        // Repetitions will be automatically adjusted under random interleaving.
+        if (!FLAGS_benchmark_enable_random_interleaving ||
+            i < benchmarks[j].RandomInterleavingRepetitions()) {
+          RunBenchmark(benchmarks[j], outer_repetitions, inner_repetitions,
+                       &complexity_reports, &run_results_vector[j]);
+          if (!FLAGS_benchmark_enable_random_interleaving) {
+            // Print out reports as they come in.
+            Report(display_reporter, file_reporter, run_results_vector.at(j));
+          }
+        }
+      }
+    }
+
+    if (FLAGS_benchmark_enable_random_interleaving) {
+      // Print out all reports at the end of the test.
+      for (const RunResults& run_results : run_results_vector) {
+        Report(display_reporter, file_reporter, run_results);
+      }
     }
   }
   display_reporter->Finalize();
   if (file_reporter) file_reporter->Finalize();
-  flushStreams(display_reporter);
-  flushStreams(file_reporter);
+  FlushStreams(display_reporter);
+  FlushStreams(file_reporter);
 }
 
 // Disable deprecated warnings temporarily because we need to reference
@@ -456,6 +567,7 @@ void PrintUsageAndExit() {
           "          [--benchmark_filter=<regex>]\n"
           "          [--benchmark_min_time=<min_time>]\n"
           "          [--benchmark_repetitions=<num_repetitions>]\n"
+          "          [--benchmark_enable_random_interleaving={true|false}]\n"
+          "          [--benchmark_random_interleaving_max_overhead=<max_overhead>]\n"
           "          [--benchmark_report_aggregates_only={true|false}]\n"
           "          [--benchmark_display_aggregates_only={true|false}]\n"
           "          [--benchmark_format=<console|json|csv>]\n"
@@ -476,10 +588,16 @@ void ParseCommandLineFlags(int* argc, char** argv) {
     if (ParseBoolFlag(argv[i], "benchmark_list_tests",
                       &FLAGS_benchmark_list_tests) ||
         ParseStringFlag(argv[i], "benchmark_filter", &FLAGS_benchmark_filter) ||
-        ParseDoubleFlag(argv[i], "benchmark_min_time",
-                        &FLAGS_benchmark_min_time) ||
-        ParseInt32Flag(argv[i], "benchmark_repetitions",
-                       &FLAGS_benchmark_repetitions) ||
+        ParseDoubleFlag(
+            argv[i], "benchmark_min_time",
+            &do_not_read_flag_directly::FLAGS_benchmark_min_time) ||
+        ParseInt32Flag(
+            argv[i], "benchmark_repetitions",
+            &do_not_read_flag_directly::FLAGS_benchmark_repetitions) ||
+        ParseBoolFlag(argv[i], "benchmark_enable_random_interleaving",
+                      &FLAGS_benchmark_enable_random_interleaving) ||
+        ParseDoubleFlag(argv[i], "benchmark_random_interleaving_max_overhead",
+                        &FLAGS_benchmark_random_interleaving_max_overhead) ||
         ParseBoolFlag(argv[i], "benchmark_report_aggregates_only",
                       &FLAGS_benchmark_report_aggregates_only) ||
         ParseBoolFlag(argv[i], "benchmark_display_aggregates_only",