C RecMatMul Using BenchmarkRunner #749

Soroosh129 · 2021-11-12T16:45:35Z

This adds back the BenchmarkRunner reactor and makes use of it in the RecMatMul (MatMul.lf) benchmark.

@petervdonovan When I use the benchmark runner script to run this benchmark, I get "Validation failed" messages. Do you know what might be the reason for this?

$ ./run_benchmark.py benchmark=savina_parallelism_recmatmul.yaml target=lf-c
[2021-11-12 10:41:02,614][bash][INFO] - ---- Start execution at time Fri Nov 12 10:40:59 2021
[2021-11-12 10:41:02,614][bash][INFO] - ---- plus 319410005 nanoseconds.
[2021-11-12 10:41:02,614][bash][INFO] - Benchmark: ThreadRingReactorLFCppBenchmark
[2021-11-12 10:41:02,614][bash][INFO] - System information
[2021-11-12 10:41:02,614][bash][INFO] - O/S Name: Linux
[2021-11-12 10:41:02,614][bash][INFO] - Validation failed for (i,j)=(1, 179) with (181506.000000, 183296.000000)
[2021-11-12 10:41:02,614][bash][INFO] - Iteration: 1	 Duration: 341.537 msec
[2021-11-12 10:41:02,614][bash][INFO] - Validation failed for (i,j)=(2, 663) with (1345890.000000, 1357824.000000)
[2021-11-12 10:41:02,614][bash][INFO] - Iteration: 2	 Duration: 267.330 msec
[2021-11-12 10:41:02,614][bash][INFO] - Validation failed for (i,j)=(1, 727) with (729908.000000, 744448.000000)
[2021-11-12 10:41:02,614][bash][INFO] - Iteration: 3	 Duration: 265.782 msec
[2021-11-12 10:41:02,614][bash][INFO] - Validation failed for (i,j)=(2, 277) with (558432.000000, 567296.000000)
[2021-11-12 10:41:02,614][bash][INFO] - Iteration: 4	 Duration: 267.440 msec
[2021-11-12 10:41:02,614][bash][INFO] - Validation failed for (i,j)=(1, 288) with (290304.000000, 294912.000000)
[2021-11-12 10:41:02,614][bash][INFO] - Iteration: 5	 Duration: 270.877 msec
[2021-11-12 10:41:02,614][bash][INFO] - Validation failed for (i,j)=(1, 1017) with (1017000.000000, 1041408.000000)
[2021-11-12 10:41:02,614][bash][INFO] - Iteration: 6	 Duration: 265.654 msec
[2021-11-12 10:41:02,614][bash][INFO] - Validation failed for (i,j)=(1, 256) with (259328.000000, 262144.000000)
[2021-11-12 10:41:02,614][bash][INFO] - Iteration: 7	 Duration: 269.025 msec
[2021-11-12 10:41:02,615][bash][INFO] - Validation failed for (i,j)=(1, 951) with (943392.000000, 973824.000000)
[2021-11-12 10:41:02,615][bash][INFO] - Iteration: 8	 Duration: 269.129 msec
[2021-11-12 10:41:02,615][bash][INFO] - Validation failed for (i,j)=(2, 790) with (1545240.000000, 1617920.000000)
[2021-11-12 10:41:02,615][bash][INFO] - Iteration: 9	 Duration: 270.081 msec
[2021-11-12 10:41:02,615][bash][INFO] - Validation failed for (i,j)=(1, 987) with (631680.000000, 1010688.000000)
[2021-11-12 10:41:02,615][bash][INFO] - Iteration: 10	 Duration: 266.024 msec
[2021-11-12 10:41:02,615][bash][INFO] - Validation failed for (i,j)=(1, 390) with (396240.000000, 399360.000000)
[2021-11-12 10:41:02,615][bash][INFO] - Iteration: 11	 Duration: 269.198 msec
[2021-11-12 10:41:02,615][bash][INFO] - Validation failed for (i,j)=(1, 847) with (864787.000000, 867328.000000)
[2021-11-12 10:41:02,615][bash][INFO] - Iteration: 12	 Duration: 267.104 msec
[2021-11-12 10:41:02,615][bash][INFO] - Execution - Summary:
[2021-11-12 10:41:02,615][bash][INFO] - Best Time:	 265.782 msec
[2021-11-12 10:41:02,615][bash][INFO] - Worst Time:	 341.537 msec
[2021-11-12 10:41:02,615][bash][INFO] - Median Time:	 268.065 msec
[2021-11-12 10:41:02,615][bash][INFO] - ---- Elapsed logical time (in nsec): 0
[2021-11-12 10:41:02,615][bash][INFO] - ---- Elapsed physical time (in nsec): 3,294,657,984

petervdonovan · 2021-11-12T16:49:53Z

The benchmark has a race condition, so we expect some entries to have smaller values than they should. When you run it with a single thread, do the validation failed messages go away?
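
One way to picture why a race would make entries smaller rather than larger is a lost update on an accumulated entry. The C sketch below replays one such interleaving deterministically, without any threads. The function names are hypothetical, and the actual race in MatMul.lf may take a different form. The sketch assumes that each entry should end up as the sum of n equal partial products, which is consistent with the expected values in the log above (the second number of each pair equals i * j * 1024).

```c
#include <assert.h>
#include <stddef.h>

/* Reference: all n partial products of value a * b reach the entry. */
double accumulate_sequential(double a, double b, size_t n) {
    double c = 0.0;
    for (size_t k = 0; k < n; k++) {
        c += a * b;
    }
    return c;
}

/* Deterministic replay of a bad interleaving (an assumption, not a trace):
 * two workers each contribute n / 2 partial products to the same entry.
 * In the first `lost` rounds both workers read the entry before either
 * writes it back, so one of the two additions is overwritten. */
double accumulate_with_lost_updates(double a, double b, size_t n, size_t lost) {
    assert(n % 2 == 0 && lost <= n / 2);
    double c = 0.0;
    for (size_t k = 0; k < n / 2; k++) {
        double read1 = c;            /* worker 1 reads the entry */
        if (k < lost) {
            double read2 = c;        /* worker 2 reads the same stale value */
            c = read1 + a * b;       /* worker 1 writes */
            c = read2 + a * b;       /* worker 2 overwrites: one addition lost */
        } else {
            c = read1 + a * b;       /* worker 1 writes */
            double read2 = c;        /* worker 2 reads the updated value */
            c = read2 + a * b;       /* both additions survive */
        }
    }
    return c;
}
```

With a = 1, b = 987 and n = 1024, the sequential version yields 1010688, and losing 384 updates yields 631680, the same pair of numbers as in the message for iteration 10. That only shows the arithmetic is consistent with lost updates; it does not establish that this is what happened in the run.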

(The number of rows and columns must also be a power of 2, but I don't think that's the issue here.)
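
A guard for the power-of-two requirement could look like the following C sketch. Both function names are hypothetical and are not taken from the benchmark sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True for 1, 2, 4, 8, ...; zero is rejected. A power of two has exactly
 * one bit set, so clearing the lowest set bit must leave zero. */
bool is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical check to run before the benchmark starts: aborts in debug
 * builds when the matrix dimension cannot be halved down to 1. */
void require_power_of_two(size_t n) {
    assert(is_power_of_two(n));
}
```

For example, `is_power_of_two(1024)` holds, while `is_power_of_two(1000)` does not.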

Soroosh129 · 2021-11-12T16:52:24Z

> The benchmark has a race condition, so we expect some entries to have smaller values than they should.

I see. Interesting.

> When you run it with a single thread, do the validation failed messages go away?

Yes, they do. Thanks for the clarification.

cmnrd

Thanks! This helps a lot, but for some reason C++ is still a bit faster (C++ is blue in the plot below).

I don't think we need to worry about it though.

Note that I significantly simplified the C++ benchmark runner. It only has a start and a finished port now.

I think that, in order to merge this, we should add this version (including the runner) alongside the original benchmark. Unfortunately, the Python runner script will not parse the output of this modified benchmark correctly.

petervdonovan · 2021-11-15T07:55:50Z

> I don't think we need to worry about it though.

I do not disagree, but in case anyone is still concerned: on my machine, the median execution time of the single-threaded runtime decreased from 5.5 s to 3.1 s when I used this function

double* transposed_mat_at_d(matrix_t matrix, size_t i, size_t j) {
    return mat_at_d(matrix, j, i);
}

as a replacement for mat_at_d when accessing the B matrix (both writing and reading).

Apparently, transposing a matrix in this way is a standard trick for speeding up matrix multiplication, because it improves cache performance.
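
To make the access pattern concrete, here is a minimal, self-contained C sketch. The `square_matrix_t` type and `elem_at` are hypothetical stand-ins, assuming row-major storage; they are not the benchmark's `matrix_t` and `mat_at_d`, whose definitions are not shown in this thread. The point is only that both variants compute the same product, while the transposed one walks B's storage contiguously in the inner loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the benchmark's matrix type:
 * an n x n matrix of doubles stored row by row. */
typedef struct {
    double* data;
    size_t n;
} square_matrix_t;

/* Address of the element in physical row i, column j. */
double* elem_at(square_matrix_t m, size_t i, size_t j) {
    assert(i < m.n && j < m.n);
    return &m.data[i * m.n + j];
}

/* Same storage with swapped indices: logical (i, j) lives at physical (j, i). */
double* transposed_at(square_matrix_t m, size_t i, size_t j) {
    return elem_at(m, j, i);
}

/* c = a * b with b stored normally: for a fixed j, consecutive k values
 * are n doubles apart in b's storage. */
void multiply_plain(square_matrix_t a, square_matrix_t b, square_matrix_t c) {
    for (size_t i = 0; i < a.n; i++) {
        for (size_t j = 0; j < a.n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < a.n; k++) {
                sum += *elem_at(a, i, k) * *elem_at(b, k, j);
            }
            *elem_at(c, i, j) = sum;
        }
    }
}

/* c = a * b with b written and read through transposed_at: for a fixed j,
 * consecutive k values are adjacent in b_t's storage. */
void multiply_transposed(square_matrix_t a, square_matrix_t b_t, square_matrix_t c) {
    for (size_t i = 0; i < a.n; i++) {
        for (size_t j = 0; j < a.n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < a.n; k++) {
                sum += *elem_at(a, i, k) * *transposed_at(b_t, k, j);
            }
            *elem_at(c, i, j) = sum;
        }
    }
}
```

As in the comment above, B has to be both written and read through the transposed accessor; mixing the two accessors on the same matrix would multiply by the transpose of B instead.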

I know that our results are invalid if there are algorithmic differences between implementations, but this seems fairly low-level. If I understand correctly, it is little more than a change in how we interact with the hardware prefetcher, which seems mild enough compared to what a JIT compiler can do. Then again, it appears to me that the C version with the benchmark runner but without transposing might be the most similar to the C++ version, judging from the first and subsequent execution times on my machine.

One might also just dismiss this as silliness: It only highlights the fact that this benchmark tells us more about the content of the inner loop than it tells us about the runtime.

cmnrd · 2021-11-15T11:17:25Z

Interesting! I don't think this explains the small gap between C and C++ though (unless the C++ compiler is smart enough to make this optimization automatically). I don't think it would be unfair to optimize the matrix access, but we should do it similarly in both the C++ and the C version. Then, however, I would expect to see the same gap again.

This is based on a suggestion from Christian, since the Python runner script does not parse the output of the modified benchmark correctly. The original benchmark has a couple of minor corrections that also appear in the one that uses the benchmark runner.

cmnrd

Looks good! Thank you both.

Soroosh129 added 2 commits November 12, 2021 10:29

Resurrected the benchmark runner reactor

0c38637

Updated RecMatMul to use the BenchmarkRunner reactor

febe824

Soroosh129 requested review from cmnrd and petervdonovan November 12, 2021 16:45

petervdonovan reviewed Nov 12, 2021

benchmark/C/Savina/src/parallelism/MatMul.lf (outdated)

Update benchmark/C/Savina/src/parallelism/MatMul.lf

5a42cf8

Co-authored-by: Peter Donovan <33707478+petervdonovan@users.noreply.github.com>

cmnrd reviewed Nov 14, 2021

petervdonovan added 2 commits November 16, 2021 01:07

Transpose B matrix for C, C++ versions.

cc26429

cmnrd approved these changes Nov 22, 2021

cmnrd merged commit f230123 into master Nov 22, 2021

cmnrd deleted the c-matmul-benchmarkRunner branch November 22, 2021 08:11

cmnrd mentioned this pull request Nov 22, 2021

Use the BenchmarkRunner reactor in all C benchmarks #764

Closed
