brew install python3
mkdir BUILD-release
cd BUILD-release
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
sudo nice -n -20 ./macos-core-to-core-latency > results.log
# sudo nice -n -20 ./macos-core-to-core-latency -r 10 > results.log # shorter test run
python3 -m venv path/to/venv
source path/to/venv/bin/activate
python3 -m pip install -r requirements.txt
python3 ./macos-core-to-core-latency.py results.log
deactivate
open results.png
The benchmark is sensitive to CPU core frequencies, so connect your device to power to avoid power throttling. For thermal throttling (e.g. on fanless systems like the MacBook Air), the options are limited: use external cooling, or tune the parameters for shorter overall runs.
You can change the following variables in main.cpp. This can help in case the runs don't finish or take too long.
constexpr int iterationsPerExperiment = 2000; // could be lowered to 1000
// dummyWorkloadLoopLength runs on idle threads to make sure all threads are scheduled concurrently
constexpr int dummyWorkloadLoopLength = iterationsPerExperiment*1024; // empirical - Try raising by 2-4x on M Ultra, because it currently only roughly covers an experiment at 100ns core-to-core latency
int targetExperiments = 300; // -r argument // can be lowered further
constexpr bool optionWarmup = true;
constexpr bool optionEstimateFrequency = true;
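To get a feel for how these values interact, here is a rough back-of-the-envelope sketch in Python. It assumes that one iteration costs about one core-to-core latency, which is a simplification and not something taken from main.cpp. The function name and the latency values are made up for illustration.

```python
# Rough time budget for a single experiment.
# Assumption: each iteration takes about one core-to-core latency.
ITERATIONS_PER_EXPERIMENT = 2000  # mirrors iterationsPerExperiment in main.cpp


def experiment_duration_us(latency_ns, iterations=ITERATIONS_PER_EXPERIMENT):
    """Estimated duration of one experiment in microseconds."""
    return iterations * latency_ns / 1000.0


if __name__ == "__main__":
    # Hypothetical latencies, not measured values.
    for latency_ns in (50, 100, 400):
        duration = experiment_duration_us(latency_ns)
        print(f"{latency_ns} ns latency: about {duration:.0f} us per experiment")
```

Under this assumption an experiment at 400 ns takes four times as long as one at 100 ns, which matches the hint in the comment above to raise dummyWorkloadLoopLength by 2-4x on systems with higher latencies.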
macos-core-to-core-latency.py has the -v option to print out more statistics about the measurement.
Alternatively you can look at the logs directly.
Cores 0, 1, 2, 3 are E-cores, the others are P-cores.
Cores 0, 1, 2, 3 are E-cores, the others are P-cores.
Cores 0, 1, 2, 3 are E-cores, the others are P-cores.
The measurement in PR #3 took almost a day and is not very stable. Most likely, the approximate minimum within each cluster is the true latency of the whole cluster.
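If you want to apply the per-cluster minimum idea to your own numbers, a minimal Python sketch could look like the following. It is not part of macos-core-to-core-latency.py: the function name, the cluster layout and the matrix values are all made up for illustration, and it expects the latencies to already be in a square matrix rather than parsing results.log.

```python
def cluster_minimums(latency, clusters):
    """Return the minimum latency for every ordered pair of clusters.

    latency  -- square matrix, latency[i][j] is the latency from core i to core j
    clusters -- dict mapping a cluster name to a list of core indices
    """
    result = {}
    for name_a, cores_a in clusters.items():
        for name_b, cores_b in clusters.items():
            # Skip the diagonal: a core has no latency to itself.
            values = [latency[i][j] for i in cores_a for j in cores_b if i != j]
            if values:
                result[(name_a, name_b)] = min(values)
    return result


# Synthetic example values in ns, NOT real measurements.
example_latency = [
    [0, 40, 90, 95],
    [41, 0, 92, 91],
    [90, 93, 0, 30],
    [96, 91, 31, 0],
]
# Hypothetical layout: cores 0, 1 are E-cores, cores 2, 3 are P-cores.
example_clusters = {"E": [0, 1], "P": [2, 3]}

if __name__ == "__main__":
    for pair, value in cluster_minimums(example_latency, example_clusters).items():
        print(pair, value)
```

Taking the plain minimum is sensitive to a single outlier on the low side, so a low percentile might be a more robust choice for noisy runs.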
Cores 0, 1, 2, 3 are E-cores, the others are P-cores.
Cores 0, 1 are E-cores, the others are P-cores.