higher CV_PAUSE cost on skylake #22852

vrabaud · 2022-11-23T09:56:02Z

System Information

OpenCV version: 4.6.0
Operating System / Platform: Custom Linux
Compiler & compiler version: Custom clang

Detailed description

On Intel architectures, CV_PAUSE is implemented with _mm_pause:

opencv/modules/core/src/parallel_impl.cpp

Line 47 in 6ca205a

    
           #   define CV_PAUSE(v) do { for (int __delay = (v); __delay > 0; --__delay) { _mm_pause(); } } while (0)

But it is called with the same loop count regardless of the architecture:

opencv/modules/core/src/parallel_impl.cpp

Line 393 in 6ca205a

CV_PAUSE(16);

And the cost of _mm_pause went from 5 cycles on Haswell to 140 on Skylake, so the thread pool consumes more CPU on Skylake.

This is documented (as well as a workaround) here: https://www.intel.com/content/www/us/en/developer/articles/technical/a-common-construct-to-avoid-the-contention-of-threads-architecture-agnostic-spin-wait-loops.html
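To put the figures quoted above in perspective, here is a minimal sketch of the arithmetic for a fixed-count loop such as CV_PAUSE(16). The constants are simply the per-call costs stated in this issue, not measurements, and `spinCost` is a hypothetical helper that ignores loop overhead.

```cpp
#include <cstdint>

// Per-call cost of _mm_pause as quoted in this issue (assumed, not measured here).
constexpr std::uint64_t kPauseCostHaswell = 5;
constexpr std::uint64_t kPauseCostSkylake = 140;

// Hypothetical helper: cost of a fixed-count pause loop, ignoring loop overhead.
constexpr std::uint64_t spinCost(std::uint64_t iterations, std::uint64_t costPerPause)
{
    return iterations * costPerPause;
}

// CV_PAUSE(16) with the quoted costs: 16 * 5 = 80 versus 16 * 140 = 2240.
static_assert(spinCost(16, kPauseCostHaswell) == 80, "16 pauses at the Haswell cost");
static_assert(spinCost(16, kPauseCostSkylake) == 2240, "16 pauses at the Skylake cost");
```

With these numbers, the same CV_PAUSE(16) call spins 28 times longer on Skylake than on Haswell.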

Steps to reproduce

Profiling any multi-threaded code on Haswell and then Skylake.
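Short of a full profile, one rough way to compare machines is to time _mm_pause directly. The sketch below assumes an x86 target built with GCC or Clang (so `<x86intrin.h>` provides both intrinsics); `ticksPerPause` is a hypothetical helper, not part of OpenCV. It reports TSC ticks, which are reference ticks rather than core cycles, so the result is only indicative.

```cpp
#include <cassert>
#include <cstdint>
#include <x86intrin.h>  // __rdtsc and _mm_pause (assumption: x86 with GCC/Clang)

// Hypothetical helper: average number of TSC ticks per _mm_pause call.
// __rdtsc is not serializing and the thread may migrate between cores,
// so use a large iteration count and treat the result as an estimate.
inline double ticksPerPause(std::uint64_t iterations)
{
    assert(iterations > 0);
    const std::uint64_t start = __rdtsc();
    for (std::uint64_t i = 0; i < iterations; ++i)
        _mm_pause();
    const std::uint64_t stop = __rdtsc();
    return static_cast<double>(stop - start) / static_cast<double>(iterations);
}
```

Calling `ticksPerPause(1000000)` on a pre-Skylake and a Skylake-or-later machine and printing the two values should show whether the gap described above is present on the hardware at hand.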

Issue submission checklist

I report the issue, it's not a question
I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
I updated to the latest OpenCV version and the issue is still there
There is reproducer code and related data files (videos, images, onnx, etc)


kallaballa · 2022-11-23T12:05:07Z

Hi! I am a contributor. From what I read, my Tiger Lake machine should be affected too? If so, should I see a considerable performance improvement on multi-threaded code (with sufficient lock contention) if I change it to CV_PAUSE(1);?

kallaballa · 2022-11-23T12:07:45Z

Also, could you provide a minimal program (reproducer) that exhibits that behaviour?

vrabaud · 2022-11-23T12:23:22Z

Anything after Skylake should be affected from what I have found.

A reproducer is tough: the cost of computation in a thread is usually higher (except if you have more than 100 cores) and if you want to benchmark with callgrind, it will use SIMD emulation which might not work with _mm_pause.

I could only find that "bug" thanks to private tooling and scale.

kallaballa · 2022-11-23T12:29:16Z

I changed CV_PAUSE(16); to CV_PAUSE(1); but fail to produce a program that shows signs of improvement. At the moment I simply measure execution times (actually FPS in my case) but also profiling with perf (through hotspot) doesn't show a significant difference.
Still investigating.

Anyway, cool find!

kallaballa · 2022-11-24T06:13:55Z

> Anything after Skylake should be affected from what I have found.
>
> A reproducer is tough: the cost of computation in a thread is usually higher (except if you have more than 100 cores) and if you want to benchmark with callgrind, it will use SIMD emulation which might not work with _mm_pause.
>
> I could only find that "bug" thanks to private tooling and scale.

I understand your argument about scale, but I would still like to demonstrate the difference somehow. By the way, why would you profile with callgrind?

vrabaud · 2022-11-24T10:35:16Z

For callgrind, I was just suggesting the CPU profiler I usually use for open source work.

alalek · 2022-11-24T11:18:05Z

@vrabaud Feel free to prepare a PR (validated on your side).

vrabaud · 2022-11-24T12:40:33Z

I'll get back to you with a pull request once I can validate the gain. For now, I am testing CV_PAUSE as follows:

// The delay fits the old behavior where _mm_pause took 5 cycles.
#   define CV_PAUSE(v) do { const uint64_t __delay = 5 * (v); uint64_t __init = __rdtsc(); do { _mm_pause(); } while ((__rdtsc() - __init) < __delay); } while (0)
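For readers who want to experiment outside OpenCV, the macro above can be sketched as a plain function and dropped into a bounded spin-wait. This assumes an x86 target built with GCC or Clang; `pauseForTicks` and `spinUntilSet` are hypothetical names for illustration, and the thread pool's actual wait loop may be structured differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <x86intrin.h>  // __rdtsc and _mm_pause (assumption: x86 with GCC/Clang)

// Hypothetical function form of the macro: keep pausing until at least
// 5 * v TSC ticks have elapsed, whatever a single _mm_pause costs.
inline void pauseForTicks(std::uint64_t v)
{
    const std::uint64_t delay = 5 * v;
    const std::uint64_t init = __rdtsc();
    do {
        _mm_pause();  // always executed at least once, as in the macro
    } while ((__rdtsc() - init) < delay);
}

// Hypothetical bounded spin-wait: poll the flag up to maxSpins times,
// backing off between polls. Returns true if the flag was seen set.
inline bool spinUntilSet(const std::atomic<bool>& flag, int maxSpins)
{
    assert(maxSpins >= 0);
    for (int i = 0; i < maxSpins; ++i) {
        if (flag.load(std::memory_order_acquire))
            return true;
        pauseForTicks(16);  // same argument as the CV_PAUSE(16) call site
    }
    return flag.load(std::memory_order_acquire);
}
```

Bounding the wait by elapsed ticks instead of by iteration count is what keeps the delay roughly constant across microarchitectures: on a CPU where one pause already exceeds the budget, the loop body runs once and exits.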

This is fixing opencv#22852

vrabaud added the bug label Nov 23, 2022

vrabaud mentioned this issue Dec 15, 2022

Fix slower CV_PAUSE on SkyLake and above. #22966

Merged

4 tasks

vrabaud added a commit to vrabaud/opencv that referenced this issue Dec 15, 2022

Fix slower CV_PAUSE on SkyLake and above.

b36fdf3

This is fixing opencv#22852

vrabaud added a commit to vrabaud/opencv that referenced this issue Dec 15, 2022

Fix slower CV_PAUSE on SkyLake and above.

b7b08fa

This is fixing opencv#22852

vrabaud closed this as completed Dec 16, 2022
