Skip to content

Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX #21833

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 10 commits into from
Sep 10, 2022

Conversation

gititgo
Copy link
Contributor

@gititgo gititgo commented Apr 7, 2022

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Copy link
Contributor

Could you point to cross-complier and may be some notes how to build the code. QEMU or some other emulator will be very useful too. Also I recommend you to take a look on cvRound implementation. It's used everywhere and efficient rounding affects performance a lot:

@gititgo
Copy link
Contributor Author

gititgo commented Apr 8, 2022

Could you point to cross-complier and may be some notes how to build the code. QEMU or some other emulator will be very useful too.

We are preparing such QEMU, but it is not finished yet. We'll provide it as soon as it's available.

@@ -660,6 +662,10 @@ struct HWFeatures
have[CV_CPU_RVV] = true;
#endif

#if defined __loongarch_asx
have[CV_CPU_LASX] = true;
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are best to enable these SIMD features by reading the CPUID like ARM and x86.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are best to enable these SIMD features by reading the CPUID like ARM and x86.

Ok,thanks very much.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After I copied the first patch to my local opencv repository (Tag: 4.5.4), the Calib3d_StereoBM.regression test failed.
Note that the cmake option I used on 3A5000 platform is " -DBUILD_PNG=ON -D WITH_OPENCL=OFF -DBUILD_WITH_DEBUG_INFO=ON -DCPU_BASELINE=LASX ".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I‘m so sorry that I didn't get the message or emails. I'm very glad you have a 3a5000 environment as We are considering providing 3a5000 remote environment.

YES, the Calib3d_StereoBM.regression test is currently failed. We suspect that compiler optimization is the reason as DEBUG version is OK. We will follow up and fix the bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gititgo Do you have any news on the Calib3d_StereoBM.regression test failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't found the cause of the problem yet. Our colleagues are still working on it.

@gititgo
Copy link
Contributor Author

gititgo commented Apr 29, 2022

The cross-complier (build Loongarch on x86) and cmake config file is here:
git clone https://gitee.com/wenux/cross-compiler-la-on-x86.git

How to use it:
(1)tar -xvf toolchain-loongarch64-linux-gnu-cross-830-rc1.0-2022-04-22a.tar.xz
(2)Set "tools" in cmake config file (la64_linux_setup.cmake) to your real path;
(3)cmake with config file:
cmake -DCMAKE_TOOLCHAIN_FILE=path/to/your/la64_linux_setup.cmake -DCPU_BASELINE=LASX -DBUILD_OPENJPEG=ON ../
(4)make

@gititgo
Copy link
Contributor Author

gititgo commented Apr 30, 2022

Is QEMU necessary for this PR ?

@asmorkalov
Copy link
Contributor

It'll be great to have QEMU to run tests.

@gititgo
Copy link
Contributor Author

gititgo commented May 18, 2022

It'll be great to have QEMU to run tests.

Is a 3A5000(Loongarch64) environment ok? Because QEMU may take a long time.

@fengyuentau
Copy link
Member

How long is it going to take to run all unit tests in QEMU? It is recommanded to have a CI pipeline to test code automatically. Please at least provide something that we can perform tests.

@gititgo
Copy link
Contributor Author

gititgo commented May 19, 2022

How long is it going to take to run all unit tests in QEMU? It is recommanded to have a CI pipeline to test code automatically. Please at least provide something that we can perform tests.

QEMU is under development, but I haven't got the exact time. We can provide a remote Loongarch environment. Can this be used for automated testing ?

@fengyuentau
Copy link
Member

Yes, but we are also expecting an environment which we can use for testing if your remote Loongarch environment is expired to us. So it is recommanded to have QEMU to run tests on Loongarch.

@asmorkalov
Copy link
Contributor

The provided git link (git clone https://gitee.com/wenux/cross-compiler-la-on-x86.git) is protected by username and password.

@gititgo
Copy link
Contributor Author

gititgo commented May 23, 2022

The provided git link (git clone https://gitee.com/wenux/cross-compiler-la-on-x86.git) is protected by username and password.

Sorry,it‘s ok now.

@asmorkalov
Copy link
Contributor

opencv/modules/core/src/parallel_impl.cpp:63:5: warning: #warning "Can't detect 'pause' (CPU-yield) instruction on the target platform. Specify CV_PAUSE() definition via compiler flags." [-Wcpp]
 #   warning "Can't detect 'pause' (CPU-yield) instruction on the target platform. Specify CV_PAUSE() definition via compiler flags."

@asmorkalov
Copy link
Contributor

Also the best way to enable OpenCV cross-compilation for the new architecture is to place toolchain file to opencv/platforms/linux/loongson. I propose to replace cmake local variable tools with CACHE variable with meaningful name to set it from command line. See opencv/platforms/linux/riscv64-clang.toolchain.cmake as example.

@asmorkalov
Copy link
Contributor

@gititgo Thanks a lot for the toolchain. I was able to build the source code.

@vpisarev
Copy link
Contributor

vpisarev commented May 27, 2022

@gititgo, thank you for the contribution! Please, mark the proper items in the checklist:

  * I agree to contribute to the project under Apache 2 License.
  * To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV

without this confirmation we cannot merge your code into OpenCV

@gititgo
Copy link
Contributor Author

gititgo commented May 27, 2022

without this confirmation we cannot merge your code into OpenCV

OK,marked

@gititgo
Copy link
Contributor Author

gititgo commented May 30, 2022

@fengyuentau We prepared a remote LoongArch PC for test. The IP and password have been emailed to you.

@fengyuentau
Copy link
Member

I run accuracy tests (./bin/opencv_test_*) on your LoongArch PC and only 5 modules (core, flann, highgui, ml and videoio) did not fail at Segmentation fault. The CMake command I used is cmake -B build -D CPU_BASELINE=LASX opencv. Did I miss something?

@gititgo
Copy link
Contributor Author

gititgo commented Jun 9, 2022

Please -DBUILD_PNG=ON as there may be a bug in libpng in our system.

@fengyuentau
Copy link
Member

Rebuilt with CMake option -DBUILD_PNG=ON and no more segmentation faults. But Calib3d_StereoBM.regression failed as discussed above.

@gititgo
Copy link
Contributor Author

gititgo commented Jun 9, 2022

Yes,there are two known issues:
1、Calib3d_StereoBM.regression: may be the compiler optimization issue as DEBUG version is OK.
2、videoio/videocapture_acceleration.read(ffmpeg): ffmpeg has a bug in our system (optimized on loongarch) as it‘s OK when we use Open source ffmpeg.

Copy link
Member

@fengyuentau fengyuentau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Known issues are related to 3rdparty components on LoongArch PC, so they can be solved outside this pull request. Please take a look at my comments.

@fengyuentau
Copy link
Member

Yes,there are two known issues: 1、Calib3d_StereoBM.regression: may be the compiler optimization issue as DEBUG version is OK. 2、videoio/videocapture_acceleration.read(ffmpeg): ffmpeg has a bug in our system (optimized on loongarch) as it‘s OK when we use Open source ffmpeg.

Tested again, those known issues are still there:

# calib
[  FAILED  ] Calib3d_StereoBM.regression
# videoio
[  FAILED  ] videoio_ffmpeg.parallel
[  FAILED  ] videoio/videocapture_acceleration.read/32, where GetParam() = (sample_322x242_15frames.yuv420p.mpeg2video.mp4, FFMPEG, NONE, false)
[  FAILED  ] videoio/videocapture_acceleration.read/33, where GetParam() = (sample_322x242_15frames.yuv420p.mpeg2video.mp4, FFMPEG, NONE, true)
[  FAILED  ] videoio/videocapture_acceleration.read/34, where GetParam() = (sample_322x242_15frames.yuv420p.mpeg2video.mp4, FFMPEG, ANY, false)
[  FAILED  ] videoio/videocapture_acceleration.read/35, where GetParam() = (sample_322x242_15frames.yuv420p.mpeg2video.mp4, FFMPEG, ANY, true)
[  FAILED  ] videoio/videocapture_acceleration.read/64, where GetParam() = (sample_322x242_15frames.yuv420p.libx265.mp4, FFMPEG, NONE, false)
[  FAILED  ] videoio/videocapture_acceleration.read/65, where GetParam() = (sample_322x242_15frames.yuv420p.libx265.mp4, FFMPEG, NONE, true)
[  FAILED  ] videoio/videocapture_acceleration.read/66, where GetParam() = (sample_322x242_15frames.yuv420p.libx265.mp4, FFMPEG, ANY, false)
[  FAILED  ] videoio/videocapture_acceleration.read/67, where GetParam() = (sample_322x242_15frames.yuv420p.libx265.mp4, FFMPEG, ANY, true)
[  FAILED  ] videoio/videocapture_acceleration.read/80, where GetParam() = (sample_322x242_15frames.yuv420p.libvpx-vp9.mp4, FFMPEG, NONE, false)
[  FAILED  ] videoio/videocapture_acceleration.read/81, where GetParam() = (sample_322x242_15frames.yuv420p.libvpx-vp9.mp4, FFMPEG, NONE, true)
[  FAILED  ] videoio/videocapture_acceleration.read/82, where GetParam() = (sample_322x242_15frames.yuv420p.libvpx-vp9.mp4, FFMPEG, ANY, false)
[  FAILED  ] videoio/videocapture_acceleration.read/83, where GetParam() = (sample_322x242_15frames.yuv420p.libvpx-vp9.mp4, FFMPEG, ANY, true)
[  FAILED  ] videoio/videocapture_acceleration.read/96, where GetParam() = (sample_322x242_15frames.yuv420p.libaom-av1.mp4, FFMPEG, NONE, false)
[  FAILED  ] videoio/videocapture_acceleration.read/97, where GetParam() = (sample_322x242_15frames.yuv420p.libaom-av1.mp4, FFMPEG, NONE, true)
[  FAILED  ] videoio/videocapture_acceleration.read/98, where GetParam() = (sample_322x242_15frames.yuv420p.libaom-av1.mp4, FFMPEG, ANY, false)
[  FAILED  ] videoio/videocapture_acceleration.read/99, where GetParam() = (sample_322x242_15frames.yuv420p.libaom-av1.mp4, FFMPEG, ANY, true)

Tests on other modules passed.

@gititgo
Copy link
Contributor Author

gititgo commented Aug 15, 2022

@gititgo gititgo closed this Aug 15, 2022
@gititgo gititgo reopened this Aug 15, 2022
@gititgo
Copy link
Contributor Author

gititgo commented Aug 15, 2022

Tested again, those known issues are still there

The version of ffmpeg has just been updated on the remote env, videoio module passed now.

@asmorkalov
Copy link
Contributor

Thanks a lot for update! Please take a look on "docs" builder. It reports a lot of formatting issues like "modules/core/include/opencv2/core/hal/intrin_lasx.hpp:1865: trailing whitespace."

@gititgo
Copy link
Contributor Author

gititgo commented Aug 15, 2022

OK, formatting issues are fixed.

@fengyuentau
Copy link
Member

issue 1: core

The following test fails from time to time:

# core
[  FAILED  ] Core/HAL.mat_decomp/15, where GetParam() = 15

It fails most likely in the first run of a fresh complilation, and passes from the second run.

issue 2: highgui & gtk

And with the new ffmpeg, issues on videoio module were gone. However, it seems gtk is somehow misconfigured. At first, gtk was missing, but after installing gtk, the issue became:

[----------] 3 tests from Highgui_GUI
[ RUN      ] Highgui_GUI.regression
Exception message: OpenCV(4.6.0-dev) /home/loongson/opencv-workspace/opencv-21833/modules/highgui/src/window_gtk.cpp:635: error: (-2:Unspecified error) Can't initialize GTK backend in function 'cvInitSystem'

/home/loongson/opencv-workspace/opencv-21833/modules/highgui/test/test_gui.cpp:72: Failure
Expected: namedWindow(window_name) doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] Highgui_GUI.regression (2 ms)
[ RUN      ] Highgui_GUI.trackbar_unsafe
Exception message: OpenCV(4.6.0-dev) /home/loongson/opencv-workspace/opencv-21833/modules/highgui/src/window_gtk.cpp:652: error: (-2:Unspecified error) GTK backend is not available in function 'cvInitSystem'

/home/loongson/opencv-workspace/opencv-21833/modules/highgui/test/test_gui.cpp:153: Failure
Expected: namedWindow(window_name) doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] Highgui_GUI.trackbar_unsafe (0 ms)
[ RUN      ] Highgui_GUI.trackbar
Exception message: OpenCV(4.6.0-dev) /home/loongson/opencv-workspace/opencv-21833/modules/highgui/src/window_gtk.cpp:652: error: (-2:Unspecified error) GTK backend is not available in function 'cvInitSystem'

/home/loongson/opencv-workspace/opencv-21833/modules/highgui/test/test_gui.cpp:192: Failure
Expected: namedWindow(window_name) doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] Highgui_GUI.trackbar (0 ms)
[----------] 3 tests from Highgui_GUI (2 ms total)

I guess this is another software compatibility issue.

@asmorkalov
Copy link
Contributor

[ FAILED ] Core/HAL.mat_decomp/15, where GetParam() = 15 - it's compute test. Most probably it's a sign of UB somewhere in code or compiler issue.

@asmorkalov
Copy link
Contributor

[ RUN ] Highgui_GUI.regression - please check if X session is available and properly configured. Otherwise you need to build OpenCV without UI support.

@gititgo
Copy link
Contributor Author

gititgo commented Aug 18, 2022

[----------] 3 tests from Highgui_GUI

By using the following command to test on remote env, you can make the graphics display locally:
$ export DISPLAY=:0.0; ./bin/opencv_test_highgui

@fengyuentau
Copy link
Member

[ FAILED ] Core/HAL.mat_decomp/15, where GetParam() = 15 - it's compute test. Most probably it's a sign of UB somewhere in code or compiler issue.

@asmorkalov Do you mean undefined behaviour by UB?

[ RUN ] Highgui_GUI.regression - please check if X session is available and properly configured. Otherwise you need to build OpenCV without UI support.

With the env set by export DISPLAY=:0.0, highgui tests are now passed.

By using the following command to test on remote env, you can make the graphics display locally:
$ export DISPLAY=:0.0; ./bin/opencv_test_highgui

Thanks!

@asmorkalov
Copy link
Contributor

UB is undefined behavior, bug that triggers randomly depending on not initialized variable or stack content in case of out of bound access.

@gititgo
Copy link
Contributor Author

gititgo commented Aug 23, 2022

core

[ FAILED ] Core/HAL.mat_decomp/15, where GetParam() = 15

This is a accuracy issue:
/home/loongson/src/wenxue/opencv/modules/core/test/test_hal_core.cpp:205: Failure Expected: (cvtest::norm(x, x0, NORM_INF | NORM_RELATIVE)) <= (eps), actual: 1.08289e-10 vs 1e-10

The reason is that the compiler uses fmadd class instructions to optimize the "multiply + add" operation, but precision loss may be triggered in parallel. We disable this feature on loongarch paltform by using the compiler option "-ffp-contract = off" and then the test case passes.

Now we add the compiler option "-ffp-contract = off" in opencv/modules/core/CMakeLists.txt. Is there a better place to add this option ?

@asmorkalov
Copy link
Contributor

fmadd is important optimization. IMHO we can tune test threshold a bit, but not disable it. @alalek @vpisarev what do you think?

@fengyuentau
Copy link
Member

@gititgo Any updates regarding the issue on the core module? Can we simply adjust the threshold specifically for LASX?

@gititgo
Copy link
Contributor Author

gititgo commented Sep 2, 2022

yes, tunning the test threshold is a good idea. If you have no objection, I will do this specifically for LASX.

@vpisarev
Copy link
Contributor

vpisarev commented Sep 2, 2022

@fengyuentau, @gititgo, I'm fine with increasing tolerance threshold from 1e-10 to 5e-10, for example

Copy link
Member

@fengyuentau fengyuentau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All is working fine except known issues. Thank you👍

Copy link
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@asmorkalov asmorkalov merged commit 4154bd0 into opencv:4.x Sep 10, 2022
@alalek
Copy link
Member

alalek commented Sep 11, 2022

This merge has been done without necessary references in commits message on this PR.

Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX
    
    * Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX
    * Add resize.lasx.cpp for Loongson SIMD acceleration
    * Add imgwarp.lasx.cpp for Loongson SIMD acceleration
    * Add LASX acceleration support for dnn/conv
    * Add CV_PAUSE(v) for Loongarch
    * Set LASX by default on Loongarch64
    * LoongArch: tune test threshold for Core/HAL.mat_decomp/15
    
    Co-authored-by: shengwenxue <shengwenxue@loongson.cn>

No PR ID at all.

Existed maintenance scripts would not care about this merge.

@asmorkalov asmorkalov added platform: loongson Loonson CPU architecure and LASX simd and removed platform: other labels Oct 19, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants