Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX #21833
Conversation
Could you point to the cross-compiler and maybe add some notes on how to build the code? QEMU or some other emulator would be very useful too. Also, I recommend you take a look at the cvRound implementation. It's used everywhere, and efficient rounding affects performance a lot.
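To illustrate why the rounding path matters, here is a minimal scalar sketch. It is not OpenCV's cvRound and not a LASX implementation; `round_fast` and `round_naive` are hypothetical names used only for this comparison. It assumes the default floating-point rounding mode (round to nearest, ties to even).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a scalar rounding helper (NOT OpenCV's cvRound).
// std::lrint rounds using the current rounding mode, which is
// round-to-nearest-even by default, and compilers can often lower it to a
// single conversion instruction on targets that have one.
inline int round_fast(double value) {
    return static_cast<int>(std::lrint(value));
}

// Naive alternative for comparison: floor(x + 0.5) rounds ties upward and
// needs an add, a floor and a conversion.
inline int round_naive(double value) {
    return static_cast<int>(std::floor(value + 0.5));
}
```

The two helpers differ on ties (for example 2.5), so a port that swaps one for the other can change test results as well as speed.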
We are preparing a QEMU build for this, but it is not finished yet. We'll provide it as soon as it's available.
@@ -660,6 +662,10 @@ struct HWFeatures
have[CV_CPU_RVV] = true;
#endif

#if defined __loongarch_asx
have[CV_CPU_LASX] = true;
#endif
It would be better to enable these SIMD features by reading the CPUID, as is done for ARM and x86.
OK, thanks very much.
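For reference, one possible shape of a runtime check is sketched below. It is not the code from this PR, and it swaps in a different technique from the one the reviewer named: instead of a CPUID-style instruction it reads the Linux auxiliary vector with `getauxval(AT_HWCAP)`. The bit positions for LSX and LASX are assumptions based on the Linux LoongArch `hwcap.h` header and should be checked against the target kernel; `hwcap_has_lasx` and `detect_lasx` are hypothetical names.

```cpp
#include <cassert>
#if defined(__linux__) && defined(__loongarch__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP
#endif

// ASSUMPTION: bit positions as in the Linux LoongArch uapi hwcap header.
constexpr unsigned long kHwcapLsx  = 1UL << 4;  // 128-bit SIMD (LSX)
constexpr unsigned long kHwcapLasx = 1UL << 5;  // 256-bit SIMD (LASX)

// Pure helper so the bit test can be checked on any host.
inline bool hwcap_has_lasx(unsigned long hwcap) {
    return (hwcap & kHwcapLasx) != 0;
}

// Runtime probe: queries the kernel on LoongArch Linux, reports "no LASX"
// everywhere else.
inline bool detect_lasx() {
#if defined(__linux__) && defined(__loongarch__)
    return hwcap_has_lasx(getauxval(AT_HWCAP));
#else
    return false;
#endif
}
```

With something like this, `have[CV_CPU_LASX]` could be set from the result of the probe rather than from the `__loongarch_asx` compile-time macro alone.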
After I copied the first patch to my local opencv repository (Tag: 4.5.4), the Calib3d_StereoBM.regression test failed.
Note that the CMake options I used on the 3A5000 platform are " -DBUILD_PNG=ON -D WITH_OPENCL=OFF -DBUILD_WITH_DEBUG_INFO=ON -DCPU_BASELINE=LASX ".
I'm so sorry that I didn't get the message or emails. I'm very glad you have a 3A5000 environment, as we are considering providing a remote 3A5000 environment.
Yes, the Calib3d_StereoBM.regression test currently fails. We suspect compiler optimization is the cause, since the DEBUG build is OK. We will follow up and fix the bug.
@gititgo Do you have any news on the Calib3d_StereoBM.regression test failure?
We haven't found the cause of the problem yet. Our colleagues are still working on it.
The cross-compiler (for building LoongArch on x86) and the CMake config file are here: How to use it:
Is QEMU necessary for this PR?
It would be great to have QEMU to run tests.
Is a 3A5000 (LoongArch64) environment OK? QEMU may take a long time.
How long is it going to take to run all unit tests in QEMU? It is recommended to have a CI pipeline to test the code automatically. Please at least provide something that we can run tests on.
QEMU is under development, but I don't have an exact timeline. We can provide a remote LoongArch environment. Can this be used for automated testing?
Yes, but we also expect an environment that we can use for testing in case your remote LoongArch environment is no longer available to us. So it is recommended to have QEMU to run tests on LoongArch.
The provided git link (git clone https://gitee.com/wenux/cross-compiler-la-on-x86.git) is protected by username and password.
Sorry, it's OK now.
Also, the best way to enable OpenCV cross-compilation for the new architecture is to place the toolchain file to
@gititgo Thanks a lot for the toolchain. I was able to build the source code.
@gititgo, thank you for the contribution! Please mark the proper items in the checklist:
without this confirmation we cannot merge your code into OpenCV
OK, marked.
@fengyuentau We prepared a remote LoongArch PC for testing. The IP and password have been emailed to you.
I ran the accuracy tests (./bin/opencv_test_*) on your LoongArch PC, and only 5 modules (core, flann, highgui, ml and videoio) did not fail with a segmentation fault. The CMake command I used is
Please use -DBUILD_PNG=ON, as there may be a bug in the libpng on our system.
Rebuilt with CMake option
Yes, there are two known issues:
Thanks for the contribution! Known issues are related to 3rdparty components on LoongArch PC, so they can be solved outside this pull request. Please take a look at my comments.
Tested again, those known issues are still there:
Tests on other modules passed.
The ffmpeg version has just been updated on the remote env; the videoio module passes now.
Thanks a lot for the update! Please take a look at the "docs" builder. It reports a lot of formatting issues like "modules/core/include/opencv2/core/hal/intrin_lasx.hpp:1865: trailing whitespace."
OK, formatting issues are fixed.
Issue 1: core
The following test fails from time to time:
It most likely fails on the first run after a fresh compilation and passes from the second run onward.
Issue 2: highgui & gtk
With the new ffmpeg, the issues in the videoio module are gone. However, it seems gtk is somehow misconfigured. At first, gtk was missing, but after installing gtk, the issue became:
I guess this is another software compatibility issue.
[ FAILED ] Core/HAL.mat_decomp/15, where GetParam() = 15 - it's a compute test. Most probably it's a sign of UB somewhere in the code or a compiler issue.
[ RUN ] Highgui_GUI.regression - please check if an X session is available and properly configured. Otherwise you need to build OpenCV without UI support.
By using the following command to test on the remote env, you can display the graphics locally:
@asmorkalov Do you mean undefined behaviour by UB?
With the env set by
Thanks!
UB is undefined behavior: a bug that triggers randomly depending on an uninitialized variable, or on stack contents in the case of an out-of-bounds access.
This is an accuracy issue. The reason is that the compiler uses fmadd-class instructions to optimize the "multiply + add" operation, but a loss of precision may be triggered in the parallel code. We disabled this feature on the LoongArch platform with the compiler option "-ffp-contract=off", and then the test case passes. For now we have added the compiler option "-ffp-contract=off" in opencv/modules/core/CMakeLists.txt. Is there a better place to add this option?
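A small sketch of the effect being described, independent of OpenCV and of LASX: a fused multiply-add rounds once, while a separate multiply and add round twice, so the two can give different bits for the same inputs. The helper names `mul_then_add` and `fused` are hypothetical. The inputs are chosen so that the product 1 - 2^-54 is not representable as a double.

```cpp
#include <cassert>
#include <cmath>

// Two roundings: the product is rounded to double, then the sum is rounded.
// The volatile store keeps the compiler from contracting this into an FMA.
inline double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// One rounding: std::fma evaluates a*b + c as if with infinite precision
// and rounds only the final result.
inline double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the unfused path returns 0 while the fused path returns -2^-54. The fused result is the more exact one, but it differs from a reference computed without contraction, which is the kind of mismatch a tight test threshold will flag.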
@gititgo Any updates regarding the issue on the core module? Can we simply adjust the threshold specifically for LASX?
Yes, tuning the test threshold is a good idea. If you have no objection, I will do this specifically for LASX.
@fengyuentau, @gititgo, I'm fine with increasing the tolerance threshold from 1e-10 to 5e-10, for example.
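A rough sketch of what a LASX-specific threshold could look like follows. It is not the actual test code: `decomp_tolerance` and `close_enough` are hypothetical helpers, and comparing with a relative error is an assumption, since the thread does not say how the test measures the difference. Only the values 1e-10 and 5e-10 come from the discussion above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: looser tolerance only when building for LASX.
constexpr double decomp_tolerance() {
#if defined(__loongarch_asx)
    return 5e-10;  // relaxed value suggested in the thread
#else
    return 1e-10;  // original threshold
#endif
}

// ASSUMPTION: relative comparison, scaled by the magnitude of the expected
// value (with a floor of 1.0 so values near zero are compared absolutely).
inline bool close_enough(double expected, double actual, double eps) {
    const double scale = std::max(1.0, std::fabs(expected));
    return std::fabs(expected - actual) <= eps * scale;
}
```

A result that is off by about 3e-10 would pass with the relaxed value and fail with the original one, which matches the intent of adjusting the threshold only for LASX.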
Everything is working fine except for the known issues. Thank you 👍
👍
This merge was done without the necessary references to this PR in the commit message.
There is no PR ID at all. The existing maintenance scripts will not pick up this merge.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.