FTZ and DAZ modes are incorrectly configured for convolution #21046

Closed
4 tasks done
YashasSamaga opened this issue Nov 12, 2021 · 1 comment

YashasSamaga commented Nov 12, 2021

System information (version)
  • OpenCV => 3.4.16/4.5.4
Detailed description

Issue #17259 reported that different weights for the same model architecture showed drastically different performance characteristics. It was resolved in #17295 by enabling FTZ and DAZ for convolution. That fix is incomplete and incorrect, for the two reasons described below.

Related comment: #17295 (comment)
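
For context, here is a minimal sketch of what the two modes do to subnormal (denormal) values on x86 with SSE arithmetic. The names `scale`, `DenormalDemo` and `runDenormalDemo` are hypothetical and used only for illustration; nothing here is OpenCV code. It assumes an x86-64 build where float math goes through SSE.

```cpp
#include <cassert>
#include <cfloat>
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr, flush-to-zero macros
#include <pmmintrin.h>  // denormals-are-zero macros

// Multiplies at run time; volatile keeps the compiler from folding the result.
inline float scale(float x, float factor)
{
    volatile float a = x;
    volatile float b = factor;
    volatile float r = a * b;
    return r;
}

// Hypothetical result holder for the three cases below.
struct DenormalDemo
{
    float plain;    // FTZ off, DAZ off
    float withFtz;  // FTZ on
    float withDaz;  // DAZ on
};

inline DenormalDemo runDenormalDemo()
{
    const unsigned int saved = _mm_getcsr();  // remember the caller's MXCSR
    DenormalDemo d;

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
    d.plain = scale(FLT_MIN, 0.5f);    // subnormal result is kept

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    d.withFtz = scale(FLT_MIN, 0.5f);  // subnormal result is flushed to zero

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    d.withDaz = scale(d.plain, 4.0f);  // subnormal input is read as zero

    _mm_setcsr(saved);                 // restore the caller's MXCSR
    return d;
}
```

FTZ affects results and DAZ affects operands, which is why both are usually enabled together when subnormal weights or activations are suspected of slowing a kernel down.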

Steps to reproduce
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/highgui.hpp>

#include <cstdint>
#include <iostream>
#include <vector>
#include <thread>
#include <xmmintrin.h>
#include <pmmintrin.h>  // _MM_GET_DENORMALS_ZERO_MODE

int main ()
{
    uint32_t ftzMode = _MM_GET_FLUSH_ZERO_MODE();
    uint32_t dazMode = _MM_GET_DENORMALS_ZERO_MODE();
    std::cout << "MainThread BEGIN " << std::this_thread::get_id() << " FTZ: " << (ftzMode == _MM_FLUSH_ZERO_ON) << ", DAZ: " << (dazMode == _MM_DENORMALS_ZERO_ON) << std::endl;

    auto net = cv::dnn::readNetFromDarknet("yolov4.cfg", "yolov4.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_OPENCL);

    auto frame = cv::imread("dog.jpg");
    auto blob = cv::dnn::blobFromImage(frame, 0.00392, cv::Size(608, 608), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);

    auto output_names = net.getUnconnectedOutLayersNames();

    std::vector<cv::Mat> detections;
    net.forward(detections, output_names);

    ftzMode = _MM_GET_FLUSH_ZERO_MODE();
    dazMode = _MM_GET_DENORMALS_ZERO_MODE();
    std::cout << "MainThread END " << std::this_thread::get_id() << " FTZ: " << (ftzMode == _MM_FLUSH_ZERO_ON) << ", DAZ: " << (dazMode == _MM_DENORMALS_ZERO_ON) << std::endl;

    return 0;
}

In addition, I added the following under a lock in operator() of ParallelConv:

std::cout << "ConvThread " << std::this_thread::get_id() << " FTZ: " << (_MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON) << ", DAZ: " << (_MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON) << std::endl;

Problem 1: FPU state is only set in the calling thread

MainThread BEGIN 140152961351680 FTZ: 0, DAZ: 0
ConvThread 140152676220928 FTZ: 0, DAZ: 0
ConvThread 140152567492608 FTZ: 0, DAZ: 0
ConvThread 140152659435520 FTZ: 0, DAZ: 0
ConvThread 140152961351680 FTZ: 1, DAZ: 1
ConvThread 140152559099904 FTZ: 0, DAZ: 0
ConvThread 140152550707200 FTZ: 0, DAZ: 0
ConvThread 140152651042816 FTZ: 0, DAZ: 0
ConvThread 140152667828224 FTZ: 0, DAZ: 0
.
.
.
MainThread END 140152961351680 FTZ: 0, DAZ: 0
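
Only the worker that happens to run on the calling thread (140152961351680) reports FTZ and DAZ as enabled. Because MXCSR is per-thread state, one possible direction is to enable the modes inside the body that each worker executes. The following is a rough sketch using plain `std::thread`; `workerBody` and `runWorkers` are hypothetical names, and this is not how OpenCV's `ParallelConv` is implemented.

```cpp
#include <cassert>
#include <thread>
#include <vector>
#include <xmmintrin.h>  // flush-to-zero macros
#include <pmmintrin.h>  // denormals-are-zero macros

// True when both FTZ and DAZ are enabled in the calling thread's MXCSR.
inline bool ftzDazEnabled()
{
    return _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON &&
           _MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON;
}

// Hypothetical worker body: the modes are enabled here, in the thread that
// runs the work, not in the thread that submits it.
inline bool workerBody()
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    // ... the convolution stripe would be computed here ...
    return ftzDazEnabled();
}

// Runs `count` workers and reports whether every one of them saw FTZ/DAZ on.
inline bool runWorkers(int count)
{
    std::vector<char> seen(count, 0);  // one slot per worker, no sharing
    std::vector<std::thread> pool;
    for (int i = 0; i < count; ++i)
        pool.emplace_back([&seen, i] { seen[i] = workerBody() ? 1 : 0; });
    for (auto& t : pool)
        t.join();
    for (char s : seen)
        if (!s)
            return false;
    return true;
}
```

The workers change only their own MXCSR, so the submitting thread's state is left as it was. In a long-lived thread pool the previous state would also need to be restored when the body finishes.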

Problem 2: net.forward() does not preserve calling thread's FPU state

This issue arises when net.forward() returns early (which happens with the OCL target) or throws an exception. The current mechanism sets and resets the FPU state manually, so the reset is skipped on those paths and the calling thread is left with FTZ and DAZ enabled. An RAII-based solution is appropriate here.

I don't have OCL configured to test this, but I strongly suspect this is the case.
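
A minimal sketch of the RAII idea, assuming x86 with SSE. `ScopedFtzDaz` and `forwardThatThrows` are hypothetical names for illustration only; they are not existing OpenCV classes or functions.

```cpp
#include <cassert>
#include <stdexcept>
#include <xmmintrin.h>  // flush-to-zero macros
#include <pmmintrin.h>  // denormals-are-zero macros

// Hypothetical guard: enables FTZ and DAZ for its lifetime and restores the
// previous modes in the destructor, which also runs on early return and
// during exception unwinding.
class ScopedFtzDaz
{
public:
    ScopedFtzDaz()
        : savedFtz_(_MM_GET_FLUSH_ZERO_MODE()),
          savedDaz_(_MM_GET_DENORMALS_ZERO_MODE())
    {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    }

    ~ScopedFtzDaz()
    {
        // Restore only the two mode bits; other MXCSR fields are untouched.
        _MM_SET_FLUSH_ZERO_MODE(savedFtz_);
        _MM_SET_DENORMALS_ZERO_MODE(savedDaz_);
    }

    ScopedFtzDaz(const ScopedFtzDaz&) = delete;
    ScopedFtzDaz& operator=(const ScopedFtzDaz&) = delete;

private:
    unsigned int savedFtz_;
    unsigned int savedDaz_;
};

// True when both FTZ and DAZ are enabled in the calling thread's MXCSR.
inline bool ftzDazEnabled()
{
    return _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON &&
           _MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON;
}

// Stand-in for a forward pass that leaves early through an exception.
inline void forwardThatThrows()
{
    ScopedFtzDaz guard;
    assert(ftzDazEnabled());
    throw std::runtime_error("early exit");
}
```

With a guard like this there is no reset call to forget: every exit path from the scope goes through the destructor.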

Issue submission checklist
  • I report the issue, it's not a question
  • I checked the problem with documentation, FAQ, open issues,
    forum.opencv.org, Stack Overflow, etc and have not found solution
  • I updated to latest OpenCV version and the issue is still there
  • There is reproducer code and related data files: videos, images, onnx, etc

alalek commented Jan 23, 2022

Current status of FTZ/DAZ propagation on x86.

Linux:

  • main thread - disabled (unless built with -ffast-math)
  • std::threads: newly created threads - disabled, propagation is not available (during creation)
  • TBB (up to 2020.3 interface 11103): newly created threads - disabled, not propagated, preserved
  • OpenMP: newly created threads - disabled, not propagated, preserved

Mac OSX:

  • GCD: main thread - disabled, propagation works from the caller thread ✔

Windows:

  • Concurrency: main thread - disabled, newly created threads - disabled, not propagated, preserved
