-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Vectorisation: gcc errors for float vector times int scalar (or for double vector) #95
Comments
PS I forgot to mention, I tried gcc8.3, 9.2 and 10.1, all of them have the same issue |
I believe it is as simple as the error says : "conversion of scalar ‘int’ to vector ‘float4’ {aka ‘__vector(4) float’} involves truncation". The important thing being at the end : the problem is that an int does not fit in a float, you will lose digits as an int has 31 bits and the mantissa of the float only 23. On the other hand the mantissa of a double is 52 bits, so it fits. Nothing related to vectors I believe. Although it may be a warning in scalar ? |
Ah, and of course if you explicitly cast (like |
And last point : with a hardcoded value, the compiler can check whether this particular value fits, and if it does, no problem. While with a parameter it has to suppose it won't fit |
Very good, thanks a lot! 👍 I am still a bit surprised that this only shows up as an error for vector types however:
Anyway I found a simple solution to my problem here: in Madgraph I can simply replace the "int" by a "short". Here these are only helicities, mainly +1 or -1. Until we find a particle with spin +32769, that should do it... Debugging another unrelated issue, but I have a patch that should fix this, then I will close this. Thanks again Sebastien! |
Your weird points are surely explain by the automatic casting that happens behind the scene. I would have to check the references but basically float * int is probably resulting in 4 steps : converting float to double, converting int to double, computing product, converting back to float. Now take care, this would mean a slow down of a factor ~10 for the operation ! I've seen codes where you gain a factor 2 by fixing types properly to avoid conversions between floats and doubles. And I mean real life reconstruction code in LHCb, not toy examples ! For all these kind of things the key tool is godbolt. As an example, let's check I'm right on the float * int -> https://godbolt.org/z/73nq7c and I'm totally wrong ! :-D Ok, one would have to dig further ! |
…t multiplied by float vector) Results ok in no-simd float. Throughput 1.84E5 (lower than no-simd double?!) ./check.exe -p 16384 32 1 *********************************************************************** NumBlocksPerGrid = 16384 NumThreadsPerBlock = 32 NumIterations = 1 ----------------------------------------------------------------------- FP precision = FLOAT (nan=0) Complex type = STD::COMPLEX RanNumb memory layout = AOSOA[1] == AOS Momenta memory layout = AOSOA[1] == AOS Internal loops fptype_sv = SCALAR (no SIMD) Random number generation = CURAND (C++ code) ----------------------------------------------------------------------- NumberOfEntries = 1 TotalTime[Rnd+Rmb+ME] (123)= ( 2.973374e+00 ) sec TotalTime[Rambo+ME] (23)= ( 2.945622e+00 ) sec TotalTime[RndNumGen] (1)= ( 2.775213e-02 ) sec TotalTime[Rambo] (2)= ( 1.001029e-01 ) sec TotalTime[MatrixElems] (3)= ( 2.845519e+00 ) sec MeanTimeInMatrixElems = ( 2.845519e+00 ) sec [Min,Max]TimeInMatrixElems = [ 2.845519e+00 , 2.845519e+00 ] sec ----------------------------------------------------------------------- TotalEventsComputed = 524288 EvtsPerSec[Rnd+Rmb+ME](123)= ( 1.763276e+05 ) sec^-1 EvtsPerSec[Rmb+ME] (23)= ( 1.779889e+05 ) sec^-1 EvtsPerSec[MatrixElems] (3)= ( 1.842504e+05 ) sec^-1 *********************************************************************** NumMatrixElements(notNan) = 524288 MeanMatrixElemValue = ( 1.372328e-02 +- 1.131740e-05 ) GeV^0 [Min,Max]MatrixElemValue = [ 5.445383e-03 , 7.884562e-02 ] GeV^0 StdDevMatrixElemValue = ( 8.194670e-03 ) GeV^0 MeanWeight = ( 4.515827e-01 +- 0.000000e+00 ) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ] StdDevWeight = ( 0.000000e+00 ) ***********************************************************************
Thanks a lot Sebastien! Do not worry too much about my question. It's just peculiar that gcc does two different things in places where I would expect it to do the same thing, that's all. Anyway, the issue is fixed, I moved to short int for helicities and I checked that all behave ok for floats also with simd (I had to fix a few more issues down the line): valassi@327b5c3 Closing this one, thanks again Sebastien! |
@valassi Now, if you are playing with vector types, things might go wrong in only a single lane of the vector register. What should be done? Send signals? What if you stored all 4 values already in memory, and entry 1 generates an exception? Are entries 2 and 3 valid or should they be wiped away? (A regular for-loop would not have computed them) That is to say:
So Sebasti |
Hi @hageboeck thanks, useful to know. But I think implementing some of the things you mention would be too complicated and unnecessary. Anyway, using short int (which I am quite sure is a good type for helicities) resolves the issues (without any hacks, it les gcc do what it finds right). So, issue solved :-) |
For the record, here's a very useful table (by user "Z boson": thanks!): https://stackoverflow.com/a/43778723 This may be extremely useful to port this implementation beyond gcc to clang, icc or other compilers. It seems that clang/icc do not support most scalar-operator-vector operations, so some extra boilerplate additions is probably needed to port there (or we could use the LHCb SIMD wrapper described by @sponce). |
I explained why gcc complains. I actually didn't Edit: mean to propose to implement anything. I'm all for writing a nice simple for loop and letting the compiler do the rest. Too much of those vector extensions looks like a step in the opposite direction of portability. |
This is a spinoff of #71
I just tried to switch to float and I get some errors like
I made a simple reproducer. This fails with
where
This only happens with floats, not doubles, and only with ints that I pass from a function (not hardcoded).
Some links mention this, but I do not see a solution.
I am not sure about
@sponce and @hageboeck do you have an idea?
Thanks Andrea
The text was updated successfully, but these errors were encountered: