Vectorisation: gcc errors for float vector times int scalar (or for double vector) #95

valassi · 2020-12-08T15:26:23Z

This is a spinoff of #71

I just tried to switch to float and I get some errors like

CPPProcess.cc:164:32: error: conversion of scalar ‘int’ to vector ‘mgOnGpu::fptype_v’ {aka ‘__vector(16) float’} involves truncation
       fi_0 = cxmake( p, -pvec3 * nsf );
                         ~~~~~~~^~~~~

I made a simple reproducer. This fails with

g++ -O3 -march=core-avx2 local/truncate1.cc
local/truncate1.cc: In function ‘float4 timesN(int, const float4&)’:
local/truncate1.cc:29:57: error: conversion of scalar ‘int’ to vector ‘float4’ {aka ‘__vector(4) float’} involves truncation
 float4 timesN( const int n, const float4& v ){ return n * v; }
                                                       ~~^~~

where

cat local/truncate1.cc
#include <iostream>

typedef double double4 __attribute__ (( vector_size ( 4 * sizeof(double) ) ));
typedef float float4 __attribute__ (( vector_size ( 4 * sizeof(float) ) ));

inline std::ostream& operator<<( std::ostream& out, const double4& v )
{
  out << "{ " << v[0];
  for ( int i=1; i<4; i++ ) std::cout << ", " << v[i];
  out << " }";
  return out;
}

inline std::ostream& operator<<( std::ostream& out, const float4& v )
{
  out << "{ " << v[0];
  for ( int i=1; i<4; i++ ) std::cout << ", " << v[i];
  out << " }";
  return out;
}

double4 times2( const double4& v ){ return 2 * v; }

float4 times2( const float4& v ){ return 2 * v; }

double4 timesN( const int n, const double4& v ){ return n * v; }

// error: conversion of scalar ‘int’ to vector ‘float4’ {aka ‘__vector(4) float’} involves truncation
float4 timesN( const int n, const float4& v ){ return n * v; }
//float4 timesN( const int n, const float4& v ){ return (float)n * v; } // this is ok instead...

int main( int argc, char **argv )
{
  double4 f4 = {0.1, 1.1, 2.1, 3.1};
  float4 d4 = {4.1, 5.1, 6.1, 7.1};
  std::cout << d4 << std::endl;
  std::cout << f4 << std::endl;

  std::cout << std::endl;
  std::cout << times2(d4) << std::endl;
  std::cout << times2(f4) << std::endl;

  std::cout << std::endl;
  std::cout << 3*d4 << std::endl;
  std::cout << 3*f4 << std::endl;

  std::cout << std::endl;
  std::cout << timesN(3,d4) << std::endl;
  std::cout << timesN(3,f4) << std::endl;
}

This only happens with floats, not doubles, and only with ints that I pass from a function (not hardcoded).

Some links mention this, but I do not see a solution.

I am not sure about

what is actually causing this? should it not simply convert the scalar int to scalar float an dmultiply each element of the float vector?
is this a gcc bug?

@sponce and @hageboeck do you have an idea?
Thanks Andrea

The text was updated successfully, but these errors were encountered:

valassi · 2020-12-08T15:28:12Z

PS I forgot to mention, I tried gcc8.3, 9.2 and 10.1, all of them have the same issue

sponce · 2020-12-08T15:41:22Z

I believe it is as simple as the error says : "conversion of scalar ‘int’ to vector ‘float4’ {aka ‘__vector(4) float’} involves truncation". The important thing being at the end : the problem is that an int does not fit in a float, you will lose digits as an int has 31 bits and the mantissa of the float only 23. On the other hand the mantissa of a double is 52 bits, so it fits. Nothing related to vectors I believe. Although it may be a warning in scalar ?

sponce · 2020-12-08T15:42:45Z

Ah, and of course if you explicitly cast (like (float)n), you're telling the compiler you know what you are doing and in this case that you want to but precision, so it does it

sponce · 2020-12-08T15:43:51Z

And last point : with a hardcoded value, the compiler can check whether this particular value fits, and if it does, no problem. While with a parameter it has to suppose it won't fit

valassi · 2020-12-08T16:46:14Z

Very good, thanks a lot! 👍
I had not done my maths, ehm...

I am still a bit surprised that this only shows up as an error for vector types however:

if you multiply an int and a float, the int is converted/narrowed to float with no warning or error; if you multiply an int and a vector of floats, you get an error
similarly, I hav ethe impression, if you build a scalar float from an int, you get no warning; if you build a vector float from an int, you get a narrowing warning

Anyway I found a simple solution to my problem here: in Madgraph I can simply replace the "int" by a "short". Here these are only helicities, mainly +1 or -1. Until we find a particle with spin +32769, that should do it...

Debugging another unrelated issue, but I have a patch that should fix this, then I will close this. Thanks again Sebastien!

sponce · 2020-12-08T17:07:22Z

Your weird points are surely explain by the automatic casting that happens behind the scene. I would have to check the references but basically float * int is probably resulting in 4 steps : converting float to double, converting int to double, computing product, converting back to float. Now take care, this would mean a slow down of a factor ~10 for the operation ! I've seen codes where you gain a factor 2 by fixing types properly to avoid conversions between floats and doubles. And I mean real life reconstruction code in LHCb, not toy examples !

For all these kind of things the key tool is godbolt. As an example, let's check I'm right on the float * int -> https://godbolt.org/z/73nq7c and I'm totally wrong ! :-D Ok, one would have to dig further !

…t multiplied by float vector) Results ok in no-simd float. Throughput 1.84E5 (lower than no-simd double?!) ./check.exe -p 16384 32 1 *********************************************************************** NumBlocksPerGrid = 16384 NumThreadsPerBlock = 32 NumIterations = 1 ----------------------------------------------------------------------- FP precision = FLOAT (nan=0) Complex type = STD::COMPLEX RanNumb memory layout = AOSOA[1] == AOS Momenta memory layout = AOSOA[1] == AOS Internal loops fptype_sv = SCALAR (no SIMD) Random number generation = CURAND (C++ code) ----------------------------------------------------------------------- NumberOfEntries = 1 TotalTime[Rnd+Rmb+ME] (123)= ( 2.973374e+00 ) sec TotalTime[Rambo+ME] (23)= ( 2.945622e+00 ) sec TotalTime[RndNumGen] (1)= ( 2.775213e-02 ) sec TotalTime[Rambo] (2)= ( 1.001029e-01 ) sec TotalTime[MatrixElems] (3)= ( 2.845519e+00 ) sec MeanTimeInMatrixElems = ( 2.845519e+00 ) sec [Min,Max]TimeInMatrixElems = [ 2.845519e+00 , 2.845519e+00 ] sec ----------------------------------------------------------------------- TotalEventsComputed = 524288 EvtsPerSec[Rnd+Rmb+ME](123)= ( 1.763276e+05 ) sec^-1 EvtsPerSec[Rmb+ME] (23)= ( 1.779889e+05 ) sec^-1 EvtsPerSec[MatrixElems] (3)= ( 1.842504e+05 ) sec^-1 *********************************************************************** NumMatrixElements(notNan) = 524288 MeanMatrixElemValue = ( 1.372328e-02 +- 1.131740e-05 ) GeV^0 [Min,Max]MatrixElemValue = [ 5.445383e-03 , 7.884562e-02 ] GeV^0 StdDevMatrixElemValue = ( 8.194670e-03 ) GeV^0 MeanWeight = ( 4.515827e-01 +- 0.000000e+00 ) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ] StdDevWeight = ( 0.000000e+00 ) ***********************************************************************

valassi · 2020-12-08T17:19:50Z

Thanks a lot Sebastien! Do not worry too much about my question. It's just peculiar that gcc does two different things in places where I would expect it to do the same thing, that's all.

Anyway, the issue is fixed, I moved to short int for helicities and I checked that all behave ok for floats also with simd (I had to fix a few more issues down the line): valassi@327b5c3

Closing this one, thanks again Sebastien!
Andrea

hageboeck · 2020-12-08T17:47:59Z

Very good, thanks a lot! 👍
I had not done my maths, ehm...

I am still a bit surprised that this only shows up as an error for vector types however:

if you multiply an int and a float, the int is converted/narrowed to float with no warning or error; if you multiply an int and a vector of floats, you get an error

similarly, I hav ethe impression, if you build a scalar float from an int, you get no warning; if you build a vector float from an int, you get a narrowing warning

Anyway I found a simple solution to my problem here: in Madgraph I can simply replace the "int" by a "short". Here these are only helicities, mainly +1 or -1. Until we find a particle with spin +32769, that should do it...

Debugging another unrelated issue, but I have a patch that should fix this, then I will close this. Thanks again Sebastien!

@valassi
That's because you have ways to do error handling with scalar floating-point arithmetic. You can set traps that check for overflows nans, infinities etc. You can make the library throw floating-point exceptions, you can make it generate the SIGFPE signal and install signal handlers. Many C functions have the side-effect of setting the errno when something goes wrong. In short, you have lots of ways to "do stuff" if something goes wrong.

Now, if you are playing with vector types, things might go wrong in only a single lane of the vector register. What should be done? Send signals? What if you stored all 4 values already in memory, and entry 1 generates an exception? Are entries 2 and 3 valid or should they be wiped away? (A regular for-loop would not have computed them)

That is to say:
gcc is suuuuper conservative when you vectorise, because you might get different results as compared to the scalar implementation.

When doing auto vectorisation, it just doesn't vectorise the loop.
When doing explicit vectorisation, it doesn't let you shoot yourself in the foot.

So Sebastiaen is right that getting your types right can improve a lot of things.

valassi · 2020-12-08T19:44:32Z

Hi @hageboeck thanks, useful to know. But I think implementing some of the things you mention would be too complicated and unnecessary. Anyway, using short int (which I am quite sure is a good type for helicities) resolves the issues (without any hacks, it les gcc do what it finds right). So, issue solved :-)

valassi · 2020-12-09T11:10:24Z

For the record, here's a very useful table (by user "Z boson": thanks!): https://stackoverflow.com/a/43778723

This may be extremely useful to port this implementation beyond gcc to clang, icc or other compilers. It seems that clang/icc do not support most scalar-operator-vector operations, so some extra boilerplate additions is probably needed to port there (or we could use the LHCb SIMD wrapper described by @sponce).

hageboeck · 2020-12-09T13:35:31Z

I explained why gcc complains. I actually didn't Edit: mean to propose to implement anything.

I'm all for writing a nice simple for loop and letting the compiler do the rest. Too much of those vector extensions looks like a step in the opposite direction of portability.

valassi closed this as completed Dec 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectorisation: gcc errors for float vector times int scalar (or for double vector) #95

Vectorisation: gcc errors for float vector times int scalar (or for double vector) #95

valassi commented Dec 8, 2020

valassi commented Dec 8, 2020

sponce commented Dec 8, 2020 •

edited

Loading

sponce commented Dec 8, 2020

sponce commented Dec 8, 2020

valassi commented Dec 8, 2020

sponce commented Dec 8, 2020

valassi commented Dec 8, 2020

hageboeck commented Dec 8, 2020 •

edited

Loading

valassi commented Dec 8, 2020

valassi commented Dec 9, 2020

hageboeck commented Dec 9, 2020 •

edited

Loading

Vectorisation: gcc errors for float vector times int scalar (or for double vector) #95

Vectorisation: gcc errors for float vector times int scalar (or for double vector) #95

Comments

valassi commented Dec 8, 2020

valassi commented Dec 8, 2020

sponce commented Dec 8, 2020 • edited Loading

sponce commented Dec 8, 2020

sponce commented Dec 8, 2020

valassi commented Dec 8, 2020

sponce commented Dec 8, 2020

valassi commented Dec 8, 2020

hageboeck commented Dec 8, 2020 • edited Loading

valassi commented Dec 8, 2020

valassi commented Dec 9, 2020

hageboeck commented Dec 9, 2020 • edited Loading

sponce commented Dec 8, 2020 •

edited

Loading

hageboeck commented Dec 8, 2020 •

edited

Loading

hageboeck commented Dec 9, 2020 •

edited

Loading