16-bit floating-point support for C/C++ #65

jeffhammond · 2017-06-22T15:50:55Z

Problem

There is interest in supporting 16-bit floating point (henceforth FP16) in MPI.

See https://lists.mpi-forum.org/pipermail/mpiwg-p2p/2017-June/thread.html

Proposal

Add a type associated with FP16 that does not depend on the Fortran definition (MPI_REAL2).

See References. Various non-standard names exist for FP16, including __fp16 and short float. The candidate ISO name is _Float16. Since ISO C and C++ have not yet standardized names, and the names they choose may not be identical, it may be prudent for MPI to add a type (along the lines of MPI_Count and MPI_Aint). The typedef would be MPI_Float16, which may be deprecated as soon as there is an ISO C/C++ name.

Changes to the Text

TODO

Impact on Implementations

The implementation of FP16 is straightforward: follow whatever code exists for MPI_REAL2 today, or copy the FP32 code with s/32/16/g.

A high-quality implementation may need to take special care when implementing reduction operators that can lose precision.

Impact on Users

FP16 support will be available independent of anything related to Fortran.

Users working on machine learning do not use Fortran anywhere (except perhaps indirectly in BLAS) and are not likely to be satisfied with MPI_REAL2, particularly since an implementation can omit support for it if a Fortran compiler is not present.

References

Half-precision floating-point format on Wikipedia.
ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO C proposal)
ISO/IEC JTC 1/SC 22/WG 14 N2017 (ISO C++ proposal)
GCC documentation for Half-Precision Floating Point and Additional Floating Types (e.g. _Float16)
Clang/LLVM _Float16 support for C/C++ commit
Intel® Half-Precision Floating-Point Format Conversion Instructions
Performance Benefits of Half Precision Floats


ahori · 2017-11-07T06:19:31Z

The attached FP16-Reading-1st.pdf is my very first draft of the standard text with FP16. Modified parts are colored cyan. My points are:

MPI_HALF, MPI_C_HALF_COMPLEX, MPI_CXX_HALF_COMPLEX, MPI_HALF_INT (reduction), MPI2_HALF (reduction) are added (MPI_REAL2 and MPI_COMPLEX4 were already defined).
These new FP16 related data types are optional.

Any comments and/or suggestions are welcome.

dholmes-epcc-ed-ac-uk · 2017-11-07T08:49:02Z

Is the MPI_CXX_* type aimed at providing C++ bindings? That support has been removed from MPI and should not now be updated or extended.

MPI2_HALF looks like a typo for MPI_HALF2?

If we have MPI_C_HALF_COMPLEX (for C) do we need MPI_F_HALF_COMPLEX (for Fortran) and MPI_*_HALF_COMPLEX2 (for reductions, in both languages)?

How should a user program determine if an MPI library supports these types if they are optional? Should there be a mandatory compile-time constant like MPI_HAS_HALF_TYPES?

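#ifdef MPI_HAS_HALF_TYPES
MPI_Datatype myType = MPI_HALF;   /* handle for 16-bit floating point */
#else
MPI_Datatype myType = MPI_FLOAT;  /* fall back to 32-bit floating point */
#endif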

kawashima-fj · 2017-11-07T09:47:42Z

@dholmes-epcc-ed-ac-uk My understanding is that the C++ binding was removed but using the C binding from C++ programs is still supported. C++ datatypes are required in such cases. In fact, MPI_CXX_BOOL was added in MPI-3.0, which removed the C++ binding.

bosilca · 2017-11-07T15:17:05Z

@dholmes-epcc-ed-ac-uk I assume MPI2_HALF was indeed a typo for the reduction type MPI_2HALF. So far, unsupported optional types have simply been left out of the headers; the standard assumes it is the developer's responsibility to detect the absence of such types by whatever means they want or need.

@ahori what is the expected link between the newly added half-precision type and MPI_REAL2 and MPI_COMPLEX4? Are modern compilers translating the Fortran REAL*2 type into half precision?

jeffhammond · 2017-11-07T16:33:30Z

@dholmes-epcc-ed-ac-uk

Is the MPI_CXX_* type aimed at providing C++ bindings? That support has been removed from MPI and should not now be updated or extended.

We deprecated the C++ functions. We still need C++ types, for example, because C99 _Complex T and C++ std::complex<T> are not the same type.

If we have MPI_C_HALF_COMPLEX (for C) do we need MPI_F_HALF_COMPLEX (for Fortran) and MPI_*_HALF_COMPLEX2 (for reductions, in both languages)?

MPI_COMPLEX2 is the Fortran complex type consisting of two 16-bit floating-point numbers.

How should a user program determine if an MPI library supports these types if they are optional?

The same way you determine whether the 14 optional types we already have in Table 13.2 are supported.

jeffhammond · 2017-11-07T16:34:09Z

@ahori We should use the ISO C candidate name _Float16 instead of half or floathalf, unless we want to define a placeholder to deal with the potential differences between C and C++. For that, I'd choose something like MPI_Float and provide the text explaining that it is a typedef to the IEEE half precision type supported by the compiler.

ahori · 2017-11-09T05:55:37Z

@dholmes-epcc-ed-ac-uk

MPI2_HALF looks like a typo for MPI_HALF2?

Correct.

If we have MPI_C_HALF_COMPLEX (for C) do we need MPI_F_HALF_COMPLEX (for Fortran) and MPI_*_HALF_COMPLEX2 (for reductions, in both languages)?

I added the FP16 types almost mechanically, without any particular intention. I noticed that there is no explanation of the MPI_C_* and MPI_CXX_* types listed in the MPI-IO external32 section. I checked MPI-2.2 and found that these were added in 2.2. Can anybody explain what they are?

How should a user program determine if an MPI library supports these types if they are optional? Should there be a mandatory compile-time constant like MPI_HAS_HALF_TYPES?

No idea, but the same issue applies to the other optional types (configure can detect them :-)

@jeffhammond

We should use the ISO C candidate name _Float16 instead of half or floathalf, unless we want to define a placeholder to deal with the potential differences between C and C++. For that, I'd choose something like MPI_Float and provide the text explaining that it is a typedef to the IEEE half precision type supported by the compiler.

I have just remembered that Rolf suggested using "short float" instead of "half". He also suggested adding MPI_FLOAT*_T types as below:

The traditional type names such as MPI_DOUBLE cannot be deprecated, for backward compatibility. Anyway, I will update the text in a few days, before flying to Denver.

@bosilca

what is the expected link between the newly added half-precision type and MPI_REAL2 and MPI_COMPLEX4? Are modern compilers translating the Fortran REAL*2 type into half precision?

I think/suppose/hope so.

ahori · 2017-11-09T08:10:50Z

@ahori

If we have MPI_C_HALF_COMPLEX (for C) do we need MPI_F_HALF_COMPLEX (for Fortran) and MPI_*_HALF_COMPLEX2 (for reductions, in both languages)?
I added the FP16 types almost mechanically, without any particular intention. I noticed that there is no explanation of the MPI_C_* and MPI_CXX_* types listed in the MPI-IO external32 section. I checked MPI-2.2 and found that these were added in 2.2. Can anybody explain what they are?

Oh, I found the answer to this. There are two ways of expressing complex numbers in C++: _Complex and std::complex. MPI_CXX_*_COMPLEX means std::complex, whereas MPI_C_*_COMPLEX means _Complex. To distinguish them, we need both.

ahori · 2017-11-09T08:36:52Z

@kawashima-fj

My understanding is that the C++ binding was removed but using the C binding from C++ programs is still supported. C++ datatypes are required in such cases. In fact, MPI_CXX_BOOL was added in MPI-3.0, which removed the C++ binding.

Oh, why do we need C++ types? What happens when a C++ object appears as an argument of MPI_Send or MPI_Recv? I believe this is not allowed. Are there any situations where a C++ object can be an argument of an MPI function?

kawashima-fj · 2017-11-10T02:18:48Z

@ahori

Accessing the C binding from C++ is allowed.

from MPI-3.1 p.5:

1.8 Who Should Use This Standard?
This standard is intended for use by all those who want to write portable message-passing
programs in Fortran and C (and access the C bindings from C++).

from MPI-3.1 p.36:

MPI requires support for inter-language communication, i.e., if messages are sent by a
C or C++ process and received by a Fortran process, or vice-versa. The behavior is defined
in Section 17.2.

And as you mentioned, C++ has std::complex<float> etc., which is not available in C and is not identical to C float _Complex. Therefore we need C++ specific datatypes to send/receive objects of such type. If short float _Complex is defined in C and std::complex<short float> is defined in C++ and they are not identical, we need MPI_C_SHORT_FLOAT_COMPLEX and MPI_CXX_SHORT_FLOAT_COMPLEX (or other names).

I don't mean sending/receiving C++ class objects.

ahori · 2017-11-10T03:44:43Z

@kawashima-fj

According to this page:

http://en.cppreference.com/w/cpp/numeric/complex (in the box titled "Non-static data members")

the layouts of complex numbers in C++ and C are (guaranteed to be?) the same.
So, here is my point:

If std::complex is implemented as a REAL object, then the MPI_CXX_*_COMPLEX types are unnecessary.
If std::complex is NOT a real object and its data layout is the same as in C, then again the MPI_CXX_*_COMPLEX types are unnecessary.
(If std::complex is NOT a real object and its data layout is NOT the same as in C, then who wins?)

Further, I think the same applies to bool types.

Am I missing something?

kawashima-fj · 2017-11-10T04:07:15Z

@ahori Oh, sorry, I didn't know that the layouts of complex numbers in C++ and C are guaranteed to be the same. If so, my point is moot. But MPI_CXX_FLOAT_COMPLEX etc. were introduced in an MPI-2.2 erratum and in MPI-3.0. Does anyone know why?

jeffhammond · 2017-11-10T05:58:33Z

@kawashima-fj I had to dig through old email to remember the details, but I found them. @jsquyres captured the background in https://blogs.cisco.com/performance/the-mpi-c-bindings-are-gone-what-does-it-mean-to-you, which includes the following:

These users also brought to light a critical oversight in the existing C and Fortran bindings: MPI datatypes for some C++ types were missing from MPI-2.2. An MPI-3 proposal added several MPI datatypes (e.g., MPI_CXX_FLOAT_COMPLEX) to support these C++ basic datatypes. This proposal was passed, and is included in the final version of MPI-3.0.

This change was implemented in ticket 340.

jeffhammond · 2017-11-10T06:08:30Z

@ahori We need C++ complex types in MPI because C's _Complex was introduced in C99 and C++ is not a superset of C99, so it is not a clean solution to compel a C++ user to use MPI_C_*_COMPLEX, even if this might always work in practice. This is particularly important since one or more vendor compilers do not support C99, so we need to allow for MPI toolchains based on C89 and C++11 compilers, for example.

kawashima-fj · 2017-11-10T06:31:46Z

@jeffhammond Thanks! I understand.

ahori · 2017-11-10T07:58:26Z

Here is the updated (half -> short) version

FP16-Reading-1st-v1.pdf.pdf

kawashima-fj · 2017-11-10T10:34:19Z

@ahori My comments on your PDF:

On p. 179, MPI_C_SHORT_FLOAT_COMPLEX and MPI_CXX_SHORT_FLOAT_COMPLEX should be placed after the words "and if available:"

On p. 182, we should update the following sentence ("nine").

MPI provides nine such predefined datatypes.

On p. 544, MPI_C_SHORT_COMPLEX and MPI_CXX_SHORT_COMPLEX should be MPI_C_SHORT_FLOAT_COMPLEX and MPI_CXX_SHORT_FLOAT_COMPLEX respectively (FLOAT_ is missing).

On p. 182, should we add MPI_2SHORT for Fortran? This is a question for the MPI community and is not trivial.

If SHORT is introduced in the Fortran standard, we probably should.
If not, MPI_2SHORT seems odd. Would the name MPI_2REAL2 instead of MPI_2SHORT still be odd?
How useful is it? The index part of MPI_2SHORT is also a half-precision FP number. Is that enough for typical usage of MPI_MAXLOC and MPI_MINLOC?

jeffhammond · 2017-11-10T18:53:25Z

@ahori I am concerned about MPI_2SHORT. The maximum value of float16 is 65504, which means that, unlike the other Fortran pair types, this one cannot index all of an array that might be passed to the associated reduction. What are the semantics in this case? Does the MPI_MAXLOC/MPI_MINLOC operation only scan the first 65504 elements, or does it scan the entire array? If it scans the entire array, which happens to be larger than 65504 elements, and finds the location to be in excess of 65504, what does it return? Do we define this as an input error when count exceeds 65504, or do we only detect an error if the location is in excess of 65504? No matter what, this feels like an ugly thing to do.

My preference would be to recognize that Fortran now supports the equivalent of structs and use those, at the expense of not supporting legacy Fortran usage. I've proposed a more general representation of pair types already (#18 (comment)) but it needs a ticket.

Update:

I created a ticket for this: #70.

ahori · 2017-11-13T20:43:42Z

@kawashima-fj
@jeffhammond

I agree with you and I will remove MPI_2SHORT.

ahori · 2017-11-22T08:46:47Z

@kawashima-fj
@jeffhammond

Here is the second version reflecting your comments.

FP16-Reading-1st-v2.pdf

kawashima-fj · 2017-12-04T10:16:40Z

I’m sorry for not replying sooner. I cannot attend the meeting this week, where you will discuss this topic.

Why do we make MPI_SHORT_FLOAT an optional datatype?

(a) Because short float is not yet standardized in C?
(b) Because compilers for platforms which don't have 16-bit floating-point cannot support short float?

I believe the reason is only (a), because the current proposal to the C and C++ WGs, ISO/IEC JTC 1/SC 22/WG 14 N2016, ISO/IEC JTC 1/SC 22/WG 21 P0192R1: Adding Fundamental Type for Short Float, contains the following sentences.

We propose adding a new fundamental type, short float – for a float type of unspecified
(platform defined) bit size, shorter or equal to float. Language needs short float
to represent "shorter than float" math that may be natively available on the platform.

As of storage and bit-layout for a short float number, we would expect most implementations
to follow IEEE 754-2008 [11] half-precision floating point number format. On platform that
do not provide any advantages of using shorter float, short float may be implemented as
storage-only type, like __fp16 on gcc/ARM today. For example, it can be stored in ’binary16’
format in memory (occupying less bytes than float), converted to native 32-bit float on read
from memory, operated on using native 32-bit float math operations and converted back to
’binary16’ on store to memory. Or, the platform may choose to not take any advantages of
short float and represent it using float in both memory and registers.

Another document in C++ WG ISO/IEC JTC 1/SC 22/WG 21 P0303R0: Extensions to C++ for Short Float Type has similar wording regarding precision.

the type float provides at least as much precision as short float

This situation is the same as for long double, which is identical to double on some platforms.

For the same reason as (a), I think MPI_FLOAT{16,32,64,128} should also all be defined as optional datatypes on pp. 677 and 678 of the current draft.

In the future, MPI_SHORT_FLOAT (and its *_COMPLEX counterparts) will be changed to mandatory datatypes once short float is standardized in C and supported by many compilers.

Other editorial comments against the current draft:

On p. 28, MPI_C_SHORT_FLOAT_COMPLEX should be placed before MPI_C_COMPLEX (to align the order with other places).
On p. 28, the added datatypes should be marked as optional.
On p. 179, MPI_FLOAT* are missing from 'Floating point'.
On p. 182, MPI_SHORT_FLOAT_INT should be marked as optional.
On p. 544, MPI_CHARACTER, MPI_LOGICAL, and MPI_INTEGER should be moved back to the left side of the table (because the header of the right side is Optional Type).
On p. 544, MPI_SHORT_FLOAT and MPI_C_SHORT_FLOAT_COMPLEX should be removed from the left side of the table (because they are already in the right side of the table).
On p. 544, MPI_FLOAT* should be moved from the left side of the table to the right side (because they are optional types).
On p. 544, MPI_C_SHORT_FLOAT should be removed from the right side of the table (because MPI_SHORT_FLOAT (without C_) is already in the table).
On p. 544, MPI_COMPLEX2 should not be added (because MPI_COMPLEX2 corresponds to COMPLEX*2 in Fortran, which consists of REAL*1 (real part) + REAL*1 (imaginary part)).
On p. 632, MPI_COMPLEX2 should not be added.
On p. 679, MPI_COMPLEX2 should not be added.
References for short float and float*_t should be added somewhere (at least in the final draft).

dholmes-epcc-ed-ac-uk added the "not ready" and "wg-p2p" (Point-to-Point Working Group) labels Jun 22, 2017

jeffhammond mentioned this issue Jun 22, 2017

define language-agnostic, IEEE types #66

Open

ahori self-assigned this Nov 9, 2017

jeffhammond mentioned this issue Nov 10, 2017

Fortran pair types #70

Open

ahori mentioned this issue Jan 18, 2018

Data type naming rule #74

Open

dholmes-epcc-ed-ac-uk self-assigned this Mar 1, 2018

This was referenced Jul 17, 2018

support fp16 horovod/horovod#278

Closed

Add support for float16 (half-precision floats) and related operations such as hgemm() flame/blis#234

Open

hzhou mentioned this issue Oct 22, 2018

Half-precision floating point type pmodels/mpich#3389

Closed

kawashima-fj mentioned this issue Dec 19, 2018

Add FP16 datatypes open-mpi/ompi#6205

Merged

wesbland removed the not ready label Jul 21, 2021

wesbland unassigned ahori Jul 7, 2022

wesbland added the "mpi-5" label (For inclusion in the MPI 5.0 standard) Jun 14, 2023

github-project-automation bot added this to MPI 5.0 Jun 14, 2023

github-project-automation bot moved this to To Do in MPI 5.0 Jun 14, 2023

wesbland assigned Wee-Free-Scot and unassigned dholmes-epcc-ed-ac-uk Jul 24, 2023
