SystemZ Backend: Add support for operations such as FP16_TO_FP and FP_TO_FP16 #50374
Comments
Any updates from the community regarding this issue? Thanks!
Moving out of MLIR: this is a backend issue.
It looks like commit https://reviews.llvm.org/rG8cd8120a7b5d, which has been tagged as 13.0.0-rc, could solve this issue, since it adds support for arch14 and for operations related to FP16 conversion to the SystemZ backend. Could anyone from the community help confirm this? Thanks!
This issue still persists on TensorFlow v2.8.0, which uses LLVM 15. It looks like specific half-precision (16-bit) operations are still missing in the SystemZ LLVM backend. Can anyone from the community take a look at this issue? Thanks very much!
Recently we've run test cases under Looks like We also found that when building TensorFlow with options We think this could be used as a workaround for now, but to address the root cause, Any thoughts or suggestions from the community reg this issue would be greatly appreciated. Thanks! |
The following function (compiler explorer):
define half @deref(ptr %p) {
  %x = load half, ptr %p
  ret half %x
}
currently fails to compile when compiling for
Other operations involving |
FWIW, this is the only remaining blocker I'm aware of for Zig to be able to target s390x:
❯ zig cc s390x.c -target s390x-linux-musl
LLVM ERROR: Cannot select: 0x6d24170: i32 = fp_to_fp16 0x6d23a00
0x6d23a00: f32,ch = CopyFromReg 0x5e82a10, Register:f32 %10
0x6c03da0: f32 = Register %10
In function: __fixhfsi
On s390x, every use of the f16 data type will currently ICE due to llvm/llvm-project#50374, causing doctest failures on the platform. Most doctests were already restricted to certain platforms, so fix this by likewise restricting the remaining five.
Rollup merge of rust-lang#127588 - uweigand:s390x-f16-doctests, r=tgross35 core: Limit remaining f16 doctests to x86_64 linux On s390x, every use of the f16 data type will currently ICE due to llvm/llvm-project#50374, causing doctest failures on the platform. Most doctests were already restricted to certain platforms, so fix this by likewise restricting the remaining five.
Patch in progress here: #109164
- _Float16 is now accepted by Clang.
- The half IR type is fully handled by the backend.
- These values are passed in FP registers and converted to/from float around each operation.
- Compiler-rt conversion functions are now built for s390x, including the missing extendhfdf2, which was added.

Fixes llvm#50374
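To illustrate what "converted to/from float around each operation" means numerically, here is a minimal Python sketch. It is not the backend's code: the helper names (`round_to_f32`, `round_to_f16`, `half_add`) are hypothetical, and it only models the arithmetic result of promoting two half values to single precision, adding them there, and truncating the sum back to half.

```python
import math
import struct


def round_to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value.

    struct raises OverflowError when the rounded result does not fit in
    binary16; IEEE arithmetic would produce an infinity there, so the
    sketch returns one with the matching sign.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def half_add(a: float, b: float) -> float:
    """Add two half-precision values by way of single precision.

    Hypothetical model of the promote / operate / truncate pattern:
    both operands are assumed to already be exact binary16 values.
    """
    wide_a = round_to_f32(a)                  # promotion half -> float is exact
    wide_b = round_to_f32(b)
    wide_sum = round_to_f32(wide_a + wide_b)  # the addition happens in float
    return round_to_f16(wide_sum)             # truncation float -> half rounds


print(half_add(1.0, 2.0))          # exactly representable sum
print(half_add(2048.0, 1.0))       # 2049 is a tie in half; rounds to even
print(half_add(65504.0, 65504.0))  # exceeds the half range
```

The second call shows why the final truncation matters: 2049 is representable in float but not in half, so the rounding step decides the visible result.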
Extended Description
Hi,
We recently ran the TensorFlow v2.5.0 test suite on s390x (Ubuntu 18.04).
Test case //tensorflow/compiler/tests:sort_ops_test_cpu fails due to the following error:
LLVM ERROR: Cannot select: 0x3ff14167ca0: f32 = fp16_to_fp 0x3ff14167f10
0x3ff14167f10: i32,ch = load<(dereferenceable load 2 from %ir.4, !alias.scope !6, !noalias !4), zext from i16> 0x3ff14197548, 0x3ff141678f8, undef:i64
0x3ff141678f8: i64,ch = load<(load 8 from %ir.3)> 0x3ff14197548, 0x3ff14167890, undef:i64
0x3ff14167890: i64 = add nuw 0x3ff141674e8, Constant:i64<8>
0x3ff141674e8: i64,ch = CopyFromReg 0x3ff14197548, Register:i64 %2
0x3ff14167480: i64 = Register %2
0x3ff14167828: i64 = Constant<8>
0x3ff14167758: i64 = undef
0x3ff14167758: i64 = undef
In function: compare_lt_WCTTAtafbb4__.7
Other test cases, such as //tensorflow/python/keras/optimizer_v2:adam_test and //tensorflow/core/kernels/mlir_generated:abs_cpu_f16_f16_gen_test, also fail on s390x for similar reasons. A related issue (tensorflow/tensorflow#44362) has been raised in the TensorFlow GitHub issues.
We think the root cause is the lack of support in the SystemZ LLVM backend (llvm/lib/Target/SystemZ/SystemZISelLowering.cpp) for operations such as FP16_TO_FP and FP_TO_FP16, which perform promotion and truncation of half-precision (16-bit) floating-point numbers. Could support for these operations be added to the SystemZ LLVM backend? Thanks!
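For readers unfamiliar with these two operations, the following Python sketch shows the value-level conversion they stand for: decoding an IEEE 754 binary16 bit pattern into a wider float (promotion) and rounding a float back to a 16-bit pattern (truncation). The function names are hypothetical and this is not LLVM code. The truncation direction leans on Python's `struct` "e" format, and it rounds from a double rather than from an f32.

```python
import struct


def fp16_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE 754 binary16 pattern into a Python float."""
    sign = -1.0 if (bits >> 15) & 0x1 else 1.0
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits, bias 15
    fraction = bits & 0x3FF          # 10 explicit fraction bits
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed exponent of -14,
        # so the value is fraction * 2**-10 * 2**-14.
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction 0) or NaN.
        return sign * float("inf") if fraction == 0 else float("nan")
    return sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)


def float_to_fp16_bits(value: float) -> int:
    """Round a float to binary16 and return the 16-bit pattern.

    Raises OverflowError if the rounded value is too large for binary16.
    """
    return struct.unpack("<H", struct.pack("<e", value))[0]


print(hex(float_to_fp16_bits(1.0)))
print(fp16_bits_to_float(0x7BFF))  # largest finite half value
```

Promotion is always exact, because every binary16 value is representable in binary32. Truncation is the lossy direction and has to round.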