SystemZ Backend: Add support for operations such as FP16_TO_FP and FP_TO_FP16 #50374

kun-lu20 · 2021-07-08T17:03:09Z


Bugzilla Link	51030
Version	unspecified
OS	Linux
CC	@River707,@ftynse

Extended Description

Hi,

We have recently been running the TensorFlow v2.5.0 test suite on s390x (Ubuntu 18.04).

The test case //tensorflow/compiler/tests:sort_ops_test_cpu fails with the following error:

LLVM ERROR: Cannot select: 0x3ff14167ca0: f32 = fp16_to_fp 0x3ff14167f10
0x3ff14167f10: i32,ch = load<(dereferenceable load 2 from %ir.4, !alias.scope !6, !noalias !4), zext from i16> 0x3ff14197548, 0x3ff141678f8, undef:i64
0x3ff141678f8: i64,ch = load<(load 8 from %ir.3)> 0x3ff14197548, 0x3ff14167890, undef:i64
0x3ff14167890: i64 = add nuw 0x3ff141674e8, Constant:i64<8>
0x3ff141674e8: i64,ch = CopyFromReg 0x3ff14197548, Register:i64 %2
0x3ff14167480: i64 = Register %2
0x3ff14167828: i64 = Constant<8>
0x3ff14167758: i64 = undef
0x3ff14167758: i64 = undef
In function: compare_lt_WCTTAtafbb4__.7

Other test cases such as //tensorflow/python/keras/optimizer_v2:adam_test and //tensorflow/core/kernels/mlir_generated:abs_cpu_f16_f16_gen_test also fail on s390x for similar reasons. A related issue (tensorflow/tensorflow#44362) has been filed in the TensorFlow GitHub issue tracker.

We think the root cause is that the SystemZ LLVM backend (llvm/lib/Target/SystemZ/SystemZISelLowering.cpp) lacks support for operations such as FP16_TO_FP and FP_TO_FP16, which perform promotion and truncation of half-precision (16-bit) floating-point numbers. Could support for these operations be added to the SystemZ LLVM backend? Thanks!
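For readers unfamiliar with these two nodes, here is a minimal sketch of the conversions they stand for, written in Python rather than as backend code. The function names simply mirror the node names and are illustrative only; this is not how LLVM or compiler-rt implements them. It assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits) and leans on the `struct` module's `e` format for the narrowing direction.

```python
import struct


def fp16_to_fp(bits: int) -> float:
    """Decode an IEEE 754 binary16 bit pattern (0..0xFFFF) into a float.

    Illustrative stand-in for the promotion that FP16_TO_FP represents.
    """
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed scale of 2**-24.
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction 0) or NaN.
        return sign * float("inf") if fraction == 0 else float("nan")
    # Normal number: implicit leading 1 and an exponent bias of 15.
    return sign * (1.0 + fraction / 1024.0) * 2.0 ** (exponent - 15)


def fp_to_fp16(value: float) -> int:
    """Round a float to the nearest binary16 value and return its bits.

    Illustrative stand-in for the truncation that FP_TO_FP16 represents.
    struct.pack raises OverflowError if the value is too large for binary16.
    """
    return struct.unpack("<H", struct.pack("<e", value))[0]
```

For example, `fp16_to_fp(0x3C00)` yields `1.0` and `fp_to_fp16(1.0)` yields `0x3C00`. A backend without a legal lowering for these nodes has to either expand them inline or call a runtime library routine, which is the part reported missing here.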

kun-lu20 · 2021-09-21T14:06:40Z

Are there any updates from the community regarding this issue? Thanks!

joker-eph · 2021-09-21T16:27:16Z

Moving out of MLIR: this is a backend issue.

kun-lu20 · 2021-10-08T17:13:26Z

It looks like commit https://reviews.llvm.org/rG8cd8120a7b5d, which has been tagged as 13.0.0-rc, could solve this issue, since it adds support for arch14 and for operations related to FP16 conversion to the SystemZ backend. Could anyone from the community help confirm this? Thanks!

kun-lu20 · 2022-04-07T13:35:13Z

This issue still persists with TensorFlow v2.8.0, which uses LLVM 15. It looks like specific half-precision (16-bit) operations are still missing from the SystemZ LLVM backend.

Could anyone from the community take a look at this issue? Thanks very much!

kun-lu20 · 2022-07-06T18:20:20Z

We recently ran the test cases in the //tensorflow/core/kernels/mlir_generated category with TensorFlow v2.9.1 and found that this issue still exists.

It looks like FP16/F16-related operations are still unsupported in the LLVM SystemZ backend for most Z CPU models, which causes these test cases (such as abs_cpu_f16_f16_gen_test and sqrt_cpu_f64_f64_gen_test) to fail when the applyFullConversion() or applyPartialConversion() function is invoked. Although this commit added FP16 support for the new arch14 (z16) model, arch14 still does not seem to fully support FP16 operations.

We also found that when TensorFlow is built with the options -c opt --copt=-O, which set the optimization level to 1, and with JIT_Compilation enabled, these test cases pass and the output .mlir files are generated successfully.

We think this can serve as a workaround for now, but to address the root cause, FP16-related operations still need to be added to the SystemZ backend.

Any thoughts or suggestions from the community regarding this issue would be greatly appreciated. Thanks!

beetrees · 2024-06-13T20:25:56Z

The following function (compiler explorer):

define half @deref(ptr %p) {
  %x = load half, ptr %p
  ret half %x
}

currently fails to compile when compiling for s390x-unknown-linux-gnu with the following error:

LLVM ERROR: Cannot select: 0x89e9c70: f32,ch = load<(load (s16) from %ir.p), anyext from f16> 0x89a9bc8, 0x89e9c00, undef:i64
  0x89e9c00: i64,ch = CopyFromReg 0x89a9bc8, Register:i64 %0
    0x89e9b90: i64 = Register %0
  0x89e9ce0: i64 = undef
In function: deref
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-trunk/bin/llc -o /app/output.s -mtriple=s390x-unknown-linux-gnu <source>
1.	Running pass 'Function Pass Manager' on module '<source>'.
2.	Running pass 'SystemZ DAG->DAG Pattern Instruction Selection' on function '@deref'
 #0 0x00000000037197d8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-trunk/bin/llc+0x37197d8)
 #1 0x000000000371714c SignalHandler(int) Signals.cpp:0:0
 #2 0x00007baf41042520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #3 0x00007baf410969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #4 0x00007baf41042476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #5 0x00007baf410287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #6 0x000000000073359e llvm::UniqueStringSaver::save(llvm::StringRef) (.cold) StringSaver.cpp:0:0
 #7 0x00000000034dca44 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34dca44)
 #8 0x00000000034e3e85 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34e3e85)
 #9 0x00000000019ed9de (anonymous namespace)::SystemZDAGToDAGISel::Select(llvm::SDNode*) SystemZISelDAGToDAG.cpp:0:0
#10 0x00000000034d9f94 llvm::SelectionDAGISel::DoInstructionSelection() (/opt/compiler-explorer/clang-trunk/bin/llc+0x34d9f94)
#11 0x00000000034e92a1 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/opt/compiler-explorer/clang-trunk/bin/llc+0x34e92a1)
#12 0x00000000034ebed4 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34ebed4)
#13 0x00000000034edd44 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34edd44)
#14 0x00000000019efe0a (anonymous namespace)::SystemZDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) SystemZISelDAGToDAG.cpp:0:0
#15 0x00000000034dd861 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34dd861)
#16 0x000000000282216b llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#17 0x0000000002d58b22 llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x2d58b22)
#18 0x0000000002d58ca1 llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x2d58ca1)
#19 0x0000000002d5a950 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x2d5a950)
#20 0x000000000084df94 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#21 0x0000000000745af6 main (/opt/compiler-explorer/clang-trunk/bin/llc+0x745af6)
#22 0x00007baf41029d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#23 0x00007baf41029e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#24 0x0000000000845bce _start (/opt/compiler-explorer/clang-trunk/bin/llc+0x845bce)
Program terminated with signal: SIGSEGV
Compiler returned: 139

Other operations involving half also fail with similar errors.

alexrp · 2024-06-28T23:46:56Z

FWIW, this is the only remaining blocker I'm aware of for Zig to be able to target s390x:

❯ zig cc s390x.c -target s390x-linux-musl
LLVM ERROR: Cannot select: 0x6d24170: i32 = fp_to_fp16 0x6d23a00
  0x6d23a00: f32,ch = CopyFromReg 0x5e82a10, Register:f32 %10
    0x6c03da0: f32 = Register %10
In function: __fixhfsi

On s390x, every use of the f16 data type will currently ICE due to llvm/llvm-project#50374, causing doctest failures on the platform. Most doctests were already restricted to certain platforms, so fix this by likewise restricting the remaining five.

Rollup merge of rust-lang#127588 - uweigand:s390x-f16-doctests, r=tgross35 core: Limit remaining f16 doctests to x86_64 linux On s390x, every use of the f16 data type will currently ICE due to llvm/llvm-project#50374, causing doctest failures on the platform. Most doctests were already restricted to certain platforms, so fix this by likewise restricting the remaining five.

llvm/llvm-project#50374

JonPsson1 · 2024-10-07T06:55:11Z

Patch in progress here: #109164

llvm/llvm-project#50374

- _Float16 is now accepted by Clang.
- The half IR type is fully handled by the backend.
- These values are passed in FP registers and converted to/from float around each operation.
- Compiler-rt conversion functions are now built for s390x including the missing extendhfdf2 which was added.

Fixes llvm#50374
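The "converted to/from float around each operation" strategy in the commit message above can be sketched numerically as follows. This is only a model of the arithmetic, not the backend's code: the helper names are hypothetical, and Python's `struct` rounding (round to nearest, ties to even) stands in for the hardware and library conversions.

```python
import struct


def round_to_half(value: float) -> float:
    """Round to the nearest binary16 value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def round_to_single(value: float) -> float:
    """Round to the nearest binary32 value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def half_add(a: float, b: float) -> float:
    """Add two half values by promoting, adding in float, then truncating.

    Both inputs are assumed to already be exactly representable as binary16.
    """
    # Promotion half -> float is exact, so the inputs are used unchanged.
    wide = round_to_single(a + b)  # the operation itself runs in single precision
    return round_to_half(wide)     # truncate the result back to half
```

With this model, `half_add(0.5, 0.25)` gives `0.75`, while `half_add(1.0, 2.0 ** -12)` gives `1.0` because the exact sum is not representable in binary16 and rounds back down.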

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021

EugeneZelenko added the backend:SystemZ label Jul 6, 2022

rposts mentioned this issue Apr 4, 2023

execution_profile_test_with_xla_hlo_profile_cpu fails on s390x tensorflow/tensorflow#45021

Closed

This was referenced Jun 13, 2024

Tracking Issue for f16 and f128 float types rust-lang/rust#116909

Open

Update compiler_builtins to 0.1.114 rust-lang/rust#125016

Merged

alexrp mentioned this issue Jun 28, 2024

std.Target: Use arch8 as the baseline CPU model for s390x. ziglang/zig#20451

Merged

uweigand mentioned this issue Jul 10, 2024

core: Limit remaining f16 doctests to x86_64 linux rust-lang/rust#127588

Merged

beetrees mentioned this issue Aug 2, 2024

Configure which platforms have f16 and f128 enabled by default rust-lang/compiler-builtins#652

Merged

alexrp added a commit to alexrp/zig that referenced this issue Aug 12, 2024

llvm: Disable lowering to f16 on s390x.

2fdfb07

llvm/llvm-project#50374

alexrp mentioned this issue Aug 12, 2024

llvm: Disable lowering to f16 on s390x. ziglang/zig#21045

Merged

andrewrk pushed a commit to ziglang/zig that referenced this issue Aug 12, 2024

llvm: Disable lowering to f16 on s390x.

82b0f44

llvm/llvm-project#50374

SammyJames pushed a commit to SammyJames/zig that referenced this issue Aug 13, 2024

llvm: Disable lowering to f16 on s390x.

7b7eb7f

llvm/llvm-project#50374

JonPsson1 self-assigned this Oct 7, 2024

richerfu pushed a commit to richerfu/zig that referenced this issue Oct 28, 2024

llvm: Disable lowering to f16 on s390x.

bba353c

llvm/llvm-project#50374

JonPsson1 mentioned this issue Nov 19, 2024

[SystemZ] Add support for half (fp16) #109164

Merged

JonPsson1 closed this as completed in #109164 Apr 16, 2025

JonPsson1 closed this as completed in 6d03f51 Apr 16, 2025
