[all-commits] [llvm/llvm-project] 89f36d: [X86] Add ExpandLargeFpConvert Pass and enable for...

Wed Nov 30 21:48:15 PST 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 89f36dd8f32f85bfaf19dfa2cbb12d6622727bfb
      https://github.com/llvm/llvm-project/commit/89f36dd8f32f85bfaf19dfa2cbb12d6622727bfb
  Author: Freddy Ye <freddy.ye at intel.com>
  Date:   2022-12-01 (Thu, 01 Dec 2022)

  Changed paths:
    M llvm/include/llvm/CodeGen/MachinePassRegistry.def
    M llvm/include/llvm/CodeGen/Passes.h
    M llvm/include/llvm/CodeGen/TargetLowering.h
    M llvm/include/llvm/InitializePasses.h
    M llvm/lib/CodeGen/CMakeLists.txt
    M llvm/lib/CodeGen/CodeGen.cpp
    A llvm/lib/CodeGen/ExpandLargeFpConvert.cpp
    M llvm/lib/CodeGen/TargetLoweringBase.cpp
    M llvm/lib/CodeGen/TargetPassConfig.cpp
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/AArch64/O0-pipeline.ll
    M llvm/test/CodeGen/AArch64/O3-pipeline.ll
    M llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
    M llvm/test/CodeGen/ARM/O3-pipeline.ll
    M llvm/test/CodeGen/M68k/pipeline.ll
    M llvm/test/CodeGen/PowerPC/O3-pipeline.ll
    M llvm/test/CodeGen/RISCV/O0-pipeline.ll
    M llvm/test/CodeGen/RISCV/O3-pipeline.ll
    M llvm/test/CodeGen/X86/O0-pipeline.ll
    A llvm/test/CodeGen/X86/expand-large-fp-convert-fptosi129.ll
    A llvm/test/CodeGen/X86/expand-large-fp-convert-fptoui129.ll
    A llvm/test/CodeGen/X86/expand-large-fp-convert-si129tofp.ll
    A llvm/test/CodeGen/X86/expand-large-fp-convert-ui129tofp.ll
    A llvm/test/CodeGen/X86/fp-i129.ll
    M llvm/test/CodeGen/X86/opt-pipeline.ll
    M llvm/tools/opt/opt.cpp

  Log Message:
  -----------
  [X86] Add ExpandLargeFpConvert Pass and enable for X86

As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp convertions with more
than 128 bits. The expanded nodes are referring from the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
and etc.

Corner cases:
1. For fp16: as there is no related builtins added in compliler-rt. So I
mainly utilized the fp32 <-> fp16 lib calls to implement.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can
help in this problem. I recommend users to deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as clang FE currently doesn't support bf16 algorithm operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc are
ignoring "returns 0 for negative input" spec. This pass follows this old way
to ignore unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241