[all-commits] [llvm/llvm-project] 88a239: [AMDGPU] Adopt new lowering sequence for `fdiv16` ...

Tue Oct 8 06:49:44 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 88a239d292da80f260788c0817a07cbc0a8ac758
      https://github.com/llvm/llvm-project/commit/88a239d292da80f260788c0817a07cbc0a8ac758
  Author: Shilei Tian <i at tianshilei.me>
  Date:   2024-10-08 (Tue, 08 Oct 2024)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
    M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
    M llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f16.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fdiv.mir
    M llvm/test/CodeGen/AMDGPU/fdiv.f16.ll
    M llvm/test/CodeGen/AMDGPU/fold-int-pow2-with-fmul-or-fdiv.ll
    M llvm/test/CodeGen/AMDGPU/frem.ll

  Log Message:
  -----------
  [AMDGPU] Adopt new lowering sequence for `fdiv16` (#109295)

The current lowering of `fdiv16` can generate incorrectly rounded result
in some cases. The new sequence was provided by the HW team, as shown
below written in C++.

```
half fdiv(half a, half b) {
  float a32 = float(a);
  float b32 = float(b);
  float r32 = 1.0f / b32;
  float q32 = a32 * r32;
  float e32 = -b32 * q32 + a32;
  q32 = e32 * r32 + q32;
  e32 = -b32 * q32 + a32;
  float tmp = e32 * r32;
  uin32_t tmp32 = std::bit_cast<uint32_t>(tmp);
  tmp32 = tmp32 & 0xff800000;
  tmp = std::bit_cast<float>(tmp32);
  q32 = tmp + q32;
  half q16 = half(q32);
  q16 = div_fixup_f16(q16);
  return q16;
}
```

Fixes SWDEV-477608.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications