[PATCH] D155741: AMDGPU: Implement new 2ulp fdiv lowering

Wed Jul 19 12:34:22 PDT 2023

arsenm created this revision.
arsenm added reviewers: AMDGPU, foad, b-sumner, Pierre-vh, rampitec.
Herald added subscribers: StephenFan, kerbowa, hiraditya, Anastasia, tpr, dstuttard, yaxunl, jvesely, kzhuravl.
Herald added a project: All.
arsenm requested review of this revision.
Herald added subscribers: wangpc, wdng.
Herald added a project: LLVM.

Extends the new frexp-scaled reciprocal lowering to general division.
The reciprocal case is the same expansion with the frexp of 1 constant
folded, so the code could probably be cleaned up to rely on that
constant folding.
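
As a rough host-side illustration of the idea (not the code in the patch,
which emits LLVM IR), a frexp-scaled division can be sketched as follows.
The names frexp_div and approx_rcp are hypothetical, and approx_rcp uses an
ordinary 1.0f / x as a stand-in for the hardware reciprocal approximation.

```cpp
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal approximation.
// Assumption: a real target would use its own approximate rcp here.
static float approx_rcp(float x) { return 1.0f / x; }

// Sketch of a frexp-scaled division: split both operands into a
// mantissa in [0.5, 1) and a power-of-two exponent, take the reciprocal
// of the well-scaled denominator mantissa, then reapply the exponent
// difference with ldexp.
float frexp_div(float x, float y) {
  int ex = 0;
  int ey = 0;
  float mx = std::frexp(x, &ex); // x == mx * 2^ex
  float my = std::frexp(y, &ey); // y == my * 2^ey
  float q = mx * approx_rcp(my); // mantissa quotient, no range issues
  return std::ldexp(q, ex - ey); // scale by 2^(ex - ey)
}
```

With x fixed to 1.0f, std::frexp yields the constants 0.5 and 1, which is
the sense in which the reciprocal case is the same expansion after
constant folding.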

Improves results on the IEEE path for the default OpenCL division.
Previously, the only fast expansion was the fdiv.fast intrinsic, which
uses explicit range checks and was only emitted at a 2.5 ulp accuracy
threshold with DAZ. This gives us a better fast option with the default
IEEE behavior.
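
For readers unfamiliar with ulp thresholds, the following is a minimal
sketch of how the error of a float result could be measured in ulps
against a double-precision reference. The helper name ulp_error is
hypothetical and is not part of the patch or of LLVM.

```cpp
#include <cmath>

// Hypothetical helper: error of `approx` relative to `exact`, in units
// of the float ulp at the correctly rounded reference value.
// Assumption: `exact` is finite and its float rounding is not the
// largest finite float.
double ulp_error(float approx, double exact) {
  float ref = std::fabs(static_cast<float>(exact));
  // Distance from the reference to the next float away from zero.
  float next = std::nextafter(ref, INFINITY);
  double ulp = static_cast<double>(next) - static_cast<double>(ref);
  return std::fabs(static_cast<double>(approx) - exact) / ulp;
}
```

A lowering that meets a 2 ulp bound would keep this value at or below 2
for every input pair in the tested range.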

https://reviews.llvm.org/D155741

Files:
  llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f32.ll
  llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll
  llvm/test/CodeGen/AMDGPU/fdiv.ll
  llvm/test/CodeGen/AMDGPU/fdiv32-to-rcp-folding.ll
  llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D155741.542152.patch
Type: text/x-patch
Size: 357083 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230719/3a2606c4/attachment-0001.bin>