[PATCH] D78732: AMDGPU: Fix non-flushing, pre-gfx9 implementation of fcanonicalize

Thu Apr 23 11:21:27 PDT 2020

arsenm created this revision.
arsenm added a reviewer: rampitec.
Herald added subscribers: kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl.
arsenm planned changes to this revision.
arsenm added a comment.

Just realized I broke this again

This fixes conformance failures when the library implementations of
fmin/fmax were accidentally not inlined, forcing the assumption of no
flushing on targets where denormals are not enabled by default.

If f32 denormals were enabled pre-gfx9, we would still try to
implement this with v_max_f32. Pre-gfx9, these instructions ignored
the denormal mode and did not flush. Switch to the multiply form,
which should always work in this case.
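
For reference, a rough host-side model of the result fcanonicalize is
expected to produce, which is what the instruction chosen above has to
match. This is only a sketch: canonicalizeRef and its flushDenormals
parameter are hypothetical names, not code from this patch, and the NaN
handling is simplified (it returns a default quiet NaN instead of
preserving the payload).

```cpp
#include <cmath>
#include <limits>

// Hypothetical reference model, not LLVM code.
// flushDenormals stands in for the function's f32 denormal mode.
inline float canonicalizeRef(float x, bool flushDenormals) {
  // Any NaN, signaling or quiet, comes back as a quiet NaN.
  // Simplification: the original payload is dropped here.
  if (std::isnan(x))
    return std::numeric_limits<float>::quiet_NaN();

  // In flushing mode a denormal becomes a zero of the same sign.
  if (flushDenormals && std::fpclassify(x) == FP_SUBNORMAL)
    return std::copysign(0.0f, x);

  // Everything else is already canonical.
  return x;
}
```

An instruction that ignores the denormal mode behaves like the
flushDenormals == false path regardless of what the mode requests, which
is the mismatch described above.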

Now this will always use max to implement canonicalize on
gfx9+. Pre-gfx9, it will depend on the denormal mode and only use max
if flushing isn't enabled. We probably should only use max for f64 though.

For f32/f16, using max is a neutral choice (and worse in terms of code
size in one case for f16), but it is possibly worse for the compiler
since it adds an extra register use operand. Leave this change for later.
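
The selection rule stated above (max on gfx9+, and pre-gfx9 only when
flushing isn't enabled) can be sketched as a small decision function.
The real logic lives in TableGen predicates and patterns, so the names
below (CanonLowering, SubtargetSketch, pickCanonLowering) are
hypothetical and only illustrate the rule as worded in this summary.

```cpp
// Hypothetical sketch of the selection rule, not code from the patch.
enum class CanonLowering {
  MaxWithSelf, // canonicalize x as max(x, x)
  Multiply     // canonicalize x with the multiply form
};

struct SubtargetSketch {
  bool isGFX9Plus;       // true for gfx9 and newer
  bool flushesDenormals; // true if the denormal mode requests flushing
};

inline CanonLowering pickCanonLowering(const SubtargetSketch &st) {
  // gfx9+: max is always used.
  if (st.isGFX9Plus)
    return CanonLowering::MaxWithSelf;

  // Pre-gfx9: max ignores the denormal mode and does not flush, so it
  // is only used when flushing isn't enabled.
  return st.flushesDenormals ? CanonLowering::Multiply
                             : CanonLowering::MaxWithSelf;
}
```

For example, pickCanonLowering({false, true}) yields
CanonLowering::Multiply, while any gfx9+ configuration yields
CanonLowering::MaxWithSelf.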

https://reviews.llvm.org/D78732

Files:
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/SIInstructions.td
  llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fcanonicalize.mir
  llvm/test/CodeGen/AMDGPU/amdgcn-ieee.ll
  llvm/test/CodeGen/AMDGPU/clamp.ll
  llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
  llvm/test/CodeGen/AMDGPU/fcanonicalize.f16.ll
  llvm/test/CodeGen/AMDGPU/fcanonicalize.ll
  llvm/test/CodeGen/AMDGPU/fminnum.f64.ll
