[PATCH] D153544: [AMDGPU] Use V_FMA_MIX* more often

Wed Jun 28 13:16:38 PDT 2023

arsenm added a comment.

FMA and MAD formation is kind of complicated and I'm still leaning towards you being better off doing this in a combine where all other fma formation is done

I'm assuming all these instructions internally perform the fma as f32?

================
Comment at: llvm/lib/Target/AMDGPU/VOP3PInstructions.td:185
+  def : GCNPat <
+    (f16 (fpround (any_fmul (f32 (VOP3PMadMixMods f32:$src0, i32:$src0_modifiers)),
+                            (f32 (VOP3PMadMixMods f32:$src1, i32:$src1_modifiers))))),
----------------
probably should be just fmul, this may not be the correct behavior with strictfp if the fmul were to raise an exception

================
Comment at: llvm/lib/Target/AMDGPU/VOP3PInstructions.td:187
+                            (f32 (VOP3PMadMixMods f32:$src1, i32:$src1_modifiers))))),
+    (mixlo_inst $src0_modifiers, $src0,
+                $src1_modifiers, $src1,
----------------
If this is using v_mad_mix (i.e >= gfx900 && < gfx906), you can't introduce v_mad* without checking if denormal flushing is enabled

================
Comment at: llvm/test/CodeGen/AMDGPU/fdiv.f16.ll:3
-; RUN: llc -march=amdgcn -mcpu=fiji -denormal-fp-math-f32=preserve-sign -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX8PLUS %s
-; RUN: llc -march=amdgcn -mcpu=fiji -denormal-fp-math-f32=preserve-sign -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX8PLUS %s
-; RUN: llc -march=amdgcn -mcpu=gfx900 -denormal-fp-math-f32=preserve-sign -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX8PLUS %s
----------------
matejam wrote:
> I guess this was an accident.
Conversion of tests to generated checks should be done separate from a functional change

================
Comment at: llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll:2121

+define { float } @mixlo_fptrunc(float %a, float %b, half %c) #0 {
+; GFX1100-LABEL: mixlo_fptrunc:
----------------
Why the struct return?

Should pre-commit the baseline tests for any new cases

================
Comment at: llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll:2177
+.entry:
+  %0 = fmul reassoc nnan nsz arcp contract afn float %a, %b
+  %1 = fptrunc float %0 to half
----------------
Use named values and drop the flags

================
Comment at: llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll:2179
+  %1 = fptrunc float %0 to half
+  %2 = fadd reassoc nnan nsz arcp contract afn half %1, %c
+  %3 = bitcast half %2 to i16
----------------
The base pattern doesn't include the add. I think tests including the add could be useful, but the base pattern tests should end at the fmul

================
Comment at: llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll:2180-2182
+  %3 = bitcast half %2 to i16
+  %4 = zext i16 %3 to i32
+  %5 = bitcast i32 %4 to float
----------------
Don't need all this casting noise to return a result

================
Comment at: llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll:2183
+  %5 = bitcast i32 %4 to float
+  %6 = insertvalue { float } undef, float %5, 0
+  ret { float } %6
----------------
don't bother with the struct wrapper?

================
Comment at: llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll:2186
+}
+
 declare half @llvm.minnum.f16(half, half) #1
----------------
Need tests with source modifiers and multiple uses, plus the permutations with the lo and hi cases. Also denormal flushing on and off

Also need all the tests for the other pattern

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153544/new/

https://reviews.llvm.org/D153544