[PATCH] D134354: [AMDGPU][GlobalISel] Support mad/fma_mix selection

Fri Sep 23 06:05:25 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mad-mix.ll:1-4
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -show-mc-encoding < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX900,GFX9 %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx906 -verify-machineinstrs -show-mc-encoding < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX906,GFX9 %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI,VI %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI,CI %s
----------------
Pierre-vh wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > Pierre-vh wrote:
> > > > arsenm wrote:
> > > > > This is a copy pasted version of the existing test. I'd assume they can share the same runlines in the same file
> > > > Some run lines are different unfortunately, see FIXMEs in the test
> > > I'd work to eliminate those differences. If it turns out to be difficult, I would probably switch to generated checks and have them share the same file
> > The remaining difference fall in the following categories:
> >  - isCanonicalized has different behaviour between GISel/DAG, so there's v_mac instead of v_mad in a few places since SIFoldOperand doesn't fold
> >    - there's also v_madak_f32 that's no longer present, I think it's the same cause but I haven't looked into it yet
> >  - op_sel is on the second v_mad_mix instead of the first, seems like a harmless difference due to how DAG/GISel work?
> >  - A G_LSHR + G_SHUFFLEVECTOR (1,0) pair isn't folded out. I think those operations negate each other, perhaps a combine should be added for that?
> >  - Some unfinished things like -[v0| not being picked up yet.
> > 
> > The last 2 definitely have to be fixed, I'll look into them ASAP, but are the first 2 important as well? I'm not sure of what to do with `isCanonicalized`, is there a place where I can find the list of operations that should go in there?
> For `-|v0|`, the gMIR looks like this:
> ```
> bb.1 (%ir-block.0):
>   liveins: $vgpr0, $vgpr1, $vgpr2
>   %0:_(s32) = COPY $vgpr0
>   %3:_(s32) = COPY $vgpr1
>   %1:_(s16) = G_TRUNC %3:_(s32)
>   %4:_(s32) = COPY $vgpr2
>   %2:_(s16) = G_TRUNC %4:_(s32)
>   %8:_(s16) = G_FCONSTANT half 0xH8000
>   %7:_(<2 x s16>) = G_BUILD_VECTOR %8:_(s16), %8:_(s16)
>   %5:_(<2 x s16>) = G_BITCAST %0:_(s32)
>   %6:_(<2 x s16>) = G_FABS %5:_
>   %18:_(<2 x s16>) = G_FNEG %6:_
>   %9:_(<2 x s16>) = G_FADD %7:_, %18:_
>   %19:_(s32) = G_BITCAST %9:_(<2 x s16>)
>   %20:_(s32) = G_CONSTANT i32 16
>   %21:_(s32) = G_LSHR %19:_, %20:_(s32)
>   %17:_(s16) = G_TRUNC %21:_(s32)
>   %12:_(s32) = G_FPEXT %17:_(s16)
>   %13:_(s32) = G_FPEXT %1:_(s16)
>   %14:_(s32) = G_FPEXT %2:_(s16)
>   %15:_(s32) = G_FMA %12:_, %13:_, %14:_
>   $vgpr0 = COPY %15:_(s32)
>   SI_RETURN implicit $vgpr0
> ```
> 
> Could we add a combine to fold `G_FADD (+-)0.0, x` into just `x`?
> If we add that and another one to fold ` G_LSHR + G_SHUFFLEVECTOR (1,0)`, it should address most of the remaining differences. 
fadd 0 can only be folded out if you don't care about signed zeros and don't care about canonicalizing (I think the existing DAG combine fails to consider this second point). Which test is this in? I don't see how this would have ever been able to fold into an fmax_mix operand.

We should have the shift + shuffle combine.

isCanonicalized is essentially a list of opcodes with floating point semantics.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134354/new/

https://reviews.llvm.org/D134354