[PATCH] D134354: [AMDGPU][GlobalISel] Support mad/fma_mix selection

Fri Sep 23 00:13:23 PDT 2022

Pierre-vh added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mad-mix.ll:1-4
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -show-mc-encoding < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX900,GFX9 %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx906 -verify-machineinstrs -show-mc-encoding < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX906,GFX9 %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI,VI %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI,CI %s
----------------
Pierre-vh wrote:
> arsenm wrote:
> > Pierre-vh wrote:
> > > arsenm wrote:
> > > > This is a copy pasted version of the existing test. I'd assume they can share the same runlines in the same file
> > > Some run lines are different unfortunately, see FIXMEs in the test
> > I'd work to eliminate those differences. If it turns out to be difficult, I would probably switch to generated checks and have them share the same file
> The remaining difference fall in the following categories:
>  - isCanonicalized has different behaviour between GISel/DAG, so there's v_mac instead of v_mad in a few places since SIFoldOperand doesn't fold
>    - there's also v_madak_f32 that's no longer present, I think it's the same cause but I haven't looked into it yet
>  - op_sel is on the second v_mad_mix instead of the first, seems like a harmless difference due to how DAG/GISel work?
>  - A G_LSHR + G_SHUFFLEVECTOR (1,0) pair isn't folded out. I think those operations negate each other, perhaps a combine should be added for that?
>  - Some unfinished things like -[v0| not being picked up yet.
> 
> The last 2 definitely have to be fixed, I'll look into them ASAP, but are the first 2 important as well? I'm not sure of what to do with `isCanonicalized`, is there a place where I can find the list of operations that should go in there?
For `-|v0|`, the gMIR looks like this:
```
bb.1 (%ir-block.0):
  liveins: $vgpr0, $vgpr1, $vgpr2
  %0:_(s32) = COPY $vgpr0
  %3:_(s32) = COPY $vgpr1
  %1:_(s16) = G_TRUNC %3:_(s32)
  %4:_(s32) = COPY $vgpr2
  %2:_(s16) = G_TRUNC %4:_(s32)
  %8:_(s16) = G_FCONSTANT half 0xH8000
  %7:_(<2 x s16>) = G_BUILD_VECTOR %8:_(s16), %8:_(s16)
  %5:_(<2 x s16>) = G_BITCAST %0:_(s32)
  %6:_(<2 x s16>) = G_FABS %5:_
  %18:_(<2 x s16>) = G_FNEG %6:_
  %9:_(<2 x s16>) = G_FADD %7:_, %18:_
  %19:_(s32) = G_BITCAST %9:_(<2 x s16>)
  %20:_(s32) = G_CONSTANT i32 16
  %21:_(s32) = G_LSHR %19:_, %20:_(s32)
  %17:_(s16) = G_TRUNC %21:_(s32)
  %12:_(s32) = G_FPEXT %17:_(s16)
  %13:_(s32) = G_FPEXT %1:_(s16)
  %14:_(s32) = G_FPEXT %2:_(s16)
  %15:_(s32) = G_FMA %12:_, %13:_, %14:_
  $vgpr0 = COPY %15:_(s32)
  SI_RETURN implicit $vgpr0
```

Could we add a combine to fold `G_FADD (+-)0.0, x` into just `x`?
If we add that and another one to fold ` G_LSHR + G_SHUFFLEVECTOR (1,0)`, it should address most of the remaining differences. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134354/new/

https://reviews.llvm.org/D134354