[PATCH] D132483: [AMDGPU][GlobalISel] Improve BFI Pattern Matching

Wed Aug 24 01:41:16 PDT 2022

Pierre-vh added a comment.

The copy hoisting works for BFI, but I don't think it's an ideal solution because in other cases (like the insertelement tests) it can worsen codegen. I tried to make it as local as possible, but I couldn't get rid of all the cases where it's unprofitable to move the copy "out".

If copy hoisting is an acceptable solution and we want to move forward with it, maybe it needs to be made smarter, e.g. look at the whole expression tree, compute how many copies each option would insert and where, and choose the option that inserts the fewest copies? Perhaps it could even move copies downwards, e.g. currently it can transform a tree of expressions from SGPR to VGPR, but maybe it could do the opposite as well if that introduces fewer copies?
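
To make the "count the copies over the whole tree" idea concrete, here is a rough sketch as a toy Python model, not LLVM code. Everything in it is an assumption for illustration: the `Node` class, the `copy_cost` and `min_copies` helpers, the rule that only SGPR-to-VGPR copies count as plain copies, and the restriction to trees (it ignores values with several users, so shared copies are not modeled).

```python
from dataclasses import dataclass, field
from math import inf
from typing import List, Optional

BANKS = ("sgpr", "vgpr")


@dataclass
class Node:
    """Hypothetical expression-tree node; leaves carry a fixed register bank."""
    name: str
    bank: Optional[str] = None          # set for leaves, None for operations
    operands: List["Node"] = field(default_factory=list)


def copy_cost(src: str, dst: str) -> float:
    """Cost of feeding a value in bank `src` to a user that wants bank `dst`."""
    if src == dst:
        return 0
    if src == "sgpr" and dst == "vgpr":
        return 1                        # one COPY is needed
    return inf                          # assumption: VGPR -> SGPR is not a plain copy


def min_copies(node: Node, bank: str) -> float:
    """Fewest copies needed in node's subtree if node itself lives in `bank`."""
    if not node.operands:
        return 0 if node.bank == bank else inf
    total = 0
    for op in node.operands:
        # Each operand picks whichever bank is cheapest, including the edge copy.
        total += min(min_copies(op, b) + copy_cost(b, bank) for b in BANKS)
    return total


# (a & b) ^ c with a, b in SGPRs and c in a VGPR.
inner = Node("and", operands=[Node("a", "sgpr"), Node("b", "sgpr")])
root = Node("xor", operands=[inner, Node("c", "vgpr")])

print(min_copies(root, "vgpr"))   # copies needed with the root in a VGPR
print(min_copies(inner, "vgpr"))  # copies needed if the AND is forced to VGPR too
```

In this toy tree, keeping the AND on the SGPR side and copying only its result is cheaper than copying both of its inputs, which is the kind of comparison a smarter hoisting heuristic would have to make.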

Thoughts? Should I keep going with this approach and try to make it smarter and better, or give it up?
Ideally I'd really like to be able to just fix the tablegen but I haven't found a way to do it properly.

There's also another annoying case in one of the BFI tests where RegBankSelect adds 2 identical copies.
It prevents BFI from being selected because %6/%7 aren't identical (despite referencing the same physical register).
This could be fixed by another combine (?), or we could maybe change GISel's `GIM_CheckIsSameOperand` so it looks through copies of physregs to vregs?

  %6:vgpr(s32) = COPY %2:sgpr(s32)
  %3:vgpr(s32) = G_XOR %1:vgpr, %6:vgpr
  %4:vgpr(s32) = G_AND %0:vgpr, %3:vgpr
  %7:vgpr(s32) = COPY %2:sgpr(s32)
  %5:vgpr(s32) = G_XOR %7:vgpr, %4:vgpr
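
As a rough illustration of what "looking through copies" would mean for the same-operand check, here is a toy Python model of the snippet above. It is only a sketch: the `DEFS` table and the `look_through_copies` and `is_same_operand` helpers are hypothetical names, and this is not how the GlobalISel match table implements `GIM_CheckIsSameOperand`. A real change would presumably also have to check that the register banks or classes are compatible.

```python
# Each virtual register maps to (opcode, operands), mirroring the MIR above.
DEFS = {
    "%6": ("COPY", ["%2"]),
    "%3": ("G_XOR", ["%1", "%6"]),
    "%4": ("G_AND", ["%0", "%3"]),
    "%7": ("COPY", ["%2"]),
    "%5": ("G_XOR", ["%7", "%4"]),
}


def look_through_copies(reg, defs):
    """Follow COPY definitions until reaching a non-copy or an undefined register."""
    seen = set()
    while reg in defs and defs[reg][0] == "COPY" and reg not in seen:
        seen.add(reg)               # guard against malformed copy cycles
        reg = defs[reg][1][0]
    return reg


def is_same_operand(a, b, defs):
    """Treat two operands as the same if they resolve to the same copy source."""
    return look_through_copies(a, defs) == look_through_copies(b, defs)


print(is_same_operand("%6", "%7", DEFS))  # both are copies of %2
print(is_same_operand("%6", "%4", DEFS))  # %4 is a G_AND result, not a copy of %2
```

With a check along these lines, %6 and %7 would compare equal even though they are distinct virtual registers.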

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132483/new/

https://reviews.llvm.org/D132483