[PATCH] D142192: [AMDGPU] Run unmerge combines post regbankselect

Pierre van Houtryve via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 9 06:17:39 PST 2023


Pierre-vh marked an inline comment as done.
Pierre-vh added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:2050-2052
+  (V_BFI_B32_e64 (COPY_TO_REGCLASS VSrc_b32:$x, VGPR_32),
+                 (COPY_TO_REGCLASS VSrc_b32:$y, VGPR_32),
+                 (COPY_TO_REGCLASS VSrc_b32:$z, VGPR_32))
----------------
arsenm wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > I think this needs to go off the a predicate. If we have to generate so many copies it's potentially worse than matching the pattern
> > Do you mean they should be on the matching part (`DivergentBinFrag<...`) and not on the output pattern?
> > For now this doesn't look like it really has cases where too many copies are emitted, I think SIFoldOperand will fold most of them.
> > 
> > I don't remember how we want to handle this in GISel. Do we want to be aware of constant-bus limitations at matching time, or ignore it/always respect it by greedily inserting copies and cleaning it up later (with SIFoldOperands and an eventual improved version)
> > To me the second strategy looks simpler and just as powerful but I don't quite remember if we made a decision there
> > 
> > FWIW, I was thinking about adding a "Finalizer" method to GISel Pattern Matching to allow targets to call some C++ code before the pattern is applied/instructions are built so it can "veto" the pattern if needed and fail if it's non-profitable for instance. I didn't make an RFC yet but if it sounds like something that would be useful I can make one in the coming days/week.
> It's partially an open question for how to handle the constant bus problem. The current strategy is supposed to be let regbankselect aggressively emit copies to VGPR up front so it's impossible to violate, which SIFoldOperands can clean up.
> 
> In the case of patterns, I think it would be worse if we had to manually write all of them out in C++ to handle them in SIFoldOperands. Selection patterns should be applying logic to avoid violating it. The finalizer sounds like the same as the current arbitrary code predicates?
> 
So what I did here follows the current strategy, right? It aggressively copies to VGPRs and lets SIFoldOperands clean up. I suppose an alternative could be to add some PatFrag(s) with GISel code with a heuristic to prevent matching if more than X copies cannot be folded but it feels fragile (because we'd need an arbitrary limit). What do you think?

I also don't think there's a test case here where matching BFI leads to worse code due to too many copies. I think the odds of most of the copies being folded-out are pretty good. There's sometimes one left but it's better than having 3 or 4 instructions to do what BFI can do in one.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142192/new/

https://reviews.llvm.org/D142192



More information about the llvm-commits mailing list