[PATCH] D147786: [AMDGPU] Less aggressively break large PHIs

Tue Apr 11 01:25:15 PDT 2023

Pierre-vh added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1412-1431
+  // Check if this is a simple chain of insertelement that fills the vector. If
+  // that's the case, we can break up this PHI node profitably because the
+  // extractelement we will insert will get folded out.
+  BasicBlock *BB = IE->getParent();
+  BitVector EltsCovered(FVT->getNumElements());
+  InsertElementInst *Next = IE;
+  while (Next && !EltsCovered.all()) {
----------------
Pierre-vh wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > I'm worried this heuristic is too simple and doesn't really recognize canonical IR. If I run instcombine on the test cases, nearly all of them fold out the insertelement chains
> > We don't run InstCombine after CGP, but we could. I tried it and my sample is even smaller with this:
> > (I haven't checked if it fixed the original issue though, but it should)
> > 
> >   - Trunk w/ `amdgpu-codegenprepare-break-large-phis=0`: 16314 instructions
> >   - This patch: 16310 instructions
> >   - Trunk: 40310
> >   - Trunk + InstCombiner run after CGP: 13057
> > 
> > If running IC after CGP is okay with you I can create a patch for it.
> Ah, I tried to run IC after CGP but it causes about 300 failures. Is there a way to run IC logic on the added instructions only perhaps?
Forgot to mention: I don't try to anticipate what instcombine will fold but rather what the DAG will fold.
There's no IC run after CGP, so this tries to only break PHIs when it thinks the DAG can simplify-away the added instructions

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147786/new/

https://reviews.llvm.org/D147786