[PATCH] D147786: [AMDGPU] Less aggressively break large PHIs

Tue Apr 11 00:35:55 PDT 2023

Pierre-vh added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1412-1431
+  // Check if this is a simple chain of insertelement that fills the vector. If
+  // that's the case, we can break up this PHI node profitably because the
+  // extractelement we will insert will get folded out.
+  BasicBlock *BB = IE->getParent();
+  BitVector EltsCovered(FVT->getNumElements());
+  InsertElementInst *Next = IE;
+  while (Next && !EltsCovered.all()) {
----------------
arsenm wrote:
> I'm worried this heuristic is too simple and doesn't really recognize canonical IR. If I run instcombine on the test cases, nearly all of them fold out the insertelement chains
We don't run InstCombine after CGP, but we could. I tried it, and my sample is even smaller with it:
(I haven't checked whether it fixes the original issue, but it should.)

  - Trunk with `amdgpu-codegenprepare-break-large-phis=0`: 16314 instructions
  - This patch: 16310 instructions
  - Trunk: 40310 instructions
  - Trunk + InstCombine run after CGP: 13057 instructions

If running InstCombine after CGP is okay with you, I can create a patch for it.
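To make the heuristic in the quoted snippet concrete, here is a minimal, self-contained C++ model of it that does not depend on LLVM. `InsertElt`, `chainFillsVector`, and the integer block ids are hypothetical stand-ins for `InsertElementInst`, the walk over `Next`, and `BasicBlock`. The sketch assumes the walk follows each insertelement's vector operand and gives up on a non-constant lane index or when the chain leaves the block. The actual patch may check more conditions than the quoted lines show.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for an LLVM insertelement instruction.
// Only the fields this heuristic needs are modeled.
struct InsertElt {
  std::optional<std::size_t> index; // constant lane index, if known
  const InsertElt *vec;             // vector operand; nullptr if it is not an
                                    // insertelement (e.g. poison/undef)
  int block;                        // id of the parent basic block
};

// Returns true if walking the chain backwards from `ie` writes every lane of
// a vector with `numElts` elements without leaving `ie`'s block.
// `covered` plays the role of the EltsCovered BitVector in the patch.
bool chainFillsVector(const InsertElt *ie, std::size_t numElts) {
  std::vector<bool> covered(numElts, false);
  std::size_t count = 0;
  for (const InsertElt *next = ie; next && count < numElts; next = next->vec) {
    // Give up on non-constant/out-of-range lanes or if the chain leaves the block.
    if (next->block != ie->block || !next->index || *next->index >= numElts)
      return false;
    if (!covered[*next->index]) {
      covered[*next->index] = true;
      ++count;
    }
  }
  return count == numElts; // same condition as EltsCovered.all()
}
```

For a `<4 x float>` built by four insertelements into lanes 0 to 3 in the same block, the check succeeds, so breaking the PHI is expected to be profitable because the inserted extractelements fold away. A chain that covers only three lanes, or one that crosses into another block, is rejected.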

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D147786