[PATCH] D147786: [AMDGPU] Less aggressively break large PHIs
Pierre van Houtryve via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 11 00:53:12 PDT 2023
Pierre-vh added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1412-1431
+ // Check if this is a simple chain of insertelement that fills the vector. If
+ // that's the case, we can break up this PHI node profitably because the
+ // extractelement we will insert will get folded out.
+ BasicBlock *BB = IE->getParent();
+ BitVector EltsCovered(FVT->getNumElements());
+ InsertElementInst *Next = IE;
+ while (Next && !EltsCovered.all()) {
----------------
Pierre-vh wrote:
> arsenm wrote:
> > I'm worried this heuristic is too simple and doesn't really recognize canonical IR. If I run instcombine on the test cases, nearly all of them fold out the insertelement chains
> We don't run InstCombine after CGP, but we could. I tried it and my sample is even smaller with this:
> (I haven't checked if it fixed the original issue though, but it should)
>
> - Trunk w/ `amdgpu-codegenprepare-break-large-phis=0`: 16314 instructions
> - This patch: 16310 instructions
> - Trunk: 40310
> - Trunk + InstCombiner run after CGP: 13057
>
> If running IC after CGP is okay with you I can create a patch for it.
Ah, I tried to run IC after CGP but it causes about 300 failures. Is there a way to run IC logic on the added instructions only perhaps?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147786/new/
https://reviews.llvm.org/D147786
More information about the llvm-commits
mailing list