[PATCH] D150266: [AMDGPU] Improve PHI-breaking heuristics in CGP
Pierre van Houtryve via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 10 07:20:23 PDT 2023
Pierre-vh added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1405-1409
+static bool areInSameBB(const Value *A, const Value *B) {
+ const auto *IA = dyn_cast<Instruction>(A);
+ const auto *IB = dyn_cast<Instruction>(B);
+ return IA && IB && IA->getParent() == IB->getParent();
+}
----------------
jmmartinez wrote:
> Nit (very optional): I think that if we made `areInSameBB` into `isOperandInSameBB` (see below) the code becomes slightly simpler
>
> ```
> static bool isOperandInSameBB(const Instruction& I, unsigned Op) {
> if(Instruction *OpI = dyn_cast<Instruction>(I->getOperand(Op))) {
> return Op->getParent() == I.getParent();
> }
> return true;
> }
> ```
>
> ---
>
> ```
> if (isa<Instruction>(VecSrc) && !areInSameBB(VecSrc, IE))
> return false;
> ```
> becomes
> ```
> if (!areInSameBB(IE, 0))
> return false;
> ```
>
> and
>
> ```
> return isa<Constant>(SV->getOperand(1)) ||
> areInSameBB(SV, SV->getOperand(0)) ||
> areInSameBB(SV, SV->getOperand(1));
> ```
> becomes
> ```
> return areInSameBB(SV, 0) || areInSameBB(SV, 1);
> ```
I think it's a bit more readable if I keep the current style, in the loop I still need to extract VecSrc anyway so IMO it's clearer to do:
```
const auto *VecSrc = IE->getOperand(0);
if (isa<Instruction>(VecSrc) && !areInSameBB(VecSrc, IE))
return false;
```
const auto *VecSrc = IE->getOperand(0);
if(isOperandInSameBB(IE, 0))` // it's identical to VecSrc but looks different for no particular reason
return false;
```
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1469-1475
+ if (auto It = BreakPhiNodesCache.find(&I); It != BreakPhiNodesCache.end())
+ return It->second;
+
+ // This function may recurse, so to guard against infinite looping, this PHI
+ // is conservatively considered unbreakable until we reach a conclusion.
+ bool &Result = BreakPhiNodesCache[&I];
+ Result = false;
----------------
jmmartinez wrote:
> You can use the result of `DenseMap::insert` to do a single lookup instead of two (one for checking if the element is in the map and another to insert the false value).
Nice catch, though with iterator invalidation I need to do another lookup anyway before returning to be safe.
Though DenseMap is very fast, especially here because it's unlikely to be large, so I think it's fine.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1472-1473
+
+ // This function may recurse, so to guard against infinite looping, this PHI
+ // is conservatively considered unbreakable until we reach a conclusion.
+ bool &Result = BreakPhiNodesCache[&I];
----------------
jmmartinez wrote:
> Have you considered using an enum with three states (`enum CanBreakPHI { OnHold, CanBreak, Unbreakable }`) instead of a boolean? In that case, we would be able to break phi-nodes that currently we can't.
>
> I have no idea if that would be interesting though.
I tried a similar thing at first: considering all PHIs breakable until they were proven not to be, but I found that it's not beneficial. In one particularly pathological case it even increased the total number of instructions in the output by 50% (from 16000 to 23000). The current logic seems to handle all cases I checked against (rocRAND benchmarks mostly which are super sensitive to this transform)
I don't think a tri-state enum would bring anything here other than added complexity so I opted to just use a boolean, but I'm open to revisiting it.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1474
+ // is conservatively considered unbreakable until we reach a conclusion.
+ bool &Result = BreakPhiNodesCache[&I];
+ Result = false;
----------------
jmmartinez wrote:
> Are references to DenseMap elements guaranteed to be valid after insertion?
>
> If not, the recursive call to `canBreakPhiNode` may insert new values into the cache and invalidate the references held by the parent calls.
Ah, I always forget DenseMap can invalidate on insert
Fixed
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D150266/new/
https://reviews.llvm.org/D150266
More information about the llvm-commits
mailing list