[PATCH] D150266: [AMDGPU] Improve PHI-breaking heuristics in CGP

Pierre van Houtryve via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 10 07:20:23 PDT 2023


Pierre-vh added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1405-1409
+static bool areInSameBB(const Value *A, const Value *B) {
+  const auto *IA = dyn_cast<Instruction>(A);
+  const auto *IB = dyn_cast<Instruction>(B);
+  return IA && IB && IA->getParent() == IB->getParent();
+}
----------------
jmmartinez wrote:
> Nit (very optional): I think that if we made `areInSameBB` into `isOperandInSameBB` (see below) the code becomes slightly simpler
> 
> ```
> static bool isOperandInSameBB(const Instruction& I, unsigned Op) {
>   if(Instruction *OpI = dyn_cast<Instruction>(I->getOperand(Op))) {
>     return Op->getParent() == I.getParent();
>   }
>   return true;
> }
> ```
> 
> ---
> 
> ```
>     if (isa<Instruction>(VecSrc) && !areInSameBB(VecSrc, IE))
>       return false;
> ```
> becomes
> ```
>     if (!areInSameBB(IE, 0))
>       return false;
> ```
> 
> and
> 
> ```
>     return isa<Constant>(SV->getOperand(1)) ||
>            areInSameBB(SV, SV->getOperand(0)) ||
>            areInSameBB(SV, SV->getOperand(1));
> ```
> becomes
> ```
>     return areInSameBB(SV, 0) || areInSameBB(SV, 1);
> ```
I think it's a bit more readable if I keep the current style, in the loop I still need to extract VecSrc anyway so IMO it's clearer to do:

```
    const auto *VecSrc = IE->getOperand(0);
    if (isa<Instruction>(VecSrc) && !areInSameBB(VecSrc, IE))
      return false;
```
    const auto *VecSrc = IE->getOperand(0);
    if(isOperandInSameBB(IE, 0))` // it's identical to VecSrc but looks different for no particular reason
      return false;
```



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1469-1475
+  if (auto It = BreakPhiNodesCache.find(&I); It != BreakPhiNodesCache.end())
+    return It->second;
+
+  // This function may recurse, so to guard against infinite looping, this PHI
+  // is conservatively considered unbreakable until we reach a conclusion.
+  bool &Result = BreakPhiNodesCache[&I];
+  Result = false;
----------------
jmmartinez wrote:
> You can use the result of `DenseMap::insert` to do a single lookup instead of two (one for checking if the element is in the map and another to insert the false value).
Nice catch, though with iterator invalidation I need to do another lookup anyway before returning to be safe.
Though DenseMap is very fast, especially here because it's unlikely to be large, so I think it's fine.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1472-1473
+
+  // This function may recurse, so to guard against infinite looping, this PHI
+  // is conservatively considered unbreakable until we reach a conclusion.
+  bool &Result = BreakPhiNodesCache[&I];
----------------
jmmartinez wrote:
> Have you considered using an enum with three states (`enum CanBreakPHI { OnHold, CanBreak, Unbreakable }`) instead of a boolean? In that case, we would be able to break phi-nodes that currently we can't.
> 
> I have no idea if that would be interesting though.
I tried a similar thing at first: considering all PHIs breakable until they were proven not to be, but I found that it's not beneficial. In one particularly pathological case it even increased the total number of instructions in the output by 50% (from 16000 to 23000). The current logic seems to handle all cases I checked against (rocRAND benchmarks mostly which are super sensitive to this transform)

I don't think a tri-state enum would bring anything here other than added complexity so I opted to just use a boolean, but I'm open to revisiting it.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1474
+  // is conservatively considered unbreakable until we reach a conclusion.
+  bool &Result = BreakPhiNodesCache[&I];
+  Result = false;
----------------
jmmartinez wrote:
> Are references to DenseMap elements guaranteed to be valid after insertion?
> 
> If not, the recursive call to `canBreakPhiNode` may insert new values into the cache and invalidate the references held by the parent calls. 
Ah, I always forget DenseMap can invalidate on insert
Fixed


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150266/new/

https://reviews.llvm.org/D150266



More information about the llvm-commits mailing list