[PATCH] D101409: [SCEV] Check IDom with single predecessor on getPredecessorWithUniqueSuccessorForBB

Wed Apr 28 13:55:51 PDT 2021

jaykang10 added inline comments.

================
Comment at: llvm/lib/Analysis/ScalarEvolution.cpp:9313
+      return {IDomPred, IDomBB};
+  }
+
----------------
nikic wrote:
> I was going to suggest that we can generalize this by walking up the idom chain (this also removes the need to special case loops):
> 
> ```
>   while (true) {
>     if (const BasicBlock *Pred = BB->getSinglePredecessor())
>       return {Pred, BB};
> 
>     DomTreeNode *IDom = DT.getNode(BB)->getIDom();
>     if (!IDom)
>       break;
> 
>     BB = IDom->getBlock();
>   }
> 
>   return {nullptr, nullptr};
> ```
> 
> Unfortunately, this has a bad compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=41849a91956755e15591240816d1d0c5ec402895&to=41241255419965bbfbc6d361058ea02449be9c50&stat=instructions ClamAV is up nearly 3%.
> 
> So I also tested your original patch: https://llvm-compile-time-tracker.com/compare.php?from=41849a91956755e15591240816d1d0c5ec402895&to=a012ed61e795b191cd7d114bfe972ce05c9f3d6b&stat=instructions This is better, but still bad, especially with little codegen impact.
> 
> isImpliedCond() is notoriously expensive, so compile-time is sensitive to the amount of guards we look at. I think if we want to do any extensions here, we need to also aggressively limit the number of edges for which we take conditions into account.
I appreciate your comment.

> I was going to suggest that we can generalize this by walking up the idom chain (this also removes the need to special case loops):
> 
> ```
>   while (true) {
>     if (const BasicBlock *Pred = BB->getSinglePredecessor())
>       return {Pred, BB};
> 
>     DomTreeNode *IDom = DT.getNode(BB)->getIDom();
>     if (!IDom)
>       break;
> 
>     BB = IDom->getBlock();
>   }
> 
>   return {nullptr, nullptr};
> ```

Yep, if we walk up dominator tree, it is more general solution and we can remove the code to check the loop.

> 
> Unfortunately, this has a bad compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=41849a91956755e15591240816d1d0c5ec402895&to=41241255419965bbfbc6d361058ea02449be9c50&stat=instructions ClamAV is up nearly 3%.
> 
> So I also tested your original patch: https://llvm-compile-time-tracker.com/compare.php?from=41849a91956755e15591240816d1d0c5ec402895&to=a012ed61e795b191cd7d114bfe972ce05c9f3d6b&stat=instructions This is better, but still bad, especially with little codegen impact.

Thanks for checking compile time and codegen impact. At this point, I need to explain why I am trying to add changes to SCEV and IRCE pass. 

I am trying to vectorize loops. Let's see a simple loop.

```
while() {
...
if ()
...
}
```
As you can see, there is `if` statement inside loop. As you know, LoopVectorizer tries to predicate the block of `if` statement with its condition and it asks target machines the cost of the instructions. If there are load or store instructions in the block to be predicated and target machine does not support masked load and store, LoopVectorizer assigns big number to the load and store instructions and it means we can not vectorize the loop. In this case, it is important to remove the `if` statement inside loop before LoopVectorizer.

We need to consider two cases to remove the `if` statement inside loop as below.
1. `if` statement's condition with loop invariant variables
2. `if` statement's condition with induction variables

For the first one, UnSwitch/SimpleUnSwitch pass handles the case. The passes hoist the loop invariant condition outside loop and unswitch loop.
For the second one, there could be several transformations but I think the IRCE pass is general solution. The pass splits the iteration space following the condition with induction variable.

I am trying the above passes handle more cases in order to make more loops vectorizable. Especially for the IRCE pass, the pass is using `isLoopEntryGuardedByCond` for bound check of new loop and it sometimes blocks the pass processes loop even though the loop is fine. This is a reason why I am trying to extend the SCEV's functions.

At this moment, the only SimpleUnSwitch pass is in pipeline of new pass manager but IRCE is not. Therefore, it is not easy to see the codegen impact from this SCEV change because they aim to help IRCE pass.

If possible, I would like to enable IRCE in the pipeline of new pass manager with these SCEV changes because it would make more loops vectorizable. 

> isImpliedCond() is notoriously expensive, so compile-time is sensitive to the amount of guards we look at. I think if we want to do any extensions here, we need to also aggressively limit the number of edges for which we take conditions into account.

Let me think about it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101409/new/

https://reviews.llvm.org/D101409