[PATCH] D93973: [SCEV] Simplify trunc to zero based on known bits

Sun Jan 3 03:16:54 PST 2021

fhahn added inline comments.

================
Comment at: llvm/lib/Analysis/ScalarEvolution.cpp:1181
+    // desired recursion depth, only do this for unknown SCEV's.
+    if (const SCEVUnknown *U = dyn_cast<SCEVUnknown>(Op)) {
+      uint32_t MinTrailingZeros = GetMinTrailingZeros(Op);
----------------
gilr wrote:
> nikic wrote:
> > fhahn wrote:
> > > Are you worried about the cost of calling `GetMintrailingZeros` here? Could it happen that we have a arbitrary SCEV expression here for which we can get more accurate results thanks to `GetMinTrailingZeros`?
> > I believe this optimization can only be useful in the first place for SCEVUnknown (and SCEVPtrToInt, which is basically the same thing), as well as min/max expression. The latter only because we don't push truncates through min/max like we do for everything else. Possibly we should be doing that, but that's a different issue...
> > 
> > That said, from a code structure perspective, I'm not sure why this check is present in the "Depth > MaxCastDepth" branch at all. This is a recursion cut-off that is //supposed// to produce sub-optimal SCEV expressions, and there does not seem be any strong cause why this particular optimization needs to be applied unconditionally.
> >I believe this optimization can only be useful in the first place for SCEVUnknown
> 
> This patch was indeed motivated by assumptions, but the analysis first needs to get to the SCEVUnknowns. Calling GetMinTrailingZeros on any SCEV that got through GetTruncateExpr complements what ever the latter didn't simplify, but admittedly GetMinTrailingZeros is unboundedly recursive itself (even if cached and unrelated to getTruncateExpr's recursion).
> 
> >This is a recursion cut-off that is supposed to produce sub-optimal SCEV expressions
> 
> Cutting off potentially exponential recursion at some point makes a lot of sense. Not sure it implies not doing any work at the leaves though.
> 
> To be on the safe side let's start by calling GetMinTrailingZeros only for SCEVUnknowns at the end of the function and extend as needed.
>> I believe this optimization can only be useful in the first place for SCEVUnknown
> This patch was indeed motivated by assumptions, but the analysis first needs to get to the SCEVUnknowns. Calling GetMinTrailingZeros on any SCEV that got through GetTruncateExpr complements what ever the latter didn't simplify, but admittedly GetMinTrailingZeros is unboundedly recursive itself (even if cached and unrelated to getTruncateExpr's recursion).

I am not sure, is there's anything conceptually making this only useful for `SCEVUnknown`?

 `GetMinTrailingZeroes` can provide useful bounds for a range of expressions which may be helpful for this optimization. One example involving a `UMax` expression below. This example probably highlights a missing fold for truncates, but the main point is that the reasoning in this patch here is complimentary and may catch additional cases. There is some overlap with the folds in the function, but I am not sure this means we should restrict this only to `SCEVUnknown`.  If `GetMinTrailingZeros` gets improved, it would be good to not miss out of the benefits in this function.

```
define i8 @trunc_to_assumed_zeros0(i32* %p, i32* %p.2, i1 %c) {
  %a = load i32, i32* %p
  %b = load i32, i32* %p.2
  %and.1 = and i32 %a, 255
  %cmp.1 = icmp eq i32 %and.1, 0

  tail call void @llvm.assume(i1 %cmp.1)
  %and.2 = and i32 %b, 255
  %cmp.2 = icmp eq i32 %and.2, 0
  tail call void @llvm.assume(i1 %cmp.2)

  %lt = icmp ugt i32 %a, %b
  %sel = select i1 %lt, i32 %a, i32 %b

  %t1 = trunc i32 %sel to i8
  %t2 = trunc i32 %a to i8
  %t3 = trunc i32 %b to i8
  ret i8 %t1
}
```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93973/new/

https://reviews.llvm.org/D93973