[PATCH] D48853: [SCEV] Add [zs]ext{C, +, x} -> (D + [zs]ext{C-D, +, x})<nuw><nsw> transform

Thu Jul 19 16:24:37 PDT 2018

rtereshin added inline comments.

================
Comment at: lib/Analysis/ScalarEvolution.cpp:1566-1568
+static APInt extractConstantWithoutWrapping(ScalarEvolution &SE,
+                                            const SCEVConstant *ConstantTerm,
+                                            const SCEVAddExpr *FullExpr) {
----------------
mzolotukhin wrote:
> rtereshin wrote:
> > mzolotukhin wrote:
> > > Could you please also add a description for `FullExpr`? It might be helpful to add even more examples here and describe the intended use (e.g. `ConstantTerm` is `Start` and `FullExpr` is `Step` of an `AddRec` expression).
> > Thanks for looking into this!
> > 
> > > Could you please also add a description for FullExpr?
> > Sure, will do.
> > 
> > > It might be helpful to add even more examples here and describe the intended use (e.g. ConstantTerm is Start and FullExpr is Step of an AddRec expression).
> > The next overload is the one that handles AddRec with parameter names being `ConstantStart` and `Step`. Do you think the names are self-explanatory or I need to elaborate in a comment?
> > 
> > As for AddExpr-version here I'm going to elaborate what `FullExpr` is and hopefully that will be clear enough.
> Thanks! I'm fine with whatever way you choose, I'd just like to see some example/description of what `FullExpr` should be.
> 
> For instance, I spent some time thinking that `FullExpr` would be, e.g. `3 + 4*x + 6*y` (which was inspired by your examples) and I couldn't understand how you can ever get TZ not equal to 0. It all made sense in the end, but saying that `ConstantStart` is 3 and `FullExpr` is `4*x + 6*y` would've saved me some time :)
Thing is, you were right, `SCEVAddExpr *FullExpr` is `3 + 4*x + 6*y`. It's the iteration (`for (unsigned I = 1,...`) that goes from operand 1 instead of operand 0.

I feel like it would be a little trickier to ask clients of this function to provide a reference (a pair of operand iterators, for instance) to `4*x + 6*y`.

================
Comment at: lib/Analysis/ScalarEvolution.cpp:1585-1588
+  for (unsigned I = 1, E = FullExpr->getNumOperands(); I < E && TZ; ++I)
+    TZ = std::min(TZ, SE.GetMinTrailingZeros(FullExpr->getOperand(I)));
+  if (TZ)
+    return TZ < BitWidth ? C.trunc(TZ).zext(BitWidth) : C;
----------------
mzolotukhin wrote:
> rtereshin wrote:
> > mzolotukhin wrote:
> > > Just checking my understanding: we're basically finding the largest common denominator here, which is also a power of 2, right?
> > > we're basically finding the largest common denominator here, which is also a power of 2, right?
> > This
> > ```
> >   uint32_t TZ = BitWidth;
> >   for (unsigned I = 1, E = FullExpr->getNumOperands(); I < E && TZ; ++I)
> >     TZ = std::min(TZ, SE.GetMinTrailingZeros(FullExpr->getOperand(I)))
> > ```
> > piece effectively does exactly that, yes. Another way to look at it is to say that we have an `AddExpr` that looks like `(C + x + y + ...)`, where `C` is a constant and x, y, ... are arbitrary SCEVs, and we're computing the minimum number of trailing zeroes guaranteed of the sum w/o the constant term: `(x + y + ...)`. If, for example, those terms look like follows:
> > ```
> >         i
> > XXXX...X000
> > YYYY...YY00
> >     ...
> > ZZZZ...0000
> > ```
> > then the rightmost non-guaranteed zero bit (a potential one at i-th position above) can change the bits of the sum to the left, but it can not possibly change the bits to the right. So we can compute the number of trailing zeroes by taking a minimum between the numbers of trailing zeroes of the terms.
> > 
> > Now let's say that our original sum with the constant is effectively just `C + X`, where `X = x + y + ...`. Let's say we've got 2 guaranteed trailing zeros for `X`:
> > ```
> >          j
> > CCCC...CCCC
> > XXXX...XX00
> > ```
> > Any bit of `C` to the left of `j` may in the end cause the `C + X` sum to wrap, but the rightmost 2 bits of `C` (at positions `j` and `j - 1`) do not affect wrapping in any way. If the upper bits cause a wrap, it will be a wrap regardless of the values of the 2 least significant bits of `C`. If the upper bits do not cause a wrap, it won't be a wrap regardless of the values of the 2 bits on the right (again).
> > 
> > So let's split C to 2 constants like follows:
> > ```
> > 0000...00CC  = D
> > CCCC...CC00  = (C - D)
> > ```
> > and the whole sum like `D + (C - D + X)`. The second term of this new sum looks like this:
> > ```
> > CCCC...CC00
> > XXXX...XX00
> > -----------
> > YYYY...YY00
> > ```
> > The sum above (let's call it `Y`)) may or may not wrap, we don't know, so we need to keep it under a sext/zext. Adding `D` to that sum though will never wrap, signed or unsigned, if performed on the original bit width or the extended one, because all that that final add does is setting the 2 least significant bits of `Y` to the bits of `D`:
> > ```
> > YYYY...YY00 = Y
> > 0000...00CC = D
> > ----------- <nuw><nsw>
> > YYYY...YYCC
> > ```
> > Which means we can safely move that D out of the sext or zext and claim that the top-level sum neither sign wraps nor unsigned wraps.
> > 
> > Let's run an example, let's say we're working in `i8`s and the original expression (zext's or sext's operand) is `21 + 12x + 8y`. So it goes like this:
> > ```
> > 0001 0101  // 21
> > XXXX XX00  // 12x
> > YYYY Y000  // 8y
> > 
> > 0001 0101  // 21
> > ZZZZ ZZ00  // 12x + 8y  // true, alternatively one can say that gcd(12, 8) is guaranteed to have 2 zeroes on the right
> > 
> > 0000 0001  // D
> > 0001 0100  // 21 - D = 20
> > ZZZZ ZZ00  // 12x + 8y
> > 
> > 0000 0001  // D
> > WWWW WW00  // 21 - D + 12x + 8y = 20 + 12x + 8y
> > ```
> > therefore `zext(21 + 12x + 8y)` = `(1 + zext(20 + 12x + 8y)<nuw><nsw>`
> Thanks for the great explanation! I think it's worth having it or its shorter version somewhere in comments. And just to be clear: I think that the patch is already very well-commented (thank you for that!), my remarks are just nit-picks.
You are welcome!

Hm... Do you think it could be better if I just put this as is in the commit message instead? If someone goes curious, they git blame and see a detailed explanation attributed to the exact version of the code that that explanation describes? This way we don't have a really huge comment in code that will most certainly get out of synch with the implementation at some point in the future.

Repository:
  rL LLVM

https://reviews.llvm.org/D48853