[PATCH] D86364: [ValueTracking] Interpret GEPs as a series of adds multiplied by the related scaling factor

Wed Aug 26 10:22:17 PDT 2020

qcolombet added a comment.

Hi @nikic,

Thanks for the updated numbers!

> This is better than before, but still quite expensive for what it does.

What's an acceptable target for you?

The reason I am asking is because I fear we won't be able to shave much more from the implementation, while for us not having this information is pretty damaging. Put differently, the previous implementation is lacking critical information compared to hand written pointer arithmetic that could lead to missed optimization opportunities and dramatic performance lose.

I agree, that in itself this change doesn't seem worth the added compile time, but I think we should see it as an enabler for more optimizations. For instance, this change doesn't fix https://llvm.org/PR47241 but is a step in that direction. In other words, it doesn't fix it, because InstCombine didn't have this kind of information before and thus doesn't know how to act on it, but longer term, having this information would likely improve several optimizations.

Cheers,
-Quentin

================
Comment at: llvm/lib/Analysis/ValueTracking.cpp:1506
+        if (IndexTypeSize.isScalable())
+          AddrKnownBits.resetAll();
+        if (!AddrKnownBits.isUnknown()) {
----------------
nikic wrote:
> qcolombet wrote:
> > @nikic FYI here is where we stop to track the full address but we still compute the trailing zeros.
> > 
> > If we don't do that, we will lose some information compared to the previous implementation.
> > E.g., test/Analysis/ScalarEvolution/scalable-vector.ll will fail
> > :5:10: error: CHECK: expected string not found in input
> > ; CHECK: --> (3 * sizeof(<vscale x 4 x i32>)) U: [0,-15) S: [-9223372036854775808,9223372036854775793)
> >          ^
> > <stdin>:4:2: note: scanning from here
> >  --> (3 * sizeof(<vscale x 4 x i32>)) U: full-set S: full-set
> Okay, I see. I think we should still be able to combine these with some special handling in this spot. Would something like this work?
> 
>  * Combine IndexBits using computeKnownBitsMul as usual.
>  * For scalable vectors, the result is additionally multiplied by an unknown vscale. As an approximation, only keep the trailing zero bits of IndexBits in this case, and then continue as usual.
That should be doable though I think it will degrade compile time even more given I would expect that calling computeKnownBitsMul would be more expensive than just doing the math on `TrailZ`.

Also, are you suggesting that we get rid of `TrailZ` all together and just carry the trailing zeros bits in AddrKnownBits?
If so, we would add back a bunch of computeKnownBits calls since AddrKnownBits won't be unknown anymore.

What do you think?

================
Comment at: llvm/lib/Analysis/ValueTracking.cpp:1527
+      else if (IndexBitWidth > BitWidth)
+        IndexBits = IndexBits.trunc(BitWidth);
+
----------------
nikic wrote:
> We should add an sextOrTrunc() method to KnownBits. We already have zextOrTrunc() and anyextOrTrunc().
Sure.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86364/new/

https://reviews.llvm.org/D86364