[PATCH] D86364: [ValueTracking] Interpret GEPs as a series of adds multiplied by the related scaling factor

Nikita Popov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Sep 6 13:28:16 PDT 2020


nikic added a comment.

In D86364#2239835 <https://reviews.llvm.org/D86364#2239835>, @qcolombet wrote:

> One thing we could do to help compile time is maybe add more caching to the Query API.
> Right now it only caches `llvm.assume` AFAICT, whereas we could cache known bits for values. That's what we did in the Machine IR variant of computeKnownBits.

To clarify, are you referring to caching within a single known bits query here (for the case where one instruction is recursively referenced multiple times), or across multiple queries? I'm not sure how useful the former would be; I don't have a good idea of how often we do redundant known bits computations within a single query. The latter has been tried at some point, but I think the conclusion was that properly invalidating that cache in InstCombine is too fragile.
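
For the former, a per-query cache could look roughly like the hypothetical sketch below. The struct and function names are made up for illustration and this is not the actual ValueTracking Query API; in real code the cache key would likely also need the depth and context instruction.

```cpp
// Hypothetical sketch: memoize KnownBits for values that are reached more
// than once while recursing within a single top-level query.
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/KnownBits.h"

struct CachedQuery {
  llvm::DenseMap<const llvm::Value *, llvm::KnownBits> Cache;
};

static llvm::KnownBits computeKnownBitsCached(const llvm::Value *V,
                                              unsigned BitWidth,
                                              CachedQuery &Q) {
  auto It = Q.Cache.find(V);
  if (It != Q.Cache.end())
    return It->second;
  llvm::KnownBits Known(BitWidth);
  // ... recurse into the operands of V as computeKnownBits does, calling
  // computeKnownBitsCached so repeated sub-expressions are only visited once ...
  Q.Cache.try_emplace(V, Known);
  return Known;
}
```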

> Let me know if the current compile time impact is acceptable or if I need to come up with a different strategy.

I don't think the current impact here is acceptable: This patch has very little benefit for standard targets (we really don't care about alignment all that much), but is a pretty significant compile-time regression, about 1.5% for tramp3d-v4. That's not a good tradeoff.

I do have an old experiment lying around that would make this more feasible: Moving alignment inference out of InstCombine into a separate pass that only runs once shortly before codegen (https://github.com/llvm/llvm-project/commit/d95001f63a7fb57a3c08002e05a751106ec8e10c for the basic idea -- it was a geomean 1% improvement when I tested it). This reduces the number of known bits queries on GEPs a lot and would make us less susceptible to how expensive they are.
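
As a rough illustration of that idea (an illustrative skeleton, not the code from the linked commit; the pass name is made up), such a pass could walk loads and stores once and raise their alignment from what is provably known about the pointer:

```cpp
// Illustrative skeleton of a late "infer alignment" pass.
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Utils/Local.h"

using namespace llvm;

struct InferAlignmentSketch : PassInfoMixin<InferAlignmentSketch> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    const DataLayout &DL = F.getParent()->getDataLayout();
    for (BasicBlock &BB : F)
      for (Instruction &I : BB) {
        // Only query known bits for pointers that feed memory accesses,
        // instead of re-deriving alignment on every InstCombine visit.
        if (auto *LI = dyn_cast<LoadInst>(&I)) {
          Align Known = getOrEnforceKnownAlignment(
              LI->getPointerOperand(), MaybeAlign(), DL, LI);
          if (Known > LI->getAlign())
            LI->setAlignment(Known);
        } else if (auto *SI = dyn_cast<StoreInst>(&I)) {
          Align Known = getOrEnforceKnownAlignment(
              SI->getPointerOperand(), MaybeAlign(), DL, SI);
          if (Known > SI->getAlign())
            SI->setAlignment(Known);
        }
      }
    return PreservedAnalyses::all();
  }
};
```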



================
Comment at: llvm/lib/Analysis/ValueTracking.cpp:1506
+        if (IndexTypeSize.isScalable())
+          AddrKnownBits.resetAll();
+        if (!AddrKnownBits.isUnknown()) {
----------------
qcolombet wrote:
> nikic wrote:
> > qcolombet wrote:
> > > @nikic FYI here is where we stop tracking the full address but still compute the trailing zeros.
> > > 
> > > If we don't do that, we will lose some information compared to the previous implementation.
> > > E.g., test/Analysis/ScalarEvolution/scalable-vector.ll will fail
> > > :5:10: error: CHECK: expected string not found in input
> > > ; CHECK: --> (3 * sizeof(<vscale x 4 x i32>)) U: [0,-15) S: [-9223372036854775808,9223372036854775793)
> > >          ^
> > > <stdin>:4:2: note: scanning from here
> > >  --> (3 * sizeof(<vscale x 4 x i32>)) U: full-set S: full-set
> > Okay, I see. I think we should still be able to combine these with some special handling in this spot. Would something like this work?
> > 
> >  * Combine IndexBits using computeKnownBitsMul as usual.
> >  * For scalable vectors, the result is additionally multiplied by an unknown vscale. As an approximation, only keep the trailing zero bits of IndexBits in this case, and then continue as usual.
> That should be doable, though I think it will degrade compile time even more, since I would expect calling computeKnownBitsMul to be more expensive than just doing the math on `TrailZ`.
> 
> Also, are you suggesting that we get rid of `TrailZ` altogether and just carry the trailing zero bits in AddrKnownBits?
> If so, we would add back a bunch of computeKnownBits calls since AddrKnownBits won't be unknown anymore.
> 
> What do you think?
I don't think doing this would have a (negative) impact on compile-time, as it only affects the case where GEPs on scalable vectors are used (i.e. very rarely).

And yes, with this change it should be possible to get rid of TrailZ entirely and only use AddrKnownBits.
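
For reference, a minimal sketch of that combined handling, assuming the patch's `IndexBits`/`AddrKnownBits` variables and the KnownBits API at the time; the exact code would differ:

```cpp
// Sketch only: for a scalable type, the index is additionally multiplied by
// an unknown vscale. The product has at least as many trailing zeros as the
// known index value, so conservatively keep only those and drop all other
// known bits before folding the index into the address.
if (IndexTypeSize.isScalable()) {
  unsigned TrailZ = IndexBits.countMinTrailingZeros();
  IndexBits.resetAll();
  IndexBits.Zero.setLowBits(TrailZ);
}
AddrKnownBits = KnownBits::computeForAddSub(/*Add=*/true, /*NSW=*/false,
                                            AddrKnownBits, IndexBits);
```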


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86364/new/

https://reviews.llvm.org/D86364


