[PATCH] D38662: [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)

Wed Oct 11 23:52:08 PDT 2017

hfinkel added inline comments.

================
Comment at: lib/Analysis/BasicAliasAnalysis.cpp:379
+  unsigned ShiftBits = Offset.getBitWidth() - PointerSize;
+  return (Offset << ShiftBits).ashr(ShiftBits);
+}
----------------
mppf wrote:
> This is meant to just take the bottom PointerSize bits. Shouldn't you use APInt.trunc to express it more simply?
Yes, where the result is sign extended. We can't just trunc the APInt, however, or its size will be wrong (I believe that this function ends up dealing with cases where we're dealing with a pointer size that'smaller than the maximum/initial one).

You're right, however, that writing Offset.trunc(PointerSize).sextOrSelf(PrevBitWidth) would be clearer.

================
Comment at: lib/Analysis/BasicAliasAnalysis.cpp:387
+
+  return MaxPointerSize;
 }
----------------
mppf wrote:
> I find it surprising that you needed to do this, but I don't have a great understanding of what's going on. In the change description, you pointed out 
> 
> >This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1*V+C2, and we then multiply this by Scale, and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. 
> 
> Can you explain why it's not sufficient to do these computations in the number ring for the GEP? E.g. if all GEPs were with 32-bit pointers, mathematically the distribution should work if everything is done with 32-bit numbers, including with overflow. And the LLVM spec says that is what these GEPs mean...
> 
> Is it just that the alias analysis pass then conservatively throws up its hands if overflow occurred? Or is it the case that the computations done here might cross different pointer sizes?
I don't have a great answer to this question, but I can say that some of the logic here doesn't work if we assume that we're working mod 2^n. For one thing, I think that makes it very hard to do any kind of relational comparisons. a < b does not have a useful meaning if a and b are both really congruence classes (i.e., if all we know is that a is really a + k*2^n, for some k, and b is really b + j*2^n, for some j, then it's impossible to conclude which number is greater or smaller than some other).

As I recall, we were specifically running into a issue with this check:
    // If we know all the variables are positive, then GEP1 >= GEP1BasePtr.
    // If GEP1BasePtr > V2 (GEP1BaseOffset > 0) then we know the pointers
    // don't alias if V2Size can fit in the gap between V2 and GEP1BasePtr.
    if (AllPositive && GEP1BaseOffset.sgt(0) && GEP1BaseOffset.uge(V2Size))
      return NoAlias;

The GEP1BaseOffset > 0 isn't something you can meaningfully do if all we know is the value is some element of its congruence class (because, in that case, all numbers are positive and negative). The other comparison has this problem too.

================
Comment at: lib/Analysis/BasicAliasAnalysis.cpp:492
+            CIdx->getValue().sextOrSelf(MaxPointerSize))
+              .sextOrTrunc(MaxPointerSize);
         continue;
----------------
mppf wrote:
> I'm probably missing something, but isn't .sextOrSelf(x).sextOrTrunc(x) going to do the same thing as .sextOrTrunc(x) ?
I think you've overlooked one parenthesis.

I'm trying to match the original code here, so I'm sign extending CIdx before the multiplication. It  seemed like the result of the multiplication may have returned a 64-bit APInt (even if it was originally 32-bits), and so in that case, we need to truncate again (and there is no truncOrSelf). Looking through the APInt source code, I can't explain that behavior, so I'll look at this again.

================
Comment at: lib/Analysis/BasicAliasAnalysis.cpp:519
+      // relevant values of V, (C2*Scale) can overflow. In that case, we cannot
+      // decompose the expression in this way.
+      APInt WideScaledOffset = IndexOffset.sextOrTrunc(MaxPointerSize*2) *
----------------
efriedma wrote:
> hfinkel wrote:
> > efriedma wrote:
> > > hfinkel wrote:
> > > > efriedma wrote:
> > > > > I'm not sure I understand this comment.
> > > > > 
> > > > > It's true that C2*Scale can overflow, but I'm not sure it makes sense to try to address that here.  The multiply by the scale can overflow no matter where the scale comes from (assuming the GEP isn't marked inbounds).
> > > > Can you please elaborate?
> > > > 
> > > > If this overflows, then the offset here isn't an offset, and the reasoning done later won't be correct.
> > > > 
> > > > In the test case, the GEPs are marked as inbounds. I don't think that's relevant here.
> > > Consider the following testcase:
> > > 
> > > ```
> > > target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
> > > define i32* @b1(i32 *%a) {
> > > %r = getelementptr i32, i32* %a, i32 -2147483648
> > > ret i32* %r
> > > }
> > > define i32* @b2(i32 *%a) {
> > > %o = add i32 -2147483648, 0
> > > %r = getelementptr i32, i32* %a, i32 %o
> > > ret i32* %r
> > > }
> > > ```
> > > 
> > > In both cases, %a and %r should be mustalias.  If you run "-aa-eval", we somehow conclude that the pointers are mustalias in b1, and noalias in b2.   If you're trying to make overflow well-behaved, it doesn't make sense to treat the overflow introduced by GetLinearExpression differently from the overflow inherent in scaling.
> > I agree, however, as I attempted to explain in the comment:
> > 
> >   1. The problem is not in GetLinearExpression itself, it is with how the results of GetLinearExpression are being used here.
> >   2. What's happening here, where we're multiplying by the scale (which by itself is always fine), and applying the distributive property, can introduce an overflow in cases where an overflow might not have existed in the original program.
> > 
> > I believe that the test cases demonstrate this: there are certainly values for the function parameters for which they'll be no overflow in the address calculations.
> I think aliasGEP needs to be changed to accept that decompsed GEP arithmetic will overflow, or somehow adopt much more restrictive overflow checking (in many more places than this patch does).
> 
> ----
> 
> Suppose we want to adopt more restrictive overflow checking.  First, so we have an expression `(C1*V+C2)*Scale`.  If we treat `(C1*V+C2)` as opaque, there's one place this can overflow: the multiply by Scale.  Currently, we completely ignore this possibility.
> 
> So now, instead of treating the inner expression as opaque, we want to decompose it to something like `(C1*Scale)*V+C2*Scale`.  There are four arithmetic operations here.  All four of them can potentially overflow.  And two of them can't be overflow-checked here because V isn't known at compile-time.
> 
> Given that, I don't see how overflow-checking `C2 * Scale` accomplishes anything except hiding the problem for your exact testcase.
> I think aliasGEP needs to be changed to accept that decompsed GEP arithmetic will overflow, or somehow adopt much more restrictive overflow checking (in many more places than this patch does).

I think that it does need to be changed, but if we can fix it like this, that probably preserves our optimization abilities. Changing it in other ways, AFAIKT, will weaken it (perhaps unnecessarily). The only other place I see that might have this issue (introducing new compile-time overflows) is in BasicAAResult::constantOffsetHeuristic. There may, however, be a more-general problem...

> If we treat (C1*V+C2) as opaque, there's one place this can overflow: the multiply by Scale. Currently, we completely ignore this possibility.

If it is opaque, then it is just a value (regardless of whether or not its computation depended on some overflowing result). Is there something we'd need to do?

> So now, instead of treating the inner expression as opaque, we want to decompose it to something like (C1*Scale)*V+C2*Scale. There are four arithmetic operations here. All four of them can potentially overflow. And two of them can't be overflow-checked here because V isn't known at compile-time.

When I looked at this previously, my thought had been that only the C2*Scale overflow could happen in well-defined code. I may have been wrong about that. The issue is that if C1*Scale overflows then C1*V is likely to overflow in the original program. That, however, is clearly not necessarily true. So, yes, I agree, we should check that too.

Then there's the other two operations. They, indeed, might overflow. I don't see anything to prevent them from overflowing even if the original expression did not (essentially in some cases where |V| < Scale and |C1|,|C2| are large in the right ways). In the face of such overflows, the decomposition may not be sound, unfortunately.

> Given that, I don't see how overflow-checking C2 * Scale accomplishes anything except hiding the problem for your exact testcase.

This may be a fair criticism. I think that the general problem of dealing with overflows is going to need a careful audit and a lot of changes. There's a lot of logic here that likely isn't completely sound. However, I don't want to make things worse, certainly, and this was a pre-existing regression test. I should check for C1*Scale overflow too (to avoid making anything worse).

More-general pre-existing problems, however, I'd recommend treating as orthogonal to this change.  I think that more-consistently using APInts makes addressing these problems easier, not harder, in follow-up work.

================
Comment at: lib/Analysis/BasicAliasAnalysis.cpp:553

-      if (Scale) {
-        VariableGEPIndex Entry = {Index, ZExtBits, SExtBits,
-                                  static_cast<int64_t>(Scale)};
+      if (!!Scale) {
+        VariableGEPIndex Entry = {Index, ZExtBits, SExtBits, Scale};
----------------
mppf wrote:
> I've never seen !! used like this before. Is it intentional? Is it the same as if (Scale != 0) ? If so, wouldn't that be clearer?
Yea, it's intentional. IIRC, APInt does not have a Boolean conversion. This is used other places in LLVM's codebase.

================
Comment at: lib/Analysis/BasicAliasAnalysis.cpp:1098
+  if (C1->getBitWidth() > 64 || C2->getBitWidth() > 64)
+    return MayAlias;
+
----------------
mppf wrote:
> This addition seems surprising to me. Why wouldn't we just use APInt for the computation below? Is this a workaround for something?
Ah, sort of. Updating StructLayout::getElementOffset to take an APInt does not seem worthwhile. You're right, however: this is the wrong test. It should say:

if (C1->getActiveBits() > 64 || C2->getActiveBits() > 64)

https://reviews.llvm.org/D38662