[PATCH] D37419: Teach scalar evolution to handle inttoptr/ptrtoint

Sun Sep 3 18:16:20 PDT 2017

On Sun, Sep 3, 2017 at 4:53 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>> > I also don't understand "  // It's tempting to handle inttoptr and
>> > ptrtoint
>> > as no-ops, however this can
>> >   // lead to pointer expressions which cannot safely be expanded to
>> > GEPs,
>> >   // because ScalarEvolution doesn't respect the GEP aliasing rules when
>> >   // simplifying integer expressions."
>> >
>> > 1. What are "the gep aliasing rules", exactly?
>> > https://llvm.org/docs/LangRef.html#pointeraliasing does not mention
>> > special
>> > rules for gep :)
>>
>> I think the rule it is talking about is that reads and writes on a GEP
>> are allowed to only read from and write to the allocation that is
>> passed in as the first parameter to the GEP
>
> When we had the discussion about "inrange" and "inbounds", (and elsewhere,
> like field accesses) it was raised repeatedly that it's just fine to gep
> outside what would be the normal "range" of a pointer, because you can't
> ever really tell what it is.
> IE a gep of a given pointer can access anything it wants.
>
> In fact, it seemed generally accepted that you can happily GEP from a
> pointer to anywhere you want unless there's an inrange or inbounds on it.

I think there are two issues here.

Non-inbounds GEPs to outside the base allocation is allowed by
themselves, and produce a bitwise value.  Inbounds GEPs to outside the
base allocation produces poison.

So this is allowed:

  %p0 = malloc(16);
  %p1 = malloc(16);
  %gep = gep i8* %p0, i64 18
  %cmp = icmp eq %gep, %p1
  br i1 %cmp, ...

but it would not be allowed if the GEP was inbounds (we'd be branching
on poison).

*However* (this is the "and issue reads/writes" bit in my previous
email) for reading / writing through the result of a GEP the GEP must
*also* follow the pointer aliasing rules.  Of course the GEP must not
have produced poison either, that's necessary but not sufficient for
safe memory operations.

> So i'm not sure what you mean by "the allocation", since none of these
> allocations are  defined in terms of llvm memory semantics (IE they aren't
> alloca's, they are malloc/frees).
> If that's not the case, we are not being consistent (IE we're losing
> optimizations we shouldn't, and we have correctness issues we don't think we
> do).

[snip]

> This set of rules, without anything more, implies we have both correcntess
> and optimization issues :)

The semantic in "pointer aliasing rules" is already implemented --
BasicAA returns NoAlias for %p1 and %gep even though @subtract could
have returned an offset that will guarantee that at runtime %p1 ==
%gep:

```
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @f(i64 %arg, i64 %arg1) {
bb:
  %p0 = tail call noalias i8* @malloc(i64 %arg) #3
  %p1 = tail call noalias i8* @malloc(i64 %arg) #3
  %offset = tail call i64 @subtract(i8* %p1, i8* %p0)
  %gep = getelementptr i8, i8* %p0, i64 %offset  ;; not inbounds
  ret void
}

declare noalias i8* @malloc(i64)
declare i64 @subtract(i8*, i8*)
```

I believe this follows from the GetUnderlyingObject and
isIdentifiedObject related logic in  BasicAAResult::aliasCheck.

And I'm not aware of any transforms in LLVM today that contradict the
pointer aliasing rules.

-- Sanjoy