[llvm-dev] getelementptr inbounds with offset 0

Doerfert, Johannes via llvm-dev llvm-dev at lists.llvm.org
Fri Mar 15 09:57:22 PDT 2019


Hi Ralf,

On 03/15, Ralf Jung wrote:
> > From the Lang-Ref statement
> > 
> >   "With the inbounds keyword, the result value of the GEP is undefined
> >   if the address is outside the actual underlying allocated object and
> >   not the address one-past-the-end."
> > 
> > I'd argue that the actual offset value (here 0) is irrelevant. The GEP
> > value is undefined if inbounds is present and the resulting pointer does
> > not point into, or one-past-the-end, of an allocated object. This
> > object, in my understanding, has to be the same one the base pointer of
> > the GEP points into, or one-past-the-end, or you get again an undefined
> > result.
> 
> Yes, I agree with that reading.

That's reassuring for me ;)


> However, the notion of "allocated object" here is not entirely clear.

True.


> LLVM has to operate under the assumption that there are allocations
> and allocators it does not know anything about.  Just imagine some
> embedded project writing to well-known address 0xDeadCafe because
> there is a hardware register there.

True.


> So, the thinking here is: LLVM cannot exclude the possibility of an
> object of size 0 existing at any given address.  The pointer returned
> by "GEPi p 0" then would be one-past-the-end of such a 0-sized object.
> Thus, "GEPi p 0" is the identity function for any p, it will not
> return poison.

I don't see the problem. The behavior I hope we want and implement is:

Either LLVM knows that %p points to an invalid address (=non-object) or
it doesn't. If it does, %p and all GEPs on it yield poison. If it
doesn't, it has to assume %p points to a valid address and offset 0, 1,
2, ... might all yield valid pointers. The special case is when we know
%p is valid and has an extent of (at most) S; then all offsets <= S,
including 0, are potentially valid (negative extents are similar).
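
To make the known-extent case concrete, here is a minimal sketch in
well-formed IR. It assumes the typed-pointer syntax in use at the time of
writing; the function name @extent_sketch and the 12-byte buffer are made
up purely for illustration. The comments restate the LangRef wording, not
what any particular pass is guaranteed to do.

```llvm
define void @extent_sketch() {
  ; %buf is a known allocation with an extent of 12 bytes (S = 12).
  %buf = alloca [12 x i8]
  %p = getelementptr inbounds [12 x i8], [12 x i8]* %buf, i64 0, i64 0

  ; Offset 0: in bounds of %buf.
  %first = getelementptr inbounds i8, i8* %p, i64 0

  ; Offset 12: one-past-the-end. Still a valid inbounds result,
  ; but the address must not be dereferenced.
  %end = getelementptr inbounds i8, i8* %p, i64 12

  ; Offset 13: outside the allocation and not one-past-the-end,
  ; so per the LangRef the result is poison.
  %past = getelementptr inbounds i8, i8* %p, i64 13

  ret void
}
```

Merely forming %past is not immediate UB; the poison only matters once
the value is used in a way that triggers it.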


> > Now if that might cause any problems, e.g., if LLVM is able to act
> > on this fact, depends on various factors including what you do with
> > the GEP. Your initial problem seemed to be that LLVM "might be able
> > to deduce dereferenceable memory at location 4" but that should never
> > be the case if you only form the aforementioned GEP, with or without
> > the inbounds actually. Forming a pointer that has an undefined value
> > is just that, a pointer with an undefined value.
> 
> Ah, good point.  First of all I was indeed unclear; the case I am
> worried about here is GEPi returning poison.  (These values might be
> used in further computations and eventually surface as UB.) But also,
> clearly a "GEPi 0" alone cannot introduce any dereferenceability
> assumption because of the "one-past-the-end" case. That point is
> inbounds but cannot be dereferenced.
> 
> So, for the sake of a more concrete example (and please excuse me
> butchering LLVM syntax, I usually deal with this in terms of C or Rust
> syntax): Can %G in the following programs be poison?  If yes, what is
> the analysis that would be weakened or the optimization that could no
> longer happen if "GEPi %P 0" was instead defined to always return %P?
> 
> # example1
> 
> %P1 = int2ptr 4
> %G1 = gep inbounds %P1 0
> 
> # example2
> 
> %P2 = call noalias i8* @malloc(i64 12)
> call void @free(i8* %P2)
> %G2 = gep inbounds %P2 0
> 
> The first happens in Rust all the time, and we rely on not getting
> poison.  The second doesn't occur in Rust (to my knowledge), but it
> seems somewhat inconsistent to return poison in one case and not the
> other.

Let's start with example2, note that I renamed the values above.

%P2 is dangling (and we know it) after the free. %P2 is therefore
poison* and so is %G2.

* or undef; I always confuse the two, which might be bad in this
  conversation.



In example1, without further information, I'd say that there is no
poison (statically). Address 4 could be an allocated object until proven
otherwise.
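
For reference, the two examples could be spelled in well-formed IR roughly
as below. This is a sketch under assumptions: it uses the typed-pointer
syntax current at the time of writing, assumes a 64-bit target for the
inttoptr, and wraps each example in a function (@example1, @example2)
whose name is invented here so that the snippet is self-contained.

```llvm
declare noalias i8* @malloc(i64)
declare void @free(i8*)

; example1: pointer created from the integer 4. Nothing is known
; statically about what lives at that address.
define i8* @example1() {
  %P1 = inttoptr i64 4 to i8*
  %G1 = getelementptr inbounds i8, i8* %P1, i64 0
  ret i8* %G1
}

; example2: %P2 is known to be dangling after the call to @free,
; and %G2 is derived from it.
define i8* @example2() {
  %P2 = call noalias i8* @malloc(i64 12)
  call void @free(i8* %P2)
  %G2 = getelementptr inbounds i8, i8* %P2, i64 0
  ret i8* %G2
}
```

The only difference between the two GEPs is what is known about the base
pointer, which is exactly the distinction drawn above.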


I am still a little confused about the problem you see. If what I wrote
about the implemented behavior holds true (which I am not totally sure
of), you should not have a problem with poison even if you
sprinkle GEP (inbounds) %p 0 all over the place. Either %p was known to
be invalid and so is the GEP, or %p was not known to be invalid and
neither is the GEP. Am I missing something here?

Cheers,
  Johannes

> > A side-effect based on the GEP will however __locally__ introduce a
> > dereferenceability assumption (in my opinion at least). Let's say the
> > code looks like this:
> > 
> > 
> >   %G = gep inbounds (int2ptr 4) 0
> >   ; We don't know anything about the dereferenceability of
> >   ; the memory at address 4 here.
> >   br %cnd, %BB0, %BB1
> > 
> > BB0:
> >   ; We don't know anything about the dereferenceability of
> >   ; the memory at address 4 here.
> >   load %G
> >   ; We know the memory at address 4 is dereferenceable here.
> >   ; Though, that is due to the load and not the inbounds.
> >   ...
> >   br %BB1
> > 
> > BB1:
> >   ; We don't know anything about the dereferenceability of
> >   ; the memory at address 4 here.
> > 
> > 
> > It is a different story if you start to use the GEP in other
> > operations, e.g., to alter control flow. Then the (potential)
> > undefined value can propagate.
> > 
> > 
> > Any thoughts on this? Did I at least get your problem description
> > right?
> > 
> > Cheers, Johannes
> > 
> > 
> > 
> > P.S. Sorry if this breaks the thread and apologies that I had to
> > remove Bruce from the CC. It turns out replying to an email you did
> > not receive is complicated and getting on the LLVM-Dev list is
> > nowadays as well...
> > 
> > 
> > On 02/25, Ralf Jung via llvm-dev wrote:
> >> Hi Bruce,
> >>
> >> On 25.02.19 13:10, Bruce Hoult wrote:
> >>> LLVM has no idea whether the address computed by GEP is actually
> >>> within a legal object. The "inbounds" keyword is just you, the
> >>> programmer, promising LLVM that you know it's ok and that you
> >>> don't care what happens if it is actually out of bounds.
> >>>
> >>> https://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds
> >>
> >> The LangRef says I get a poison value when I am violating the
> >> bounds. What I am asking is what exactly this means when the offset
> >> is 0 -- what *are* the conditions under which an offset-by-0 is
> >> "out of bounds" and hence yields poison?  Of course LLVM cannot
> >> always statically determine this, but it relies on (dynamically, on
> >> the "LLVM abstract machine") such things not happening, and I am
> >> asking what exactly these dynamic conditions are.
> >>
> >> Kind regards, Ralf
> >>
> >>>
> >>> On Sun, Feb 24, 2019 at 9:05 AM Ralf Jung via llvm-dev
> >>> <llvm... at lists.llvm.org> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> What exactly are the rules for `getelementptr inbounds` with
> >>>> offset 0?
> >>>>
> >>>> In Rust, we are relying on the fact that if we use, for example,
> >>>> `inttoptr` to turn `4` into a pointer, we can then do
> >>>> `getelementptr inbounds` with offset 0 on that without LLVM
> > >>>> deducing that there actually is any dereferenceable memory at
> >>>> location 4.  The argument is that we can think of there being a
> >>>> zero-sized allocation. Is that a reasonable assumption?  Can
> >>>> something like this be documented in the LangRef?
> >>>>
> >>>> Relatedly, how does the situation change if the pointer is not
> >>>> created "out of thin air" from a fixed integer, but is actually a
> >>>> dangling pointer obtained previously from `malloc` (or `alloca`
> > >>>> or whatever)?  Is `getelementptr inbounds` with offset 0 on such a
> >>>> pointer a NOP, or does it result in `poison`?  And if that makes
> >>>> a difference, how does that square with the fact that, e.g., the
> >>>> integer `0x4000` could well be inside such an allocation, but
> >>>> doing `getelementptr inbounds` with offset 0 on that would fall
> >>>> under the first question above?
> >>>>
> >>>> Kind regards, Ralf
> >>>> _______________________________________________ LLVM Developers
> >>>> mailing list llvm... at lists.llvm.org
> >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 

-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov