[llvm-dev] getelementptr inbounds with offset 0

Ralf Jung via llvm-dev llvm-dev at lists.llvm.org
Wed Apr 10 08:14:27 PDT 2019


Hi,

>>> I see. Is there a quick answer to the question of why you need inbounds
>>> GEPs in that case? Can't you just use non-inbounds GEPs if you know you
>>> might not have a valid base ptr and "optimize" it to inbounds once that
>>> is proven?
>>
>> You mean on the Rust side?  We emit GEPi for field accesses and array indexing.
>>  We cannot always statically determine if this is happening for a ZST or not.
>> At the same time, given that no memory access ever happens for a ZST, allocating
>> a ZST (Box::new in Rust, think of it like new in C++) does not actually allocate
>> any memory, it just returns an integer (sufficiently aligned) cast to a pointer.
> 
> OK, but why not emit non-inbounds GEPs instead? They do not come with
> the problems you have now, or maybe I misunderstand.

The problem is statically figuring out whether it should be inbounds or
non-inbounds.  When we have code like `&x[n]`, this might be an offset-by-0 in
an empty slice and hence fall into the scope of my question, or it might be a
"normal" array access where we definitely want inbounds.
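
To make the two cases concrete, here is a small Rust sketch. The helper
`byte_offset` is a hypothetical name introduced only for this illustration; it
measures how far `&x[n]` lies from the start of the slice. The same source
expression yields a non-zero offset for an ordinary array and a zero offset
when the element type is zero-sized.

```rust
/// Hypothetical helper: byte distance between the start of `x` and `&x[n]`.
/// Indexing is bounds-checked, so this panics if `n >= x.len()`.
fn byte_offset<T>(x: &[T], n: usize) -> usize {
    let base = x.as_ptr() as usize;
    let elem = &x[n] as *const T as usize;
    elem - base
}

fn main() {
    // "Normal" array access: element 2 of a u32 array is 8 bytes in.
    let normal = [10u32, 20, 30];
    assert_eq!(byte_offset(&normal, 2), 8);

    // Slice of zero-sized values: every element sits at offset 0.
    let zsts = [(), (), ()];
    assert_eq!(byte_offset(&zsts, 2), 0);

    // Empty slice: taking the tail from index 0 is also an offset-by-0.
    let empty: &[u32] = &[];
    let tail = &empty[0..];
    assert_eq!(tail.as_ptr(), empty.as_ptr());
    assert_eq!(tail.len(), 0);
}
```

In generic code the element type is not known when the indexing expression is
written, so both cases come from the same piece of source.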

>> Sure, UB is definitely *defined* in a runtime-value dependent way.  The problem
>> here is that it is not defined in a precise way -- something where one could
>> write an interpreter that tracks all the extra state that is needed (like
>> poison/undef and where allocations lie) and then says precisely under which
>> conditions we have UB and under which we do not.
>> What I am asking for here is the exact definition of GEPi if, *at run-time*,
>> the offset is 0, and the base pointer is (a) an integer, or (b) dangling.
> 
> That last part is given by the lang-ref (imo):
>   "If the inbounds keyword is present, the result value of the
>    getelementptr is a poison value if the base pointer is not an in
>    bounds address of an allocated object"
> 
> I read this as: If you have a GEPi, you get poison if the base pointer
> is not an allocated object. That is, a dangling pointer (b) causes the
> GEPi to be poison, and a pointer from integer (a) may, if the address
> denoted by the integer is not inside, or one past, an allocated object.
> Now any offset except 0 will add more possible ways to generate a poison
> value.

Thanks.  That makes sense from reading the docs (though I am not convinced that
it actually helps with optimizations to be this strict here).
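
As a rough illustration of that reading, here is a sketch of how an interpreter
might check a GEPi. Everything in it is an assumption made for illustration:
the names `Allocation`, `GepResult` and `gep_inbounds` are made up, pointers
are modelled as plain addresses without provenance, and "in bounds" means
inside, or one past the end of, a live allocation. It is not how LLVM
implements anything.

```rust
/// Result of evaluating a GEPi in this toy model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GepResult {
    Ptr(u64),
    Poison,
}

/// A tracked allocation; `live` becomes false once it is deallocated.
struct Allocation {
    base: u64,
    size: u64,
    live: bool,
}

/// Index of the first live allocation that `addr` is inside, or one past.
fn find_in_bounds(allocs: &[Allocation], addr: u64) -> Option<usize> {
    allocs
        .iter()
        .position(|a| a.live && addr >= a.base && addr <= a.base + a.size)
}

/// Assumed reading: poison if the base pointer is not in bounds of a live
/// allocation, or if the result leaves that same allocation.
fn gep_inbounds(allocs: &[Allocation], base: u64, offset: i64) -> GepResult {
    let a = match find_in_bounds(allocs, base) {
        Some(idx) => &allocs[idx],
        None => return GepResult::Poison,
    };
    // Widen to i128 so the signed addition cannot overflow.
    let result = base as i128 + offset as i128;
    if result >= a.base as i128 && result <= (a.base + a.size) as i128 {
        GepResult::Ptr(result as u64)
    } else {
        GepResult::Poison
    }
}

fn main() {
    let allocs = [
        Allocation { base: 0x1000, size: 16, live: true },
        Allocation { base: 0x2000, size: 8, live: false }, // already freed
    ];
    // One past the end is still in bounds; one byte further is not.
    assert_eq!(gep_inbounds(&allocs, 0x1000, 16), GepResult::Ptr(0x1010));
    assert_eq!(gep_inbounds(&allocs, 0x1000, 17), GepResult::Poison);
    // (b) dangling base pointer, offset 0: poison under this reading.
    assert_eq!(gep_inbounds(&allocs, 0x2000, 0), GepResult::Poison);
    // (a) integer that does not point into any allocation, offset 0.
    assert_eq!(gep_inbounds(&allocs, 4, 0), GepResult::Poison);
    // (a) integer that happens to point into a live allocation.
    assert_eq!(gep_inbounds(&allocs, 0x1008, 0), GepResult::Ptr(0x1008));
}
```

Whether the list of allocations may contain zero-sized entries is exactly the
open "0-sized objects" question; the sketch takes no position on it.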

For the (a) case, the question about "0-sized objects" remains, but it doesn't
seem like the answer could affect what LLVM does.

It would be really nice to have a reference interpreter for LLVM IR that can
explicitly check for all the UB.  Maybe, one day... ;)

Kind regards,
Ralf

