[LLVMdev] GEP vs IntToPtr/PtrToInt

Wed Apr 20 11:11:11 PDT 2011

On Wed, Apr 20, 2011 at 10:21 AM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
> On Wed, Apr 20, 2011 at 12:20 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> On Wed, Apr 20, 2011 at 8:08 AM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
>>> I have a question about when we should apply these pointer aliasing
>>> rules. Do the rules tell us when a load/store is safe?
>>> "Any memory access must be done through a pointer value associated
>>> with an address range of the memory access, otherwise the behavior is
>>> undefined."
>>>
>>> So this means the conversion discussed here is still safe in terms of
>>> memory safety, but its meaning after conversion could be weird. Am I
>>> correct?
>>
>> Per http://llvm.org/docs/LangRef.html#pointeraliasing, it's undefined
>> behavior, so it isn't safe in any sense.  In practice, I can't think
>> of a common transformation that would cause a crash, but it's best not
>> to depend on that.
>
> My confusion may be about what is considered undefined under the
> rules. They say
>  "Any memory access must be done through a pointer value associated
> with an address range of the memory access".
>
> Does this implicitly mean that the value of the pointer must be within
> the address range of the memory access it is associated with? It seems
> to be true to me from the rules about global variables, alloca and
> even external pointers.
>
> For example
>    %p = alloca i32;
>    %q = getelementptr i32* %p, i32 42;
>    store i32 0, i32* %q;
>
> Is this a fine memory access (I don't think it is)? Here, %q
> is based on %p, and %p is associated with the address range from
> alloca i32. But the range allocated by the alloca is definitely
> smaller than 42 elements. Since the LangRef sections on load and store
> do not state that accessing an out-of-bounds address is undefined
>   http://llvm.org/docs/LangRef.html#i_load
>   http://llvm.org/docs/LangRef.html#i_store
> I looked into the aliasing rules to find answers.

That doesn't really have anything to do with aliasing, but it's
definitely undefined.  Don't know off the top of my head where that is
stated in LangRef.

> Now, back to the inttoptr and ptrtoint questions. When we say that a
> memory access via a pointer formed from an int is defined, do we mean
>  1) the value of the pointer happens to equal an address within the
> range of an allocated object, or
>  2) the pointer happens to be based on some allocated
> objects per these rules, but it is fine if it is out of their ranges
> (I don't think this is true, but the rules do not explicitly tell me if
> this is legal). Here, the intuitive meaning of based-on is as you
> explained below.
>
> But I still have some questions about the 'based-on' relation. It seems
> to state an aliasing relation between pointers. Then, if the result of
> an inttoptr is based on some objects, why can we consider this to
> be a good memory access? The pointer may well point to
> some other allocated object that we don't want to be changed. So
> this comes to my question --- what property does a defined
> memory access give us?

A properly-defined memory access is fully within the bounds of some
defined object, and "based" (in the LangRef.html#pointeraliasing
sense) on that object.
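
To make those two conditions concrete, here is a toy Python model. It is
only a sketch, not how LLVM represents anything: the names Obj, Ptr,
alloca, gep and access_is_defined are made up for illustration, the
addresses are arbitrary, and i32 is assumed to be 4 bytes wide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """A hypothetical allocated object covering bytes [base, base + size)."""
    name: str
    base: int
    size: int


@dataclass(frozen=True)
class Ptr:
    """A pointer value: a numeric address plus the objects it is based on."""
    addr: int
    based_on: frozenset


def alloca(name, base, size):
    # Returns the object and a pointer to its start, based on that object.
    obj = Obj(name, base, size)
    return obj, Ptr(base, frozenset({obj}))


def gep(p, byte_offset):
    # A GEP moves the address but keeps the based-on set of its operand.
    return Ptr(p.addr + byte_offset, p.based_on)


def access_is_defined(p, width):
    # Defined only if some object that p is based on fully contains
    # the accessed bytes [p.addr, p.addr + width).
    return any(o.base <= p.addr and p.addr + width <= o.base + o.size
               for o in p.based_on)


obj_p, p = alloca("p", base=1000, size=4)  # models: %p = alloca i32
q = gep(p, 42 * 4)                         # models: GEP of %p by 42 i32 elements

print(access_is_defined(p, 4))  # True
print(access_is_defined(q, 4))  # False: 168 bytes past a 4-byte object

# Even if another object happens to live at q's address, q is still
# based only on "p", so the access stays undefined in this model.
other_obj, other = alloca("other", base=1168, size=4)
print(access_is_defined(q, 4))  # False
```

The last check is the part that pure address comparison would miss: the
numeric address is valid for some object, but not for an object the
pointer is based on.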

>>
>>> This brings me to another question. The based-on relation has this rule:
>>> "A pointer value formed by an inttoptr is based on all pointer values
>>> that contribute (directly or indirectly) to the computation of the
>>> pointer's value."
>>>
>>> Suppose an int value 'i' is computed from many int variables that
>>> are converted from pointers (p1, p2, ..., pn) by ptrtoint. If we then
>>> inttoptr i to a pointer p, how should I decide which pointer values 'p'
>>> is based on?
>>>
>>> If each p_j is converted by ptrtoint to an i_j, and i is computed as i =
>>> i_0 + i_1 + ... + i_n, does it mean
>>>  we can take any p_j as a base pointer, and the other int variables as
>>> its offset? Say we take p_2 as the base pointer; then the p formed from i
>>> points to
>>>       p_2 + (i_0 + i_1 + i_3 + ... + i_n)
>>>  ?
>>>
>>> So in the transformation example, the result is different when we take
>>> %196 or %193 as a base pointer.
>>>
>>> For alias analysis, we may say that p can point to any memory that one of
>>> the p_j points to. But if we consider memory safety, should we say p is
>>> safe to access if p is not out of bounds no matter which p_j is taken
>>> as a base pointer?
>>
>> See above.
>>
>>> Could anyone explain this rule more precisely? For
>>> example, how can we find "all pointer values that contribute
>>> (directly or indirectly)"?
>>
>> There isn't any straightforward way to calculate that set.  Another
>> way of stating the rule is that if changing the numerical value of the
>> address of some object might change the calculated value of the
>> operand of an inttoptr, it's one of the "pointer values that
>> contribute".  It's intentionally defined a bit loosely because there are
>> a lot of different ways for that to be the case.  You can extract
>> information about a pointer by a ptrtoint, a load of part or all of
>> the address from memory, pointer comparisons, and possibly some other
>> ways I'm not thinking of.
>>
>>> This would be helpful to understand
>>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff
>>> http://llvm.org/docs/GetElementPtr.html#null
>>> which suggest that we can do some 'wild' pointer arithmetic by
>>> inttoptr and ptrtoint.
>>>
>>> For example, given a pointer p, can we safely do the following?
>>>   i = ptrtoint p;
>>>   j = i + null;
>>>   q = inttoptr j;
>>>   v = load q;
>>>
>>> Thanks a lot.
>>
>> inttoptr(ptrtoint(x)) is just x; inttoptr(ptrtoint(x)+10) can be
>> safely translated to gep i8* x, 10.  Translating
>> inttoptr(ptrtoint(x)+y) to gep i8* x, y is not safe in general.
>
> While in http://llvm.org/docs/GetElementPtr.html#ptrdiff, the
> difference between two pointers computed from GEP has to be a
> variable, but not a constant, how could that work?

In my example I was assuming "y" was some unknown value.  I'm not sure
what you're asking here.

> Also, given p1 and p2 from GEP, if we do
>  i1 = ptrtoint p1;
>  i2 = ptrtoint p2;
>  i3 = i2 - i1;
>  i3' = f (i3);       // suppose f is an identity function that
>                      // returns i3 directly.
>  i4 = i3' + i1;
>  p = inttoptr i4;
>  .. = load p;      // is this load defined?
>
>  http://llvm.org/docs/GetElementPtr.html#ptrdiff seems to say, we can
> access out-of-bound memory via GEP, but it is safe to do that from
> inttoptr or ptrtoint as long as the result points to an allocated object.
> Is this the right way to understand it ?

I assume this is supposed to be "we cannot access out-of-bounds memory via GEP".

The load in the given code is well-defined, and equivalent to a load
directly from p2.  The issue with translating i3' + i1 into gep p1,
i3' is that you end up with a load from a pointer into p2 that is not
"based" on p2.

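As a rough illustration of that last point, here is a toy Python model
that tracks which pointers each value is "based" on. It is only a sketch
under assumptions: the classes and helpers are hypothetical, the
addresses are made up, the labels "p1" and "p2" stand in for whatever
objects those pointers are based on, and the identity function f is
left out since it does not change the value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    """An integer plus the labels of the pointers that contributed to it."""
    value: int
    contrib: frozenset = frozenset()

    def __add__(self, other):
        # Arithmetic merges the contributing-pointer sets of both operands.
        return Int(self.value + other.value, self.contrib | other.contrib)

    def __sub__(self, other):
        return Int(self.value - other.value, self.contrib | other.contrib)


@dataclass(frozen=True)
class Ptr:
    """A pointer value: a numeric address plus its based-on labels."""
    addr: int
    based_on: frozenset


def ptrtoint(p):
    return Int(p.addr, p.based_on)


def inttoptr(i):
    # Based on every pointer that contributed to the integer.
    return Ptr(i.value, i.contrib)


def gep(p, offset):
    # Based only on the pointer operand; the offset contributes nothing.
    return Ptr(p.addr + offset.value, p.based_on)


p1 = Ptr(0x1000, frozenset({"p1"}))
p2 = Ptr(0x2000, frozenset({"p2"}))

i1 = ptrtoint(p1)
i2 = ptrtoint(p2)
i3 = i2 - i1
i4 = i3 + i1
p = inttoptr(i4)   # same address as p2, based on both p1 and p2
g = gep(p1, i3)    # same address as p2, but based only on p1

print(p.addr == p2.addr, "p2" in p.based_on)  # True True
print(g.addr == p2.addr, "p2" in g.based_on)  # True False
```

Both p and g hold the same address, but only p is based on p2 in this
model, which is why rewriting the integer arithmetic into a gep on p1
is not a safe transformation.
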
-Eli