[LLVMdev] GEP vs IntToPtr/PtrToInt

Wed Apr 20 13:35:10 PDT 2011

On Wed, Apr 20, 2011 at 1:30 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
> On Wed, Apr 20, 2011 at 2:11 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> On Wed, Apr 20, 2011 at 10:21 AM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
>>> On Wed, Apr 20, 2011 at 12:20 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>> On Wed, Apr 20, 2011 at 8:08 AM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
>>>>> I have a question about when we should apply these pointer aliasing
>>>>> rules. Do the rules tell us when a load/store is safe?
>>>>> "Any memory access must be done through a pointer value associated
>>>>> with an address range of the memory access, otherwise the behavior is
>>>>> undefined."
>>>>>
>>>>> So this means the conversion discussed here is still safe in terms of
>>>>> memory safety, but its meaning after conversion could be weird. Am I
>>>>> correct?
>>>>
>>>> Per http://llvm.org/docs/LangRef.html#pointeraliasing, it's undefined
>>>> behavior, so it isn't safe in any sense.  In practice, I can't think
>>>> of a common transformation that would cause a crash, but it's best not
>>>> to depend on that.
>>>
>>> My confusion could be what is considered to be undefined from the
>>> rules. It says a memory access is defined if
>>>  "Any memory access must be done through a pointer value associated
>>> with an address range of the memory access".
>>>
>>> Does this implicitly mean that the value of the pointer must be within
>>> the address range of the memory access it is associated with? It seems
>>> to be true to me from the rules about global variables, alloca and
>>> even external pointers.
>>>
>>> For example
>>>    %p = alloca i32
>>>    %q = getelementptr i32* %p, i32 42
>>>    store i32 0, i32* %q
>>>
>>> Is this a fine memory access (although I don't think it is)? Here, %q
>>> is based on %p, and %p is associated with the address range from
>>> alloca i32. But the range allocated by the alloca is definitely
>>> smaller than 42 elements. Since the LLVM IR does not state that
>>> load/store-ing an out-of-bounds address is undefined
>>>   http://llvm.org/docs/LangRef.html#i_load
>>>   http://llvm.org/docs/LangRef.html#i_store
>>> I looked into the alias-rule to find answers.
>>
>> That doesn't really have anything to do with aliasing, but it's
>> definitely undefined.  Don't know off the top of my head where that is
>> stated in LangRef.
>>
>>> Now, come back to the inttoptr and ptrtoint questions. When we say
>>> that a memory access via a pointer converted from an int is defined,
>>> do we mean that
>>>  1) the value of the pointer happens to be equal to an address within
>>> the range of an allocated object, or
>>>  2) the value of the pointer happens to be based on some allocated
>>> objects per these rules, but it is fine if it is out of their ranges
>>> (I don't think this is true, but the rules do not explicitly tell me if
>>> this is legal). Here, the intuitive meaning of based-on is as you
>>> explained below.
>>>
>>> But I still have some questions about the 'based-on' relation. It seems
>>> to state an aliasing relation between pointers. Then, if the result of
>>> an inttoptr is based on some objects, why can we consider this to be a
>>> good memory access? It is very likely that the pointer points to some
>>> other allocated objects that we don't want to be changed. So this comes
>>> to my question --- what property does a defined memory access give us?
>>
>> A properly-defined memory access is fully within the bounds of some
>> defined object, and "based" (in the LangRef.html#pointeraliasing
>> sense) on that object.
>>
>>>>
>>>>> That brings me to another question. The based-on relation has this rule:
>>>>> "A pointer value formed by an inttoptr is based on all pointer values
>>>>> that contribute (directly or indirectly) to the computation of the
>>>>> pointer's value."
>>>>>
>>>>> Suppose an int value 'i' is computed from a lot of int variables that
>>>>> are converted from pointers (p1,p2...pn) by ptrtoint. If we then
>>>>> inttoptr i to a pointer p, how should I decide which pointer value 'p'
>>>>> is based on?
>>>>>
>>>>> If those p_j are ptrtoint to a i_j, and the computation for i is i =
>>>>> i_0 + i_1 + ... i_n, does it mean
>>>>>  we can take either p_j as a base pointer, and other int variables
>>>>> its offset, say we take p_2 as the base pointer, and the p from i
>>>>> points to
>>>>>       p_2 + (i_0 + i_1 + i_3 + .. i_n)
>>>>>  ?
>>>>>
>>>>> So in the transformation example, the result is different when we take
>>>>> %196 or %193 as a base pointer.
>>>>>
>>>>> For alias analysis, we may say that p can point to any memory that
>>>>> one of the p_j points to. But if we consider memory safety, should we
>>>>> say p is safe to access if p is not out-of-bounds no matter which p_j
>>>>> is taken as a base pointer?
>>>>
>>>> See above.
>>>>
>>>>> Could anyone explain this rule more precisely? For example, how can
>>>>> we find "all pointer values that contribute (directly or
>>>>> indirectly)"?
>>>>
>>>> There isn't any straightforward way to calculate that set.  Another
>>>> way of stating the rule is that if changing the numerical value of the
>>>> address of some object might change the calculated value of the
>>>> operand of an inttoptr, it's one of the "pointer values that
>>>> contribute".  It's intentionally defined a bit loosely because there
>>>> are a lot of different ways for that to be the case.  You can extract
>>>> information about a pointer by a ptrtoint, a load of part or all of
>>>> the address from memory, pointer comparisons, and possibly some other
>>>> ways I'm not thinking of.
>>>>
>>>>> This would be helpful to understand
>>>>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff
>>>>> http://llvm.org/docs/GetElementPtr.html#null
>>>>> which suggest that we can do some 'wild' pointer arithmetic by
>>>>> inttoptr and ptrtoint.
>>>>>
>>>>> For example, given a pointer p, can we safely do?
>>>>>   i = ptrtoint p;
>>>>>   j = i + null;
>>>>>   q = inttoptr j;
>>>>>   v = load q;
>>>>>
>>>>> Thanks a lot.
>>>>
>>>> inttoptr(ptrtoint(x)) is just x; inttoptr(ptrtoint(x)+10) can be
>>>> safely translated to gep i8* x, 10.  Translating
>>>> inttoptr(ptrtoint(x)+y) to gep i8* x, y is not safe in general.
>>>
>>> While in http://llvm.org/docs/GetElementPtr.html#ptrdiff, the
>>> difference between two pointers computed from GEP has to be a
>>> variable, but not a constant, how could that work?
>>
>> In my example I was assuming "y" was some unknown value.  I'm not sure
>> what you're asking here.
>>
>>> Also, given p1 and p2 from GEP, if we do
>>>  i1 = ptrtoint p1;
>>>  i2 = ptrtoint p2;
>>>  i3 = i2 - i1;
>>>  i3' = f (i3);       // suppose f is an identity function that
>>>                      // returns i3 directly.
>>>  i4 = i3' + i1;
>>>  p = inttoptr i4;
>>>  .. = load p;      // is this load defined?
>>>
>>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff seems to say, we can
>>> access out-of-bound memory via GEP, but it is safe to do that from
>>> inttoptr or ptrtoint as long as the result points to an allocated object.
>>> Is this the right way to understand it?
>>
>> I assume this is supposed to be "we cannot access out-of-bounds memory via GEP".
>
> Yes.
>
>>
>> The load in the given code is well-defined, and equivalent to a load
>> directly from p2.  The issue with translating i3' + i1 into gep p1,
>> i3' is that you end up with a load from a pointer into p2 that is not
>> "based" on p2.
>
> Is it supposed to be "end up with a load from a pointer into p2 that
> is "based" on p2"? Otherwise this is not well-defined. Because
>
> " A properly-defined memory access is fully within the bounds of some
>  defined object, and "based" (in the LangRef.html#pointeraliasing
>  sense) on that object."
>
> I think p2 contributes to the computation of p, because changing the
> value of p2 affects the value of p. By the intuitive meaning of
> based-on you stated, p1 also contributes if we analyze this by a coarse
> alias-analysis.

Let me try restating.  The given code is legal.

If you tried to transform the given code to do "p = gep p1, i3'", it
would not be legal.

-Eli
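
For readers more at home in C than in IR, the final example in the thread can be roughly sketched as below. This assumes that ptrtoint and inttoptr correspond to casts through uintptr_t, which is only an approximation of the IR semantics. The names identity and load_via_integers are made up for illustration and do not appear in the thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for the thread's f: returns its argument unchanged. */
static uintptr_t identity(uintptr_t x) { return x; }

/* Rebuilds the address of p2 from p1 plus the integer difference of the
   two addresses, then loads through the rebuilt pointer. */
static int load_via_integers(int *p1, int *p2) {
    uintptr_t i1 = (uintptr_t)(void *)p1;  /* i1  = ptrtoint p1 */
    uintptr_t i2 = (uintptr_t)(void *)p2;  /* i2  = ptrtoint p2 */
    uintptr_t i3 = i2 - i1;                /* unsigned, so it may wrap */
    uintptr_t i3p = identity(i3);          /* i3' = f(i3) */
    uintptr_t i4 = i3p + i1;               /* numerically equal to i2 */
    int *p = (int *)(void *)i4;            /* p   = inttoptr i4 */
    /* p was formed from integers derived from both p1 and p2, so in the
       thread's terms it is "based" on both. Rewriting the two lines above
       as arithmetic on p1 alone (the "gep p1, i3'" form) is the
       transformation the thread describes as not legal. */
    return *p;
}
```

On mainstream implementations, calling load_via_integers(&a, &b) should return the value stored in b, since i4 ends up holding the same integer as i2.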