[LLVMdev] GEP vs IntToPtr/PtrToInt

Wed Apr 20 13:30:03 PDT 2011

On Wed, Apr 20, 2011 at 2:11 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Wed, Apr 20, 2011 at 10:21 AM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
>> On Wed, Apr 20, 2011 at 12:20 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>> On Wed, Apr 20, 2011 at 8:08 AM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
>>>> I have a question about when we should apply these pointer aliasing
>>>> rules. Do the rules tell us when a load/store is safe?
>>>> "Any memory access must be done through a pointer value associated
>>>> with an address range of the memory access, otherwise the behavior is
>>>> undefined."
>>>>
>>>> So this means the conversion discussed here is still safe in terms of
>>>> memory safety, but its meaning after conversion could be weird. Am I
>>>> correct?
>>>
>>> Per http://llvm.org/docs/LangRef.html#pointeraliasing, it's undefined
>>> behavior, so it isn't safe in any sense.  In practice, I can't think
>>> of a common transformation that would cause a crash, but it's best not
>>> to depend on that.
>>
>> My confusion may be about what the rules consider undefined. They
>> say a memory access is defined if
>>  "Any memory access must be done through a pointer value associated
>> with an address range of the memory access".
>>
>> Does this implicitly mean that the value of the pointer must be within
>> the address range of the memory access it is associated with? It seems
>> to be true to me from the rules about global variables, alloca and
>> even external pointers.
>>
>> For example
>>    %p = alloca i32
>>    %q = getelementptr i32* %p, i32 42
>>    store i32 0, i32* %q
>>
>> Is this a well-defined memory access (although I don't think it is)?
>> Here, %q is based on %p, and %p is associated with the address range of
>> the alloca i32. But that range holds a single i32, so an offset of 42
>> elements is clearly outside it. Since the LLVM IR reference does not
>> state that loading from or storing to an out-of-bounds address is undefined
>>   http://llvm.org/docs/LangRef.html#i_load
>>   http://llvm.org/docs/LangRef.html#i_store
>> I looked into the aliasing rules to find answers.
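The out-of-bounds arithmetic can be made concrete. The GEP scales the index 42 by the 4-byte size of i32, so %q would lie 168 bytes past the start of an object that is only 4 bytes long. A minimal C sketch of that check (the helper names are hypothetical, not LLVM APIs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset produced by `getelementptr i32* %p, i32 idx`:
   the index is scaled by the size of the pointee type (i32 = 4 bytes). */
static ptrdiff_t gep_byte_offset(ptrdiff_t idx) {
    return idx * (ptrdiff_t)sizeof(int32_t);
}

/* An access of `len` bytes at byte offset `off` from the start of an
   object of `obj_size` bytes stays inside the object iff
   0 <= off and off + len <= obj_size (written to avoid overflow). */
static bool access_in_bounds(ptrdiff_t off, size_t len, size_t obj_size) {
    return off >= 0 && (size_t)off <= obj_size && len <= obj_size - (size_t)off;
}
```

Under this sketch, a 4-byte store at offset `gep_byte_offset(42)` into a 4-byte object fails the check, while the same store at offset 0 passes.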
>
> That doesn't really have anything to do with aliasing, but it's
> definitely undefined.  Don't know off the top of my head where that is
> stated in LangRef.
>
>> Now, coming back to the inttoptr and ptrtoint questions: when we
>> consider a memory access through a pointer obtained from an integer to
>> be defined, do we mean
>>  1) the value of the pointer happens to equal an address within the
>> range of an allocated object, or
>>  2) the value of the pointer happens to be based on some allocated
>> objects per these rules, but it is fine if it is out of their ranges
>> (I don't think this is true, but the rules do not explicitly tell me
>> whether it is legal)? Here, the intuitive meaning of based-on is as you
>> explained below.
>>
>> But I still have some questions about the 'based-on' relation. It seems
>> to state an aliasing relation between pointers. If the result of an
>> inttoptr is based on some objects, why can we consider an access
>> through it to be a good memory access? The pointer may very well point
>> into some other allocated object that we don't want to be changed. So
>> this brings me to my question: what property does a defined memory
>> access give us?
>
> A properly-defined memory access is fully within the bounds of some
> defined object, and "based" (in the LangRef.html#pointeraliasing
> sense) on that object.
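Eli's criterion has two parts: bounds and provenance. Below is a minimal C sketch under a toy model of our own (not LLVM's implementation): each pointer records its address plus the one object it is based on, and an access counts as defined only if it fits entirely inside that object. A pointer that merely lands inside some other allocated object is therefore not a defined access to it. Real inttoptr results can be based on several objects; the sketch keeps just one for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: an allocated object is a start address
   and a size in bytes. */
typedef struct {
    uintptr_t start;
    size_t size;
} object;

/* A pointer carries its numeric address and the index of the single
   object it is "based" on (its provenance). */
typedef struct {
    uintptr_t addr;
    int based_on; /* index into the object table */
} ptr;

/* Defined iff the `len`-byte access lies fully within the object the
   pointer is based on. Hitting some *other* object does not count. */
static bool access_defined(const object *objs, ptr p, size_t len) {
    const object *o = &objs[p.based_on];
    if (p.addr < o->start)
        return false;
    size_t off = (size_t)(p.addr - o->start);
    return off <= o->size && len <= o->size - off;
}
```

With 4-byte objects at 0x1000 and 0x1004, a pointer based on the first object but holding address 0x1004 is rejected, even though that address belongs to an allocated object.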
>
>>>
>>>> This leads to another question. The based-on relation has this rule:
>>>> "A pointer value formed by an inttoptr is based on all pointer values
>>>> that contribute (directly or indirectly) to the computation of the
>>>> pointer's value."
>>>>
>>>> Suppose an integer value 'i' is computed from many integer variables
>>>> that are converted from pointers (p1, p2, ..., pn) by ptrtoint. If we
>>>> inttoptr i to a pointer p, how should I decide which pointer values p
>>>> is based on?
>>>>
>>>> If each p_j is converted by ptrtoint to i_j, and i is computed as
>>>> i = i_0 + i_1 + ... + i_n, does it mean that we can take any p_j as the
>>>> base pointer and the other integer variables as its offset? Say we
>>>> take p_2 as the base pointer; then does the p computed from i point to
>>>>       p_2 + (i_0 + i_1 + i_3 + ... + i_n)
>>>>  ?
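Written out, the numeric address does not depend on which p_j is chosen as the base, so the open question is only about provenance (which objects p is based on). Using the question's notation, with i_j = ptrtoint(p_j):

```latex
i \;=\; \sum_{k=0}^{n} i_k \;=\; i_j + \sum_{k \neq j} i_k
\qquad\Longrightarrow\qquad
\mathrm{inttoptr}(i) \;=\; p_j + \sum_{k \neq j} i_k
\quad \text{(as byte addresses, for every } j \text{ with } i_j = \mathrm{ptrtoint}(p_j)\text{)}
```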
>>>>
>>>> So in the transformation example, the result differs depending on
>>>> whether we take %196 or %193 as the base pointer.
>>>>
>>>> For alias analysis, we may say that p can point to any memory that any
>>>> of the p_j points to. But for memory safety, should we say that p is
>>>> safe to access if it is not out of bounds, no matter which p_j is taken
>>>> as the base pointer?
>>>
>>> See above.
>>>
>>>> Could anyone explain this rule more precisely? For example, how can
>>>> we find "all pointer values that contribute (directly or indirectly)"?
>>>
>>> There isn't any straightforward way to calculate that set.  Another
>>> way of stating the rule is that if changing the numerical value of the
>>> address of some object might change the calculated value of the
>>> operand of an inttoptr, it's one of the "pointer values that
>>> contribute".  It's intentionally defined a bit loosely because there's
>>> a lot of different ways for that to be the case.  You can extract
>>> information about a pointer by a ptrtoint, a load of part or all of
>>> the address from memory, pointer comparisons, and possibly some other
>>> ways I'm not thinking of.
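A coarse, syntactic over-approximation of the "contribute" set can be sketched by tagging every integer with a bitmask of the pointers it was computed from and unioning the masks through arithmetic. This is a hypothetical illustration, not how LLVM computes anything; as noted above, information can also flow through loads of address bytes and pointer comparisons, which this sketch does not track.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: an integer plus a bitmask of the pointers
   p_0..p_31 it was (directly or indirectly) computed from. */
typedef struct {
    uint64_t value;   /* numeric value (wraps modulo 2^64) */
    uint32_t sources; /* bit k set => pointer p_k contributed */
} tagged_int;

/* ptrtoint of pointer p_k (k < 32): depends on exactly that pointer. */
static tagged_int ptr_to_int(uint64_t addr, unsigned k) {
    tagged_int t = { addr, 1u << k };
    return t;
}

/* A plain integer constant depends on no pointer. */
static tagged_int constant(uint64_t v) {
    tagged_int t = { v, 0 };
    return t;
}

/* Conservatively, a result may depend on everything its operands do. */
static tagged_int add(tagged_int a, tagged_int b) {
    tagged_int t = { a.value + b.value, a.sources | b.sources };
    return t;
}

static tagged_int sub(tagged_int a, tagged_int b) {
    tagged_int t = { a.value - b.value, a.sources | b.sources };
    return t;
}
```

For i = i_1 + i_2 + i_3 + 8 built from three ptrtoints, the mask ends up with bits 1, 2 and 3 set, i.e. all three pointers are counted as contributing.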
>>>
>>>> This would help me understand
>>>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff
>>>>  http://llvm.org/docs/GetElementPtr.html#null
>>>> which suggest that we can do some 'wild' pointer arithmetic with
>>>> inttoptr and ptrtoint.
>>>>
>>>> For example, given a pointer p, can we safely do the following?
>>>>   i = ptrtoint p;
>>>>   j = i + null;
>>>>   q = inttoptr j;
>>>>   v = load q;
>>>>
>>>> Thanks a lot.
>>>
>>> inttoptr(ptrtoint(x)) is just x; inttoptr(ptrtoint(x+10)) can be
>>> safely translated to gep i8* x, 10.  Translating
>>> inttoptr(ptrtoint(x+y)) to gep i8* x, y is not safe in general.
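One way to phrase that folding rule, assuming a toy representation in which the offset in inttoptr(ptrtoint(x) + off) carries a bitmask of the pointers that contribute to it (the function and parameter names are hypothetical). Rewriting to `gep i8* x, off` makes the result based on x alone, so under this model it is only safe when no other pointer's provenance gets dropped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy decision for folding inttoptr(ptrtoint(x) + off) into gep i8* x, off.
     off_is_constant  : the offset is a compile-time constant
     off_ptr_sources  : bitmask of pointers that contribute to `off` */
static bool can_fold_to_gep(bool off_is_constant, uint32_t off_ptr_sources) {
    /* A constant such as 10 carries no other pointer's provenance,
       so the gep is based on everything the inttoptr was based on. */
    if (off_is_constant)
        return true;
    /* An unknown y may be computed from other pointers, e.g.
       ptrtoint(z) - ptrtoint(x); folding would drop z's provenance. */
    return off_ptr_sources == 0;
}
```

In real IR, the contributors to an unknown y generally cannot be shown to be empty, which matches the statement that the x+y case is not safe in general.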
>>
>> In http://llvm.org/docs/GetElementPtr.html#ptrdiff, the difference
>> between two pointers computed from GEP has to be a variable rather than
>> a constant; how could that work?
>
> In my example I was assuming "y" was some unknown value.  I'm not sure
> what you're asking here.
>
>> Also, given p1 and p2 from GEP, if we do
>>  i1 = ptrtoint p1;
>>  i2 = ptrtoint p2;
>>  i3 = i2 - i1;
>>  i3' = f (i3);       // suppose f is the identity function, returning i3 unchanged
>>  i4 = i3' + i1;
>>  p = inttoptr i4;
>>  .. = load p;      // is this load defined?
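For comparison, the same integer round trip written in C, as a sketch that assumes the optional uintptr_t type is available (f is the identity function from the example; the other names are illustrative). Because uintptr_t arithmetic is modular, (i2 - i1) + i1 equals i2 exactly, and converting that value back yields a pointer that compares equal to p2. This shows the arithmetic only; it does not by itself establish LLVM's rules.

```c
#include <assert.h>
#include <stdint.h>

/* The identity function `f` from the example. */
static uintptr_t f(uintptr_t x) { return x; }

/* C analogue of the ptrtoint / sub / add / inttoptr sequence. */
static int load_via_roundtrip(int *p1, int *p2) {
    uintptr_t i1 = (uintptr_t)(void *)p1;
    uintptr_t i2 = (uintptr_t)(void *)p2;
    uintptr_t i3 = i2 - i1;      /* wraps if p2 < p1; still exact below */
    uintptr_t i3b = f(i3);       /* i3' in the pseudocode */
    uintptr_t i4 = i3b + i1;     /* numerically equal to i2 */
    int *p = (int *)(void *)i4;  /* compares equal to p2 */
    return *p;                   /* same as loading *p2 */
}
```

Given `int a = 1, b = 42;`, calling `load_via_roundtrip(&a, &b)` reads b.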
>>
>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff seems to say that we can
>> access out-of-bounds memory via GEP, but that it is safe to do so via
>> inttoptr or ptrtoint as long as the result points into an allocated
>> object. Is this the right way to understand it?
>
> I assume this is supposed to be "we cannot access out-of-bounds memory via GEP".

Yes.

>
> The load in the given code is well-defined, and equivalent to a load
> directly from p2.  The issue with translating i3' + i1 into gep p1,
> i3' is that you end up with a load from a pointer into p2 that is not
> "based" on p2.
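Eli's point can be sketched with a toy provenance model (hypothetical, not an LLVM API) in which a pointer carries a set of objects it is based on. The inttoptr result is based on both p1 and p2, while `gep p1, i3'` computes the same address but is based on p1 only, so an access into p2's object is no longer covered:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: object 0 is p1's, object 1 is p2's. */
typedef struct { uintptr_t start; size_t size; } object;

/* A pointer with its address and a bitmask of objects it is based on. */
typedef struct { uintptr_t addr; uint32_t based_on; } ptr;

/* Defined iff some object the pointer is based on fully contains
   the `len`-byte access. */
static bool access_defined(const object *objs, size_t nobjs, ptr p, size_t len) {
    for (size_t k = 0; k < nobjs; k++) {
        if (!(p.based_on & (1u << k)))
            continue;
        const object *o = &objs[k];
        if (p.addr < o->start)
            continue;
        size_t off = (size_t)(p.addr - o->start);
        if (off <= o->size && len <= o->size - off)
            return true;
    }
    return false;
}

/* inttoptr(i3' + i1): based on every pointer that contributed. */
static ptr via_inttoptr(ptr p1, ptr p2) {
    ptr r = { p2.addr, p1.based_on | p2.based_on };
    return r;
}

/* gep p1, i3': the same address, but based on p1 alone. */
static ptr via_gep(ptr p1, ptr p2) {
    ptr r = { p1.addr + (p2.addr - p1.addr), p1.based_on };
    return r;
}
```

Both functions produce the same numeric address; only the inttoptr version passes `access_defined` for a 4-byte load from p2's object.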

Did you mean "end up with a load from a pointer into p2 that is
"based" on p2"? Otherwise the load would not be well-defined, since:

" A properly-defined memory access is fully within the bounds of some
 defined object, and "based" (in the LangRef.html#pointeraliasing
 sense) on that object."

I think p2 contributes to the computation of p, because changing the
value of p2 changes the value of p. By the intuitive meaning of
based-on you stated, p1 also contributes if we analyze this with a
coarse alias analysis.
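The two readings of "contribute" can be compared directly. Under the value-based reading given earlier in the thread (changing an object's address might change the integer), p1 cancels out of (i2 - i1) + i1, while a purely syntactic analysis that follows operands would still count it. A small C sketch that probes this by perturbing each address (the helper names are hypothetical; a single perturbation is a spot check, though here the cancellation holds for all values because uintptr_t arithmetic is modular):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The integer fed to inttoptr in the example, with f as the identity:
   i4 = (i2 - i1) + i1, as a function of the two addresses. */
static uintptr_t compute_i4(uintptr_t a1, uintptr_t a2) {
    return (a2 - a1) + a1;
}

/* Does moving p1's object by `delta` bytes change i4? */
static bool p1_changes_result(uintptr_t a1, uintptr_t a2, uintptr_t delta) {
    return compute_i4(a1 + delta, a2) != compute_i4(a1, a2);
}

/* Does moving p2's object by `delta` bytes change i4? */
static bool p2_changes_result(uintptr_t a1, uintptr_t a2, uintptr_t delta) {
    return compute_i4(a1, a2 + delta) != compute_i4(a1, a2);
}
```

Either way, p2 is in the contributing set, which is what makes the load from p2 well-defined; the readings differ only on whether p1 is included as well.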

>
> -Eli
>

-- 
Jianzhou