[LLVMdev] GEP vs IntToPtr/PtrToInt

Jianzhou Zhao jianzhou at seas.upenn.edu
Wed Apr 20 10:44:12 PDT 2011

On Wed, Apr 20, 2011 at 12:59 PM, John Criswell <criswell at illinois.edu> wrote:
> On 4/20/11 10:08 AM, Jianzhou Zhao wrote:
>> I have a question about when we should apply these pointer aliasing
>> rules. Do the rules tell us when a load/store is safe?
>> "Any memory access must be done through a pointer value associated
>> with an address range of the memory access, otherwise the behavior is
>> undefined."
> I don't think the pointer aliasing rules indicate when a memory access is
> safe.  Rather, they set down rules for what the compiler can consider to be
> defined and undefined behavior.  It lays down the law for what optimizations
> are considered correct and which are not.

I see. The rules are the 'abstract' semantics used to check aliasing.
I looked into that section because LLVM IR does not say that an
out-of-bounds load/store is undefined. Is that because whether such an
access is defined depends on the semantics of the high-level language
from which the IR is compiled?

>> So this means the conversion discussed here is still safe in terms of
>> memory safety, but its meaning after conversion could be weird. Am I
>> correct?
> I am not sure what you mean.  However, if you're asking whether casting a
> pointer to an integer and then casting the integer back to a pointer is
> correct, I believe the answer is yes.  We certainly treat it that way in
> SAFECode although in the current implementation, it can weaken the safety
> guarantees.  Our points-to analysis, DSA, doesn't track pointers through
> integers, and so SAFECode uses more lenient checks on pointer values coming
> from inttoptr casts; DSA can't always guarantee that it knows everything
> about the memory objects feeding into it.

Yes. That is what I meant.

> That is, consequently, one of the reasons why we'd like to do Arushi's
> transformation.  It will make DSA less conservative and SAFECode more
> stringent.
>> That brings me to another question. The based-on relation has this rule:
>> "A pointer value formed by an inttoptr is based on all pointer values
>> that contribute (directly or indirectly) to the computation of the
>> pointer's value."
>> Suppose an int value 'i' is computed from many int variables that
>> are converted from pointers (p1, p2, ... pn) by ptrtoint. If we then
>> inttoptr i to a pointer p, how should I decide which pointer value 'p'
>> is based on? If each p_j is converted by ptrtoint to an i_j, and i is
>> computed as i = i_0 + i_1 + ... + i_n, does it mean
>>   we can take any p_j as the base pointer and the other int variables
>> as its offset? Say we take p_2 as the base pointer; then the p from i
>> points to
>>        p_2 + (i_0 + i_1 + i_3 + ... + i_n)
>>   ?
> So, in your example, if you do:
> i1 = ptrtoint p1;
> i2 = ptrtoint p2;
> ...
> in = ptrtoint pn;
> i = i1 + i2 ... + in;
> p = inttoptr i;
> ..., then p can point to any memory object that p1, p2, ... pn point to.
> The reasoning is that the integer add instruction obscures which integer
> is the base pointer and which is the index, so the aliasing rules
> conservatively assume that any operand may be the base pointer.
>> So in the transformation example, the result is different when we take
>> %196 or %193 as a base pointer.
> Yes, which is why the transform that Arushi suggested is not legal unless
> you can prove that %196 can't be a pointer to a memory object.
>> For alias analysis, we may say that p can point to any memory that one
>> of the p_j points to. But if we consider memory safety, should we say p
>> is safe to access if p is not out of bounds, no matter which p_j is taken
>> as the base pointer?
> That is how I would interpret memory safety: p is safe if it is within the
> bounds of any of the p_j memory objects.
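
A minimal sketch of that interpretation, written in Python only for illustration. The function name and the (base, length) representation of memory objects are assumptions made for this example; they are not taken from SAFECode or DSA.

```python
from typing import Iterable, Tuple


def access_is_safe(address: int, size: int,
                   objects: Iterable[Tuple[int, int]]) -> bool:
    """Return True if [address, address + size) lies inside at least one object.

    `objects` holds one (base_address, length_in_bytes) pair for every
    memory object that the pointer may be based on (the p_j objects).
    """
    return any(base <= address and address + size <= base + length
               for base, length in objects)


# Hypothetical objects: 16 bytes at 0x1000 and 8 bytes at 0x2000.
candidate_objects = [(0x1000, 16), (0x2000, 8)]
in_first = access_is_safe(0x1008, 4, candidate_objects)
straddles_end = access_is_safe(0x100E, 4, candidate_objects)
```

Here the check passes as soon as the access fits inside one candidate object, which is weaker than a check against a single known base object.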
>> Could anyone explain this rule more precisely? For example, how can we
>> find "all pointer values that contribute (directly or indirectly)"?
> I think this can be conservatively done using simple data-flow analysis.
>  The only tricky part is when a pointer travels through memory (i.e., it is
> stored into memory by a store instruction and loaded later by a load
> instruction).  An enhanced version of DSA which tracks pointers through
> integers could handle this.
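
A rough sketch of such a data-flow pass, in Python for illustration only. The mini-IR (tuples of result, opcode, operands), the `based_on` function, and the `UNKNOWN` marker are all assumptions invented for this example and are not part of LLVM or DSA. It handles straight-line code only, assumes every value is defined before it is used, and treats every load as possibly carrying a pointer, which is the conservative answer to the "pointer travels through memory" case.

```python
UNKNOWN = "<unknown>"  # hypothetical marker: "may carry any pointer stored in memory"


def based_on(instructions, pointer_roots):
    """Map each value to the set of root pointers that may contribute to it."""
    sets = {root: {root} for root in pointer_roots}
    for result, opcode, operands in instructions:
        if opcode == "load":
            # Memory is not tracked here, so a loaded value is assumed to
            # possibly hold a pointer that was stored earlier.
            sets[result] = {UNKNOWN}
        elif opcode == "getelementptr":
            # A GEP result is based only on its pointer operand, not its indices.
            sets[result] = set(sets.get(operands[0], set()))
        else:
            # ptrtoint, inttoptr, zext, add, ...: union over all operands.
            # Operands with no entry (e.g. constants) contribute nothing.
            contributed = set()
            for operand in operands:
                contributed |= sets.get(operand, set())
            sets[result] = contributed
    return sets


# The two forms of the example from the start of this thread.
original = [
    ("%196", "load", ["%195"]),
    ("%197", "zext", ["%196"]),
    ("%198", "ptrtoint", ["%193"]),
    ("%199", "add", ["%198", "%197"]),
    ("%200", "inttoptr", ["%199"]),
]
transformed = [
    ("%196", "load", ["%195"]),
    ("%200", "getelementptr", ["%193", "%196"]),
]
roots = ["%193", "%195"]
before = based_on(original, roots)["%200"]
after = based_on(transformed, roots)["%200"]
```

With these inputs, `before` contains both `%193` and the unknown marker, while `after` contains only `%193`, which mirrors the difference in based-on sets that makes the transform illegal in general.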
>> This would be helpful to understand
>>   http://llvm.org/docs/GetElementPtr.html#ptrdiff
>>   http://llvm.org/docs/GetElementPtr.html#null
>> which suggest that we can do some 'wild' pointer arithmetic by
>> inttoptr and ptrtoint.
>> For example, given a pointer p, can we safely do the following?
>>    i = ptrtoint p;
>>    j = i + null;
>>    q = inttoptr j;
>>    v = load q;
> That's a weird one (aside: you need to cast NULL to int first before using
> it in the add).  Since NULL doesn't point to a valid memory range, it may be
> that you can technically consider q to just point to p.  However, I'm not
> sure about that; maybe q is technically aliased with null and can point to
> some offset of NULL.
> However, in practice, even if the aliasing rules say that q can point to p
> or some offset of NULL, I would say that q points to just p since you know
> (for most implementations) that NULL is equivalent to zero.
> -- John T.
>> Thanks a lot.
>> On Mon, Apr 4, 2011 at 9:34 PM, Eli Friedman<eli.friedman at gmail.com>
>>  wrote:
>>> On Mon, Apr 4, 2011 at 7:10 AM, John Criswell<criswell at illinois.edu>
>>>  wrote:
>>>> On 4/4/2011 6:45 PM, Eli Friedman wrote:
>>>>> On Mon, Apr 4, 2011 at 5:02 PM, Arushi Aggarwal<arushi987 at gmail.com>
>>>>>  wrote:
>>>>>>> Hi,
>>>>>>> Is it correct to convert,
>>>>>>>   %196 = load i32* %195, align 8                  ;<i32>    [#uses=1]
>>>>>>>   %197 = zext i32 %196 to i64                     ;<i64>    [#uses=1]
>>>>>>>   %198 = ptrtoint i8* %193 to i64                 ;<i64>    [#uses=1]
>>>>>>>   %199 = add i64 %198, %197                       ;<i64>    [#uses=1]
>>>>>>>   %200 = inttoptr i64 %199 to i8*                 ;<i8*>    [#uses=1]
>>>>>>> into
>>>>>>> %200 = getelementptr %193, %196
>>>>>>> Reducing the unnecessary casts of converting to integers and then
>>>>>>> back?
>>>>>>> Thanks,
>>>>>>> Arushi
>>>>> See http://llvm.org/docs/LangRef.html#pointeraliasing ; it's not
>>>>> correct in general.  It is correct if %196 isn't dependent on the
>>>>> address of any memory object, though.
>>>> Can you clarify why the transform isn't correct?  Is it because in the
>>>> original code, %200 is based on both the originally cast pointer (%193)
>>>> and
>>>> the indexed offset from it (%197) while the transformed code is only
>>>> based
>>>> on %193?
>>> Yes, exactly.
>>> -Eli
>> --
>> Jianzhou

