[LLVMdev] GEP vs IntToPtr/PtrToInt

Eli Friedman eli.friedman at gmail.com
Wed Apr 20 09:20:15 PDT 2011


On Wed, Apr 20, 2011 at 8:08 AM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
> I have a question about when we should apply these pointer aliasing
> rules. Do the rules tell us when a load/store is safe?
> "Any memory access must be done through a pointer value associated
> with an address range of the memory access, otherwise the behavior is
> undefined."
>
> So this means the conversion discussed here is still safe in terms of
> memory safety, but its meaning after conversion could be weird. Am I
> correct?

Per http://llvm.org/docs/LangRef.html#pointeraliasing, it's undefined
behavior, so it isn't safe in any sense.  In practice, I can't think
of a common transformation that would cause a crash, but it's best not
to depend on that.

> That brings me to another question. The based-on relation has this rule:
> "A pointer value formed by an inttoptr is based on all pointer values
> that contribute (directly or indirectly) to the computation of the
> pointer's value."
>
> Suppose an int value 'i' is computed from many int variables that
> were converted from pointers (p1, p2, ..., pn) by ptrtoint. If we then
> inttoptr i to a pointer p, how should I decide which pointer values
> 'p' is based on?
>
> If each p_j is converted by ptrtoint to an i_j, and i is computed as
> i = i_0 + i_1 + ... + i_n, does that mean we can take any p_j as the
> base pointer and treat the other int variables as its offset? Say we
> take p_2 as the base pointer; then the p formed from i points to
>       p_2 + (i_0 + i_1 + i_3 + ... + i_n)
>  ?
>
> So in the transformation example, the result is different when we take
> %196 or %193 as a base pointer.
>
> For alias analysis, we may say that p can point to any memory that
> one of the p_j points to. But if we consider memory safety, should we
> say p is safe to access as long as p is not out of bounds, no matter
> which p_j is taken as the base pointer?

See above.

> Could anyone explain this rule more precisely? For example, how can
> we find "all pointer values that contribute (directly or indirectly)"?

There isn't any straightforward way to calculate that set.  Another
way of stating the rule is that if changing the numerical value of the
address of some object might change the calculated value of the
operand of an inttoptr, it's one of the "pointer values that
contribute".  It's intentionally defined a bit loosely because there
are a lot of different ways for that to be the case.  You can extract
information about a pointer by a ptrtoint, a load of part or all of
the address from memory, pointer comparisons, and possibly some other
ways I'm not thinking of.
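
As a rough illustration of the rule above (not something taken from
LangRef or from LLVM itself), the set can be approximated by following
data dependences through a toy instruction list.  Everything here is an
assumption made for the sketch: the tuple format, the opcode names, the
based_on helper, and the choice to treat every load as possibly
depending on every tracked pointer.  It ignores control dependence, so
it does not capture information leaked through pointer comparisons and
branches.

```python
def based_on(instrs, pointer_roots):
    """Map each value name to the set of root pointers it may depend on.

    instrs: list of (result, opcode, operands) tuples in program order
            (a hypothetical toy format, not LLVM's API).
    pointer_roots: names of pointer values whose addresses we track.
    """
    roots = set(pointer_roots)
    deps = {root: {root} for root in roots}
    for result, opcode, operands in instrs:
        contributing = set()
        for operand in operands:
            # Constants and untracked names contribute nothing.
            contributing |= deps.get(operand, set())
        if opcode == "load":
            # Assumption: loaded bytes may encode any tracked address.
            contributing |= roots
        deps[result] = contributing
    return deps


# The sequence asked about in this thread, in the toy format.
instrs = [
    ("%196", "load", ["%195"]),
    ("%197", "zext", ["%196"]),
    ("%198", "ptrtoint", ["%193"]),
    ("%199", "add", ["%198", "%197"]),
    ("%200", "inttoptr", ["%199"]),
]
deps = based_on(instrs, ["%193", "%195"])
print(sorted(deps["%200"]))  # ['%193', '%195']
```

Under these assumptions %200 comes out based on both %193 and %195,
because the loaded value %196 is treated as possibly carrying address
information.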

> This would be helpful to understand
>  http://llvm.org/docs/GetElementPtr.html#ptrdiff
> http://llvm.org/docs/GetElementPtr.html#null
> which suggest that we can do some 'wild' pointer arithmetic by
> inttoptr and ptrtoint.
>
> For example, given a pointer p, can we safely do?
>   i = ptrtoint p;
>   j = i + null;
>   q = inttoptr j;
>   v = load q;
>
> Thanks a lot.

inttoptr(ptrtoint(x)) is just x; inttoptr(ptrtoint(x) + 10) can be
safely translated to gep i8* x, 10.  Translating
inttoptr(ptrtoint(x) + y) to gep i8* x, y is not safe in general,
because y may itself depend on the address of some other object.
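
A minimal sketch of that distinction, assuming the set of pointers an
offset may depend on is already known (for instance from a conservative
dependence analysis).  The fold_to_gep helper and its arguments are
hypothetical and only build a string; this is not how any LLVM pass is
implemented.

```python
def fold_to_gep(base, offset, offset_deps):
    """Return GEP text for inttoptr(ptrtoint(base) + offset), or None.

    offset_deps: set of pointer values the offset may depend on
                 (hypothetical input supplied by the caller).
    """
    if offset_deps:
        # The offset carries address information, so the inttoptr result
        # may be based on more pointers than a GEP on base would be.
        return None
    return "getelementptr i8* {}, i64 {}".format(base, offset)


print(fold_to_gep("%x", "10", set()))       # getelementptr i8* %x, i64 10
print(fold_to_gep("%x", "%y", {"%other"}))  # None
```

The check is deliberately strict: it refuses the rewrite whenever the
offset depends on any address at all.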

-Eli

> On Mon, Apr 4, 2011 at 9:34 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>
>> On Mon, Apr 4, 2011 at 7:10 AM, John Criswell <criswell at illinois.edu> wrote:
>> > On 4/4/2011 6:45 PM, Eli Friedman wrote:
>> >>
>> >> On Mon, Apr 4, 2011 at 5:02 PM, Arushi Aggarwal<arushi987 at gmail.com>
>> >>  wrote:
>> >>>
>> >>>> Hi,
>> >>>> Is it correct to convert,
>> >>>>   %196 = load i32* %195, align 8                  ;<i32>  [#uses=1]
>> >>>>   %197 = zext i32 %196 to i64                     ;<i64>  [#uses=1]
>> >>>>   %198 = ptrtoint i8* %193 to i64                 ;<i64>  [#uses=1]
>> >>>>   %199 = add i64 %198, %197                       ;<i64>  [#uses=1]
>> >>>>   %200 = inttoptr i64 %199 to i8*                 ;<i8*>  [#uses=1]
>> >>>> into
>> >>>> %200 = getelementptr %193, %196
>> >>>> Reducing the unnecessary casts of converting to integers and then back?
>> >>>> Thanks,
>> >>>> Arushi
>> >>>>
>> >> See http://llvm.org/docs/LangRef.html#pointeraliasing ; it's not
>> >> correct in general.  It is correct if %196 isn't dependent on the
>> >> address of any memory object, though.
>> >
>> > Can you clarify why the transform isn't correct?  Is it because in the
>> > original code, %200 is based on both the originally cast pointer (%193) and
>> > the indexed offset from it (%197) while the transformed code is only based
>> > on %193?
>>
>> Yes, exactly.
>>
>> -Eli
>>
>
>
>
> --
> Jianzhou
>



