[LLVMdev] GEP vs IntToPtr/PtrToInt
John Criswell
criswell at illinois.edu
Wed Apr 20 09:59:02 PDT 2011
On 4/20/11 10:08 AM, Jianzhou Zhao wrote:
> I have a question about when we should apply these pointer aliasing
> rules. Do the rules tell us when a load/store is safe?
> "Any memory access must be done through a pointer value associated
> with an address range of the memory access, otherwise the behavior is
> undefined."
I don't think the pointer aliasing rules indicate when a memory access
is safe. Rather, they define what the compiler may consider to be
defined and undefined behavior, and in doing so they lay down the law
for which optimizations are correct and which are not.
> So this means the conversion discussed here is still safe in terms of
> memory safety, but its meaning after conversion could be weird. Am I
> correct?
I am not sure what you mean. However, if you're asking whether casting
a pointer to an integer and then casting the integer back to a pointer
is correct, I believe the answer is yes. We certainly treat it that way
in SAFECode, although in the current implementation it can weaken the
safety guarantees. Our points-to analysis, DSA, doesn't track pointers
through integers, so SAFECode uses more lenient checks on pointer
values coming from inttoptr casts; DSA can't always guarantee that it
knows everything about the memory objects feeding into such a cast.
That is one of the reasons why we'd like to do Arushi's
transformation: it will make DSA less conservative and SAFECode's
checks more stringent.
> Then it comes to my another question. The base-on relation has this rule:
> "A pointer value formed by an inttoptr is based on all pointer values
> that contribute (directly or indirectly) to the computation of the
> pointer's value."
>
> Suppose an int value 'i' is computed by a lot of int variables that
> are converted from ptr (p1,p2...pn) by ptrtoint, then if we inttoptr i
> to a point p, how should I decide which pointer value the 'p' forms?
>
> If those p_j are ptrtoint to a i_j, and the computation for i is i =
> i_0 + i_1 + ... i_n, does it mean
> we can take either p_j as a base pointer, and other int variables
> its offset, say we take p_2 as the base pointer, and the p from i
> points to
> p_2 + (i_0 + i_1 + i_3 + .. i_n)
> ?
So, in your example, if you do:
    i1 = ptrtoint p1;
    i2 = ptrtoint p2;
    ...
    in = ptrtoint pn;
    i = i1 + i2 + ... + in;
    p = inttoptr i;
then p can point into any of the memory objects that p1, p2, ..., pn
point to. The reasoning is that the integer add instructions obscure
which integer is the base pointer and which is the index, so the
aliasing rules conservatively assume that any operand could be the
base pointer.
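To make the rule concrete, here is a minimal Python sketch of that conservative propagation. The instruction tuples, opcode names, and the function `based_on` are hypothetical, a toy model of straight-line SSA code rather than LLVM's or DSA's API. Each value's set names the root pointers it may be based on, and an integer add simply takes the union of its operands' sets.

```python
from typing import Dict, FrozenSet, List, Tuple

# A toy, straight-line instruction form (hypothetical, not LLVM's API):
# (result name, opcode, operand names).
Instr = Tuple[str, str, Tuple[str, ...]]


def based_on(program: List[Instr]) -> Dict[str, FrozenSet[str]]:
    """Compute, for each value, the set of root pointers it is based on."""
    sets: Dict[str, FrozenSet[str]] = {}
    for dest, op, args in program:
        if op == "ptr":
            # A root pointer to its own memory object.
            sets[dest] = frozenset({dest})
        elif op == "const":
            # Integer constants are not based on any pointer.
            sets[dest] = frozenset()
        elif op in ("ptrtoint", "inttoptr", "add"):
            # Casts pass the set through; an add cannot tell base from
            # index, so it takes the union over all operands.
            sets[dest] = frozenset().union(*(sets[a] for a in args))
        else:
            raise ValueError(f"opcode not modeled: {op}")
    return sets


prog = [
    ("p1", "ptr", ()), ("p2", "ptr", ()), ("p3", "ptr", ()),
    ("i1", "ptrtoint", ("p1",)),
    ("i2", "ptrtoint", ("p2",)),
    ("i3", "ptrtoint", ("p3",)),
    ("i", "add", ("i1", "i2", "i3")),
    ("p", "inttoptr", ("i",)),
]
print(sorted(based_on(prog)["p"]))  # ['p1', 'p2', 'p3']
```

A single forward pass is enough here only because the toy program is straight-line code in definition order.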
> So in the transformation example, the result is different when we take
> %196 or %193 as a base pointer.
Yes, which is why the transform that Arushi suggested is not legal
unless you can prove that %196 doesn't depend on the address of any
memory object.
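A minimal Python sketch of that legality condition: the fold is accepted only when the offset is provably based on no pointer, and unknown provenance (such as %196, which comes from a load) is treated as "might carry a pointer". The `BasedOn` map and `can_fold_to_gep` are hypothetical names for illustration, not part of LLVM.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical analysis result: each value name maps to the set of pointers
# it is based on, or to None when provenance is unknown (for example, a value
# loaded from memory that the analysis does not track).
BasedOn = Dict[str, Optional[FrozenSet[str]]]


def can_fold_to_gep(offset: str, info: BasedOn) -> bool:
    """Return True only if inttoptr(ptrtoint(base) + offset) may be
    rewritten as a GEP on base without losing any based-on pointers."""
    s = info.get(offset)
    if s is None:
        # Unknown provenance: stay conservative and keep the casts.
        return False
    # Legal only when the offset is derived from no memory object's address.
    return not s


# %197 is zext(%196), and %196 is loaded from %195.
print(can_fold_to_gep("%197", {"%197": None}))         # False
print(can_fold_to_gep("%197", {"%197": frozenset()}))  # True
```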
> For alias-analysis, we may say the p can point to a memory any of the
> p_j points to. But if we consider memory safety, should we say p is
> safe to access if p is not out-of-bound no matter which p_j is taken
> as a base pointer?
That is how I would interpret memory safety: p is safe if it is within
the bounds of any of the p_j memory objects.
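Under that interpretation, a runtime check accepts the access if it fits entirely inside any one candidate object. A minimal Python sketch, assuming the candidate objects (taken from p's based-on set) and their concrete addresses and sizes are known at check time; `MemObject` and `access_is_safe` are hypothetical, and this is not SAFECode's actual check.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MemObject:
    base: int  # start address of the object
    size: int  # size of the object in bytes


def access_is_safe(addr: int, width: int, candidates: Iterable[MemObject]) -> bool:
    """Accept an access of `width` bytes at `addr` if it lies entirely
    within at least one object the pointer may be based on."""
    return any(o.base <= addr and addr + width <= o.base + o.size
               for o in candidates)


objs = [MemObject(0x1000, 16), MemObject(0x2000, 8)]
print(access_is_safe(0x2004, 4, objs))  # True: inside the second object
print(access_is_safe(0x100C, 8, objs))  # False: runs past the first object
```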
> Could anyone explain this rule more precisely? For
> example, how can we find "
> all pointer values that contribute (directly or indirectly)" ?
I think this can be done conservatively with a simple data-flow
analysis. The only tricky part is when a pointer travels through memory
(i.e., it is stored into memory by a store instruction and later loaded
by a load instruction). An enhanced version of DSA that tracks
pointers through integers could handle this case.
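One crude but conservative way to handle the memory case is sketched below in Python, assuming all of memory is modeled as a single abstract cell: every store merges its value's based-on set into that cell, every load returns the cell's current set, and the pass repeats until nothing changes. The opcode names and `based_on_with_memory` are hypothetical; this is much coarser than a points-to analysis such as DSA and only shows why the memory case needs more than one forward pass.

```python
from typing import Dict, FrozenSet, List, Tuple

# Toy instruction form (hypothetical): (result name, opcode, operand names).
# A store has no result; its name is ignored.
Instr = Tuple[str, str, Tuple[str, ...]]

EMPTY: FrozenSet[str] = frozenset()


def based_on_with_memory(program: List[Instr]) -> Dict[str, FrozenSet[str]]:
    """Flow-insensitive sketch: all of memory is one abstract cell, so a
    load may return anything that any store wrote."""
    sets: Dict[str, FrozenSet[str]] = {}
    memory = EMPTY
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for dest, op, args in program:
            if op == "store":
                # args = (stored value, address); the address is ignored
                # because every location is the same abstract cell.
                merged = memory | sets.get(args[0], EMPTY)
                if merged != memory:
                    memory, changed = merged, True
                continue
            if op == "ptr":
                new = frozenset({dest})
            elif op == "load":
                new = memory
            elif op == "const":
                new = EMPTY
            else:  # ptrtoint, inttoptr, add: union over operands
                new = EMPTY.union(*(sets.get(a, EMPTY) for a in args))
            if sets.get(dest) != new:
                sets[dest], changed = new, True
    return sets


prog = [
    ("a", "ptr", ()), ("b", "ptr", ()),
    ("ia", "ptrtoint", ("a",)),
    ("_", "store", ("ia", "b")),  # the integer form of a goes to memory
    ("x", "load", ("b",)),
    ("q", "inttoptr", ("x",)),
]
print(sorted(based_on_with_memory(prog)["q"]))  # ['a']
```

Repeating the pass matters when a load appears before the store that feeds it, as in a loop.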
> This would be helpful to understand
> http://llvm.org/docs/GetElementPtr.html#ptrdiff
> http://llvm.org/docs/GetElementPtr.html#null
> which suggest that we can do some 'wild' pointer arithmetic by
> inttoptr and ptrtoint.
>
> For example, given a pointer p, can we safely do?
> i = ptrtoint p;
> j = i + null;
> q = inttoptr j;
> v = load q;
>
That's a weird one (aside: you need to convert NULL to an integer with
ptrtoint before using it in the add). Since NULL doesn't point to a
valid memory range, it may be that you can technically consider q to
point only to p. I'm not sure about that, though; q may technically be
based on null as well and so could point to some offset from NULL.
In practice, even if the aliasing rules say that q can point to p or
to some offset from NULL, I would say that q points only to p, since
on most implementations NULL is the address zero.
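The practical reading can be sketched in Python under the stated assumption that NULL is address zero: q's based-on set is {p, null}, null names no valid object, and the address computed for q is exactly p's. `q_address` and `pointed_to_objects` are hypothetical helpers for illustration only.

```python
# Assumption: NULL is the all-zero address, as on most implementations.
NULL_ADDR = 0


def q_address(p_addr: int, bits: int = 64) -> int:
    """Model  i = ptrtoint p; n = ptrtoint null; j = i + n; q = inttoptr j
    using wrap-around integer addition of the given width."""
    i = p_addr
    n = NULL_ADDR
    return (i + n) % (1 << bits)


def pointed_to_objects(based_on: frozenset) -> frozenset:
    """Drop null from a based-on set, since it names no valid object."""
    return based_on - {"null"}


print(hex(q_address(0x7FFF1000)))                    # 0x7fff1000
print(pointed_to_objects(frozenset({"p", "null"})))  # frozenset({'p'})
```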
-- John T.
> Thanks a lot.
>
> On Mon, Apr 4, 2011 at 9:34 PM, Eli Friedman<eli.friedman at gmail.com> wrote:
>> On Mon, Apr 4, 2011 at 7:10 AM, John Criswell<criswell at illinois.edu> wrote:
>>> On 4/4/2011 6:45 PM, Eli Friedman wrote:
>>>> On Mon, Apr 4, 2011 at 5:02 PM, Arushi Aggarwal<arushi987 at gmail.com>
>>>> wrote:
>>>>>> Hi,
>>>>>> Is it correct to convert,
>>>>>> %196 = load i32* %195, align 8 ;<i32> [#uses=1]
>>>>>> %197 = zext i32 %196 to i64 ;<i64> [#uses=1]
>>>>>> %198 = ptrtoint i8* %193 to i64 ;<i64> [#uses=1]
>>>>>> %199 = add i64 %198, %197 ;<i64> [#uses=1]
>>>>>> %200 = inttoptr i64 %199 to i8* ;<i8*> [#uses=1]
>>>>>> into
>>>>>> %200 = getelementptr %193, %196
>>>>>> Reducing the unnecessary casts of converting to integers and then back?
>>>>>> Thanks,
>>>>>> Arushi
>>>>>>
>>>> See http://llvm.org/docs/LangRef.html#pointeraliasing ; it's not
>>>> correct in general. It is correct if %196 isn't dependent on the
>>>> address of any memory object, though.
>>> Can you clarify why the transform isn't correct? Is it because in the
>>> original code, %200 is based on both the originally cast pointer (%193) and
>>> the indexed offset from it (%197) while the transformed code is only based
>>> on %193?
>> Yes, exactly.
>>
>> -Eli
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
> --
> Jianzhou