[llvm-dev] Is it ok to allocate > half of address space?
Nuno Lopes via llvm-dev
llvm-dev at lists.llvm.org
Wed Nov 8 15:18:12 PST 2017
>On 11/8/2017 9:24 AM, Nuno Lopes via llvm-dev wrote:
>> Hi,
>>
>> I was looking into the semantics of GEP inbounds and some BasicAA rules
>> and I'm wondering if it's valid in LLVM IR to allocate more than half of
>> the address space with a global variable or an alloca.
>> If that's a scenario we want to consider, then we have problems :)
>>
>> Consider this C code (32 bits):
>> #include <string.h>
>>
>> char obj[0x80000008];
>>
>> char f() {
>> char *p = obj + 0x79999999;
>> char *q = obj + 0x80000000;
>> *q = 1;
>> memcpy(p, "abcd", 4);
>> return *q;
>> }
>>
>>
>> Clearly the stores alias, and the memcpy should override the value
>> written by "*q = 1".
>>
>> I dunno if this is legal in C or not, but the IR produced by clang looks
>> like (32 bits):
>>
>> @obj = common global [2147483656 x i8] zeroinitializer, align 1
>>
>> define signext i8 @f() {
>> store i8 1, i8* getelementptr inbounds (i8, i8* getelementptr inbounds
>> ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0),
>> i32 -2147483648), align 1
>> call void @llvm.memcpy.p0i8.p0i8.i32(i8* getelementptr inbounds
>> ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 2040109465), i8*
>> getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i32 4,
>> i32 1, i1 false)
>> %1 = load i8, i8* getelementptr inbounds (i8, i8* getelementptr
>> inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0),
>> i32 -2147483648), align 1
>> ret i8 %1
>> }
>>
>> With -O2, the store to q gets forwarded, and so we get "ret i8 1".
>> So, BasicAA concluded that p and q don't alias. The culprit is an
>> overflow in BasicAAResult::isGEPBaseAtNegativeOffset().
>>
>> So my question is do we care about this use case where a single
>> allocation can take more than half of the address space?
>>
>
> According to LangRef, your IR currently has undefined behavior: the rules
> for "inbounds" GEPs say that indexes are treated as signed values. And
> solving that would involve changing the way we represent GEPs in IR, so I
> think you can consider that out of scope.
Sorry, that was a typo. The test case was supposed to not have inbounds (it
should work without as well).
The current definition of GEP inbounds is complicated, though. It disallows
the following:
%a = gep %p, 0x88888888
%b = gep inbounds %a, 1
If %a is within bounds, the "gep inbounds" gives a signed overflow even
though it's just a +1 (since 0x88888888 + 1 overflows).
So GEP inbounds disables large objects outright.
BTW I've always wondered why EmitGEPOffset
(http://llvm.org/doxygen/Local_8h_source.html#l00247) doesn't use 'add nsw'
if the semantics of GEP inbounds allows that (if my reading of LangRef is
correct).
> Assuming we're not dealing with inbounds GEPs (e.g. you pass -fwrapv to
> clang), I don't see any particular reason to disallow allocations more
> than half the address-space.
Ok, I can file bug reports for the cases I'm seeing. I can verify the
correctness of fixes as well, but only starting a week from now; I'm
quite busy at the moment.
Nuno