[llvm-dev] Is it ok to allocate > half of address space?

Wed Nov 8 15:18:12 PST 2017

>On 11/8/2017 9:24 AM, Nuno Lopes via llvm-dev wrote:
>> Hi,
>>
>> I was looking into the semantics of GEP inbounds and some BasicAA rules 
>> and I'm wondering if it's valid in LLVM IR to allocate more than half of 
>> the address space with a global variable or an alloca.
>> If that's a scenario we want to consider, then we have problems :)
>>
>> Consider this C code (32 bits):
>> #include <string.h>
>>
>> char obj[0x80000008];
>>
>> char f() {
>>   char *p = obj + 0x79999999;
>>   char *q = obj + 0x80000000;
>>   *q = 1;
>>   memcpy(p, "abcd", 4);
>>   return *q;
>> }
>>
>>
>> Clearly the stores alias, and the memcpy should overwrite the value 
>> written by "*q = 1".
>>
>> I dunno if this is legal in C or not, but the IR produced by clang looks 
>> like (32 bits):
>>
>> @obj = common global [2147483656 x i8] zeroinitializer, align 1
>>
>> define signext i8 @f() {
>>   store i8 1, i8* getelementptr inbounds (i8, i8* getelementptr inbounds 
>> ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), 
>> i32 -2147483648), align 1
>>   call void @llvm.memcpy.p0i8.p0i8.i32(i8* getelementptr inbounds 
>> ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 2040109465), i8* 
>> getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i32 4, 
>> i32 1, i1 false)
>>   %1 = load i8, i8* getelementptr inbounds (i8, i8* getelementptr 
>> inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), 
>> i32 -2147483648), align 1
>>   ret i8 %1
>> }
>>
>> With -O2, the store to q gets forwarded, and so we get "ret i8 1".
>> So, BasicAA concluded that p and q don't alias. The culprit is an 
>> overflow in BasicAAResult::isGEPBaseAtNegativeOffset().
>>
>> So my question is do we care about this use case where a single 
>> allocation can take more than half of the address space?
>>
>
> According to LangRef, your IR currently has undefined behavior: the rules 
> for "inbounds" GEPs say that indexes are treated as signed values.  And 
> solving that would involve changing the way we represent GEPs in IR, so I 
> think you can consider that out of scope.

Sorry, that was a typo. The test case was supposed to not have inbounds (it 
should work without as well).
The current definition of GEP inbounds is complicated, though. It disallows 
the following:
%a = gep %p, 0x88888888
%b = gep inbounds %a, 1

If %a is within bounds, the "gep inbounds" gives a signed overflow even 
though it's just a +1 (since 0x88888888 + 1 overflows).
So GEP inbounds effectively rules out objects larger than half of the 
address space.
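
To make the signed reading of offsets concrete, here is a minimal C sketch. 
The helper name as_signed32 is made up for illustration and is not part of 
LLVM; it only models how a 32-bit offset looks once it is carried in an i32 
index, assuming two's complement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an LLVM API): return the value a 32-bit byte
   offset has when it is read as a signed i32 index, assuming two's
   complement. The result is widened to 64 bits so it can be printed or
   compared without further wraparound. */
static int64_t as_signed32(uint32_t off) {
    /* Offsets below 2^31 keep their value; larger ones wrap to negative. */
    return off < 0x80000000u ? (int64_t)off
                             : (int64_t)off - 4294967296LL;
}
```

With this model, as_signed32(0x79999999) is 2040109465 and 
as_signed32(0x80000000) is -2147483648, which matches the i32 constants in 
the IR quoted above. Likewise 0x88888888 reads as -2004318072, so any offset 
at or beyond half of the address space shows up as a negative index.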

BTW I've always wondered why EmitGEPOffset 
(http://llvm.org/doxygen/Local_8h_source.html#l00247) doesn't use 'add nsw' 
if the semantics of GEP inbounds allows that (if my reading of LangRef is 
correct).
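
For reference, this is roughly what 'add nsw' promises, modeled in C. It is 
only a sketch: add_nsw_i32 is a hypothetical name, and returning false stands 
in for the poison value that LangRef specifies when a signed add overflows. 
It says nothing about what EmitGEPOffset actually emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit 'add nsw' (not LLVM code).
   Returns true and stores the sum when it fits in a signed 32-bit value;
   returns false (standing in for poison) when the signed add overflows. */
static bool add_nsw_i32(int32_t a, int32_t b, int32_t *out) {
    /* Do the addition in 64 bits so the check itself cannot overflow. */
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return false;
    *out = (int32_t)wide;
    return true;
}
```

Under this model, add_nsw_i32(2040109465, 4, &r) succeeds with r equal to 
2040109469, while add_nsw_i32(INT32_MAX, 1, &r) reports an overflow.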

> Assuming we're not dealing with inbounds GEPs (e.g. you pass -fwrapv to 
> clang), I don't see any particular reason to disallow allocations more 
> than half the address-space.

Ok, I can file bug reports for the cases I'm seeing, and I can verify the 
correctness of fixes as well.  But only starting a week from now; I'm 
quite busy at the moment.

Nuno