[llvm-dev] Is it ok to allocate > half of address space?

Friedman, Eli via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 8 14:06:19 PST 2017


On 11/8/2017 9:24 AM, Nuno Lopes via llvm-dev wrote:
> Hi,
>
> I was looking into the semantics of GEP inbounds and some BasicAA 
> rules and I'm wondering if it's valid in LLVM IR to allocate more than 
> half of the address space with a global variable or an alloca.
> If that's a scenario we want to consider, then we have problems :)
>
> Consider this C code (32 bits):
> #include <string.h>
>
> char obj[0x80000008];
>
> char f() {
>   char *p = obj + 0x79999999;
>   char *q = obj + 0x80000000;
>   *q = 1;
>   memcpy(p, "abcd", 4);
>   return *q;
> }
>
>
> Clearly the stores alias, and the memcpy should override the value 
> written by "*q = 1".
>
> I dunno if this is legal in C or not, but the IR produced by clang 
> looks like (32 bits):
>
> @obj = common global [2147483656 x i8] zeroinitializer, align 1
>
> define signext i8 @f() {
>   store i8 1, i8* getelementptr inbounds (i8, i8* getelementptr 
> inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), 
> i32 -2147483648), align 1
>   call void @llvm.memcpy.p0i8.p0i8.i32(i8* getelementptr inbounds 
> ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 2040109465), 
> i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), 
> i32 4, i32 1, i1 false)
>   %1 = load i8, i8* getelementptr inbounds (i8, i8* getelementptr 
> inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), 
> i32 -2147483648), align 1
>   ret i8 %1
> }
>
> With -O2, the store to q gets forwarded, and so we get "ret i8 1".
> So, BasicAA concluded that p and q don't alias. The culprit is an 
> overflow in BasicAAResult::isGEPBaseAtNegativeOffset().
>
> So my question is do we care about this use case where a single 
> allocation can take more than half of the address space?
>

According to LangRef, your IR currently has undefined behavior: the rules 
for "inbounds" GEPs say that indexes are treated as signed values, so the 
i32 -2147483648 index points below the start of @obj rather than 
0x80000000 bytes into it.  Solving that would involve changing the way we 
represent GEPs in IR, so I think you can consider that out of scope.
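
To make the signed-index point concrete, here is a minimal C sketch. The 
helper names (as_signed_index32, index_is_inbounds) are hypothetical and 
not part of LLVM; the check assumes the GEP base is the start of the 
object and models the bound as "0 <= index <= size" on the unwrapped, 
mathematically exact index.

```c
#include <stdint.h>

/* Hypothetical helper: the value a 32-bit GEP index has when the bit
   pattern of `offset` is read as a signed integer. Done arithmetically
   to avoid relying on implementation-defined conversions. */
static int64_t as_signed_index32(uint32_t offset) {
    return offset >= 0x80000000u ? (int64_t)offset - 4294967296LL
                                 : (int64_t)offset;
}

/* Hypothetical model of the inbounds requirement for a GEP whose base is
   the start of an object of `object_size` bytes: the exact (unwrapped)
   index must lie in [0, object_size]. */
static int index_is_inbounds(int64_t index, uint64_t object_size) {
    return index >= 0 && (uint64_t)index <= object_size;
}
```

Under these assumptions, the offset 0x80000000 used for q becomes the 
index -2147483648 and fails the check for the 0x80000008-byte object, 
while the offset 0x79999999 used for p stays positive and passes.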

Assuming we're not dealing with inbounds GEPs (e.g. you pass -fwrapv to 
clang), I don't see any particular reason to disallow allocations of more 
than half the address space.
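
As a rough illustration of why the signed reading is harmless without 
"inbounds": a plain GEP computes its address with two's complement 
wrapping at pointer width. The function below is a hypothetical model of 
that for a 32-bit target, not LLVM code.

```c
#include <stdint.h>

/* Hypothetical model of a non-inbounds GEP over i8 on a 32-bit target:
   the address is base + index, wrapping modulo 2^32. Converting the
   signed index to uint32_t is well-defined (modular) in C. */
static uint32_t gep_wrapping32(uint32_t base, int32_t index) {
    return (uint32_t)(base + (uint32_t)index);
}
```

In this model, adding the index -2147483648 to a base address yields the 
same 32-bit address as adding 0x80000000, so the store and load through q 
land on the intended byte.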

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


