[LLVMdev] [Patch][RFC] Change R600 data layout

Tue Dec 31 15:04:53 PST 2013

Mostly n64 is there for historical reasons as AMDGPU backend originally derived from the AMDIL backend. The AMDIL backend did support 64bit native types on some hardware chips of the r6XX derived architectures, so for simplicity software efficiently emulated them on the ones it did not.

Also GCN does support 64bit on many operations, so I wouldn't remove it as a native type there.  

Basically what you want to for clang to emit better code is to use per pointer address spaces. So in the address spaces where you want 32bit computation, you specify it in the data layout, and in the address spaces where you need 64bit, you do the same. LLVM produces the right code in these situations.

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Jon Pry
> Sent: Tuesday, December 31, 2013 1:56 PM
> To: llvmdev at cs.uiuc.edu
> Subject: [LLVMdev] [Patch][RFC] Change R600 data layout
> 
> Hi,
> 
>      I've prepared patches for both LLVM and Clang to change the datalayout
> for R600. This may seem like a bold move, but I think it is warranted.  R600/SI
> is a strange architecture in that it uses 64bit pointers but does not support 64
> bit arithmetic except for load/store operations that roughly map onto
> getelementptr.
> 
>     The current datalayout for r600 includes n32:64, which is odd because r600
> cannot actually do any 64bit arith natively. This causes particular problems in
> the optimizer with the following kernel:
> 
> __kernel void if_eq(__global unsigned long* out, unsigned  arg0) {
>     int i=0;
>     for(i = 0; i < arg0; i++){
>             out[i] =  i;
>        }
> }
> 
> Clang decides that instead of adding a sext i32 %i to i64 before getelementptr,
> it would be best to just go ahead and promote the variable i to i64. Which
> would be all good if i64 was actually a native type.
> 
> By changing the native types to n32 *only*. clang emits better code for r600,
> such as
> 
> ; Function Attrs: nounwind
> define void @if_eq(i64 addrspace(1)* nocapture %out, i32 %arg0) #0 {
> entry:
>   %cmp4 = icmp eq i32 %arg0, 0
>   br i1 %cmp4, label %for.end, label %for.body
> 
> for.body:                                         ; preds = %for.body, %entry
>   %i.05 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
>   %conv = sext i32 %i.05 to i64
>   %arrayidx = getelementptr inbounds i64 addrspace(1)* %out, i64 %conv
>   store i64 %conv, i64 addrspace(1)* %arrayidx, align 8, !tbaa !2
>   %inc = add nsw i32 %i.05, 1
>   %exitcond = icmp eq i32 %inc, %arg0
>   br i1 %exitcond, label %for.end, label %for.body
> 
> for.end:                                          ; preds = %for.body, %entry
>   ret void
> }
> 
> Another upside to this is that i64 addition on r600 can even be enabled as it
> the lowering code need only be called for arithmetic and not pointers, for
> which it actually works. In the future, r600 will require more patches wrt this
> issue as the old IR generated by clang was perfectly valid but crashes llc.
> 
> For now, I think this patch is a good solution because it makes better code
> and allows me to compile programs much longer than my previous 3 line
> record :)  Not so much a bandaid as a something else that needed to be done.
> 
> 
> Regards,
> 
> Jon Pry
> jonpry at gmail.com