I'm looking into extending the PTX Clang target to support code generation from OpenCL/CUDA code, so I'm wondering about the current state of these two Clang implementations. As a test, I've implemented the AddrSpaceMap map in the PTX target in lib/Basic/Targets.cpp, but I'm not sure what other hooks are required. From what I can tell, basic functionality is working quite well! I hope to commit a small patch soon to support the AddrSpaceMap for PTX.<div>
<br></div><div>I'm currently investigating the following issues/concerns:</div><div><ol><li>What is the plan for language-specific functions and other constructs, such as __syncthreads/barrier, get_local_id/threadIdx, etc.? Is it up to the back-end to define compatible definitions of these, or is there a plan to introduce generic LLVM intrinsics for these? Since OpenCL has pre-defined functions that do not require header files, it may be awkward to require OpenCL to include a back-end specific header file when compiling with Clang.</li>
<li>What is the status of the address space mapping? The CUDA frontend does not seem to respect the mapping (I get address-space-less alloca's for __shared__ arrays), and the OpenCL frontend seems to respect the address mapping but does not emit complete array definitions for locally-defined __local arrays. Does the front-end currently not support __local arrays embedded in the code? It seems to work if the __local arrays are passed as pointers to the kernel.</li>
</ol><div>As an example of the OpenCL issue:</div><div><br></div><div><div><font face="'courier new', monospace">jholewinski@aquila [tests]$ cat <a href="http://kernel1.cl">kernel1.cl</a> </font></div><div><font face="'courier new', monospace">__kernel</font></div>
<div><font face="'courier new', monospace">void foo(__global float* a) {</font></div><div><font face="'courier new', monospace"> __local float buffer[64];</font></div><div><font face="'courier new', monospace"> buffer[0] = a[0];</font></div>
<div><font face="'courier new', monospace"> // PTX-specific intrinsic</font></div><div><font face="'courier new', monospace"> __builtin_ptx_bar_sync(0);</font></div><div><font face="'courier new', monospace"> a[0] = buffer[0];</font></div>
<div><font face="'courier new', monospace">}</font></div><div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">jholewinski@aquila [tests]$ clang -ccc-host-triple ptx64 -S -emit-llvm <a href="http://kernel1.cl">kernel1.cl</a> -o kernel1.ll</font></div>
<div><font face="'courier new', monospace">jholewinski@aquila [tests]$ cat kernel1.ll</font></div><div><font face="'courier new', monospace">; ModuleID = '<a href="http://kernel1.cl">kernel1.cl</a>'</font></div>
<div><font face="'courier new', monospace">target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64"</font></div><div><font face="'courier new', monospace">target triple = "ptx64--"</font></div>
<div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">@foo.buffer.0 = internal addrspace(4) unnamed_addr global float 0.000000e+00</font></div><div><font face="'courier new', monospace"><br>
</font></div><div><font face="'courier new', monospace">define ptx_kernel void @foo(float* nocapture %a) nounwind {</font></div><div><font face="'courier new', monospace">entry:</font></div><div><font face="'courier new', monospace"> %0 = load float* %a, align 4, !tbaa !1</font></div>
<div><font face="'courier new', monospace"> store float %0, float addrspace(4)* @foo.buffer.0, align 4, !tbaa !1</font></div><div><font face="'courier new', monospace"> tail call void @llvm.ptx.bar.sync(i32 0)</font></div>
<div><font face="'courier new', monospace"> %1 = load float addrspace(4)* @foo.buffer.0, align 4, !tbaa !1</font></div><div><font face="'courier new', monospace"> store float %1, float* %a, align 4, !tbaa !1</font></div>
<div><font face="'courier new', monospace"> ret void</font></div><div><font face="'courier new', monospace">}</font></div><div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">declare void @llvm.ptx.bar.sync(i32) nounwind</font></div>
<div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">!opencl.kernels = !{!0}</font></div><div><font face="'courier new', monospace"><br></font></div>
<div><font face="'courier new', monospace">!0 = metadata !{void (float*)* @foo}</font></div><div><font face="'courier new', monospace">!1 = metadata !{metadata !"float", metadata !2}</font></div>
<div><font face="'courier new', monospace">!2 = metadata !{metadata !"omnipotent char", metadata !3}</font></div><div><font face="'courier new', monospace">!3 = metadata !{metadata !"Simple C/C++ TBAA", null}</font></div>
</div><div><br></div><div>The definition of the local array is present in the LLVM IR, but it does not provide an array size.</div><div><br></div>-- </div><div><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div>
<br>
</div>