[cfe-dev] OpenCL/CUDA Interop with PTX Back-End

Mon Oct 3 09:45:28 PDT 2011

I'm looking into extending the PTX Clang target to support code generation
from OpenCL/CUDA code, so I'm wondering about the current state of these two
Clang implementations.  As a test, I've implemented the AddrSpaceMap map in
the PTX target in lib/Basic/Targets.cpp, but I'm not sure what other hooks
are required.  From what I can tell, basic functionality is working quite
well!  I hope to commit a small patch soon to support the AddrSpaceMap for
PTX.

I'm currently investigating the following issues/concerns:

   1. What is the plan for language-specific functions and other constructs,
   such as __syncthreads/barrier, get_local_id/threadIdx, etc.?  Is it up to
   the back-end to define compatible definitions of these, or is there a plan
   to introduce generic LLVM intrinsics for these?  Since OpenCL has
   pre-defined functions that do not require header files, it may be awkward to
   require OpenCL to include a back-end specific header file when compiling
   with Clang.
   2. What is the status of the address space mapping?  The CUDA frontend
   does not seem to respect the mapping (I get address-space-less alloca's for
   __shared__ arrays), and the OpenCL frontend seems to respect the address
   mapping but does not emit complete array definitions for locally-defined
   __local arrays.  Does the front-end currently not support __local arrays
   embedded in the code?  It seems to work if the __local arrays are passed as
   pointers to the kernel.

As an example of the OpenCL issue:

jholewinski at aquila [tests]$ cat kernel1.cl
__kernel
void foo(__global float* a) {
  __local float buffer[64];
  buffer[0] = a[0];
  // PTX-specific intrinsic
  __builtin_ptx_bar_sync(0);
  a[0] = buffer[0];
}

jholewinski at aquila [tests]$ clang -ccc-host-triple ptx64 -S -emit-llvm
kernel1.cl -o kernel1.ll
jholewinski at aquila [tests]$ cat kernel1.ll
; ModuleID = 'kernel1.cl'
target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple = "ptx64--"

@foo.buffer.0 = internal addrspace(4) unnamed_addr global float 0.000000e+00

define ptx_kernel void @foo(float* nocapture %a) nounwind {
entry:
  %0 = load float* %a, align 4, !tbaa !1
  store float %0, float addrspace(4)* @foo.buffer.0, align 4, !tbaa !1
  tail call void @llvm.ptx.bar.sync(i32 0)
  %1 = load float addrspace(4)* @foo.buffer.0, align 4, !tbaa !1
  store float %1, float* %a, align 4, !tbaa !1
  ret void
}

declare void @llvm.ptx.bar.sync(i32) nounwind

!opencl.kernels = !{!0}

!0 = metadata !{void (float*)* @foo}
!1 = metadata !{metadata !"float", metadata !2}
!2 = metadata !{metadata !"omnipotent char", metadata !3}
!3 = metadata !{metadata !"Simple C/C++ TBAA", null}

The definition of the local array is present in the LLVM IR, but it does not
provide an array size.

-- 

Thanks,

Justin Holewinski
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20111003/61e62978/attachment.html>