[cfe-dev] OpenCL/CUDA Interop with PTX Back-End
justin.holewinski at gmail.com
Tue Oct 4 13:23:59 PDT 2011
On Tue, Oct 4, 2011 at 2:28 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> On Mon, Oct 03, 2011 at 04:45:28PM +0000, Justin Holewinski wrote:
> > I'm looking into extending the PTX Clang target to support code
> > from OpenCL/CUDA code, so I'm wondering about the current state of these
> > Clang implementations. As a test, I've implemented the AddrSpaceMap map
> > the PTX target in lib/Basic/Targets.cpp, but I'm not sure what other
> > are required. From what I can tell, basic functionality is working quite
> > well! I hope to commit a small patch soon to support the AddrSpaceMap
> > PTX.
> The AddrSpaceMap you added for PTX seems to be correct. The main
> other hook for OpenCL is the CGOpenCLRuntime class, but PTX should
> be able to use the default implementation.
> > I'm currently investigating the following issues/concerns:
> > 1. What is the plan for language-specific functions and other
> > such as __syncthreads/barrier, get_local_id/threadIdx, etc.? Is it up
> > the back-end to define compatible definitions of these, or is there a
> > to introduce generic LLVM intrinsics for these? Since OpenCL has
> > pre-defined functions that do not require header files, it may be
> awkward to
> > require OpenCL to include a back-end specific header file when
> > with Clang.
> For OpenCL, the implementation should provide definitions of
> the built-in functions described in section 6.11 of the OpenCL
> specification. For at least some of those functions, the definitions
> would be the same for any OpenCL implementation. (FWIW, I have
> developed a set of generic implementations of section 6.11 built-ins
> as part of an OpenCL implementation I have been working on, which I
> will be open sourcing soon.)
> For the rest (e.g. work-item functions), the implementation would
> need to be specific to the OpenCL implementation. For example, on
> a CPU, the exact implementation details of work-item functions would
> be highly dependent on how the implementation stores work-item IDs,
> so it would not be appropriate to use a generic intrinsic.
Right. I'm wondering what the implementation plan for this is with Clang.
Are you going to expose the OpenCL functions as LLVM intrinsics, and let
back-ends provide appropriate implementations? Right now, I'm defining
these functions in terms of PTX builtin functions, but this is obviously not
optimal because you need to include an additional header in OpenCL code.
> For CUDA, the NVIDIA header files provide appropriate declarations,
> but as far as I can tell, variables such as threadIdx are handled
> specially by nvcc, and functions such as __syncthreads are treated
> as builtins. Clang does not currently implement the special handling
> for these variables or functions.
Are there any plans to implement any of these?
> > 2. What is the status of the address space mapping? The CUDA frontend
> > does not seem to respect the mapping (I get address-space-less
> alloca's for
> > __shared__ arrays)
> Clang does not currently implement CUDA address spaces correctly.
> The major challenge is that OpenCL (and LLVM) address spaces are, in
> C/C++ language terms, type qualifiers, while CUDA address spaces are
> declaration specifiers (despite being referred to as "type qualifiers"
> in the documentation). This means that in CUDA, pointers lack correct
> type information. nvcc performs type inference to determine whether
> a pointer is to __shared__ or __device__ memory, which Clang does
> not currently implement.
Fair enough. OpenCL is my main interest at the moment. Ideally, I would
like to create a semi-functional workflow from OpenCL -> PTX with Clang
before the LLVM 3.0 branch.
> As an alternative to implementing type inference, we could require
> CUDA users to use a board with a unified address space, but this
> is sub-optimal, and still depends on some support from Clang.
> > and the OpenCL frontend seems to respect the address
> > mapping but does not emit complete array definitions for
> > __local arrays. Does the front-end currently not support __local
> > embedded in the code? It seems to work if the __local arrays are
> passed as
> > pointers to the kernel.
> Clang should support __local arrays, and this looks like a genuine
> bug in the IR generator. I will investigate.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the cfe-dev