[cfe-dev] Proposal: CUDA support; outline, initial patches

Sun Nov 21 01:06:05 PST 2010

Why not OpenCL? More people can actually run it, and on far more devices.

On Sat, Nov 20, 2010 at 10:58 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Hi,
>
> This is intended to outline how we could add CUDA support to Clang,
> feedback appreciated.  I've also attached patches with initial
> progress.
>
> Architecture
> ------------
>
> The nvcc driver is somewhat idiosyncratic.  It first performs
> a source-to-source transformation on the CUDA code which splits
> the code into declarations targeted for the host and the device (a
> set of type qualifiers controls this process) to produce a pair of
> preprocessed source files.  The host file is compiled by the system
> compiler (e.g. gcc) and the device source file is compiled to PTX
> using a custom Open64-based compiler (nvopencc) and then compiled to
> the nvidia target using ptxas.  At a later stage the resultant object
> files are combined to produce a single executable.
>
> With a Clang-based driver, we can do better, by parsing the source
> file once to produce a single AST, and performing two separate
> CodeGen passes to produce 2 modules: one targeted to the host and
> the other to the device.  CodeGen would need to be taught to skip
> certain declarations depending on their type qualifiers.
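
A minimal sketch of the filtering idea described above, written as
plain C++ so it runs without Clang or CUDA.  Everything here is a
hypothetical stand-in: the Decl struct, the Qualifier flags, and the
shouldEmit/codeGenPass functions are not Clang's real AST or CodeGen
interfaces.  The sketch assumes the usual CUDA rule that an unqualified
function is host-only, and it ignores any host-side stub a kernel
might need.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flags mirroring the CUDA function qualifiers.
enum Qualifier : unsigned {
  None   = 0,
  Host   = 1u << 0, // __host__
  Device = 1u << 1, // __device__
  Global = 1u << 2  // __global__ (kernel entry point)
};

enum class Target { Host, Device };

// Hypothetical declaration record; a real AST node carries far more.
struct Decl {
  std::string name;
  unsigned quals; // bitwise OR of Qualifier values
};

// Decide whether a declaration is emitted into the module for `target`.
bool shouldEmit(const Decl &d, Target target) {
  if (target == Target::Device)
    return (d.quals & (Device | Global)) != 0;
  // Host pass: unqualified code is host code; __host__ makes it explicit.
  return d.quals == None || (d.quals & Host) != 0;
}

// One "CodeGen pass" over the shared AST: returns the names that would
// end up in the module for the given target.
std::vector<std::string> codeGenPass(const std::vector<Decl> &ast,
                                     Target target) {
  std::vector<std::string> module;
  for (const Decl &d : ast)
    if (shouldEmit(d, target))
      module.push_back(d.name);
  return module;
}
```

Running codeGenPass twice over the same vector, once per Target,
corresponds to the two passes over a single AST: a __host__ __device__
function lands in both modules, while a __global__ kernel lands only
in the device module in this simplified model.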
>
> From there, the process would be target specific, but for nvidia
> targets, we can use our PTX backend -> ptxas to produce a cubin file
> (actually an ELF format file).  Then parse the cubin and link it
> into the host module in the same way as nvcc (which has yet to be
> determined).
>
> Clang Requirements
> ------------------
>
> (references are to CUDA Programming Guide, version 3.0)
>
> - lex/parse/sema/ast/codegen support for kernel call expression (B.13)
> - lex/parse/sema/ast/codegen support for type qualifiers: (B.1 B.2)
>    __device__ __global__ __host__ __constant__ __shared__
>  - multi-pass/filtering codegen
> - lex/parse/sema/ast/codegen support for launch bounds (B.14)
> - support for built-in variable semantics (B.4.6)
> - nvcc workalike driver
> - others?
>
> LLVM Requirements
> -----------------
>
> - (nvidia only) PTX backend
> - (nvidia only) capability to parse cubin (object file library?)
> - others?
>
> Kernel calls
> ------------
>
> nvcc translates kernel calls into equivalent calls to the CUDA
> runtime API during the initial source-to-source transformation.
> I propose that we do the same thing by default (at the CodeGen level)
> while making this overridable for a specific target.
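
As a rough illustration of lowering a kernel call to runtime calls,
the sketch below shows what generated code for a call such as
scale<<<grid, block>>>(data, n) might look like.  The proposal does
not name the runtime entry points, so mockConfigureCall,
mockSetupArgument and mockLaunch are hypothetical stand-ins that only
record what they are given; Dim3, LaunchRecord and the "scale" kernel
are likewise invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a three-component launch dimension.
struct Dim3 {
  unsigned x, y, z;
  Dim3(unsigned x = 1, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
};

// Everything the mock runtime learns about one launch.
struct LaunchRecord {
  Dim3 grid, block;
  std::size_t sharedMem = 0;
  std::vector<std::size_t> argSizes;
  std::string entry;
};

static LaunchRecord pending;               // launch being assembled
static std::vector<LaunchRecord> launches; // completed launches

// Step 1: record the execution configuration.
void mockConfigureCall(Dim3 grid, Dim3 block, std::size_t sharedMem) {
  pending = LaunchRecord{};
  pending.grid = grid;
  pending.block = block;
  pending.sharedMem = sharedMem;
}

// Step 2: record one kernel argument (only its size in this mock).
void mockSetupArgument(const void *arg, std::size_t size) {
  (void)arg;
  pending.argSizes.push_back(size);
}

// Step 3: "launch" the named kernel with the recorded state.
void mockLaunch(const char *entry) {
  pending.entry = entry;
  launches.push_back(pending);
}

// What CodeGen might emit in place of: scale<<<grid, block>>>(data, n);
void loweredScaleCall(Dim3 grid, Dim3 block, float *data, int n) {
  mockConfigureCall(grid, block, 0);
  mockSetupArgument(&data, sizeof(data));
  mockSetupArgument(&n, sizeof(n));
  mockLaunch("scale");
}
```

Making the lowering overridable per target would then amount to
letting a target substitute a different body for loweredScaleCall
while the kernel call node in the AST stays the same.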
>
> Progress so far
> ---------------
>
> Kernel call AST node has been added and lex/parse/sema have been
> modified to produce the node.  Patches attached are for review only
> and not to be committed yet.  For one thing, they lack test cases.
> For another, I'm highly bothered by the hardcoded type names in sema's
> execution configuration handling code (patch 8), which stem from how
> execution configurations are specified in the CUDA Programming Guide
> (B.13).  Any suggestions for a better way to handle this would be
> very welcome.
>
> Next steps
> ----------
>
> I intend to continue building out the Clang requirements.  Next steps
> are codegen support for kernel call and parse/sema support for type
> qualifiers.
>
> Thanks,
> --
> Peter