[cfe-dev] Parallelism TS implementation and feasibility of GPU execution policies

Régis Portalez regis.portalez at altimesh.com
Mon Apr 6 11:48:56 PDT 2015


Hi,

Sorry to interrupt, but I understood there is a way to emit LLVM IR from CUDA code?

Is there any documentation on that?

I'm really interested.

Thanks

-----Original Message-----
From: "Eli Bendersky" <eliben at google.com>
Sent: 06/04/2015 17:43
To: "Andrew Corrigan" <andrew.corrigan at nrl.navy.mil>
Cc: "clang-dev Developers" <cfe-dev at cs.uiuc.edu>
Subject: Re: [cfe-dev] Parallelism TS implementation and feasibility of GPU execution policies

On Thu, Apr 2, 2015 at 5:58 AM, Andrew Corrigan <andrew.corrigan at nrl.navy.mil> wrote:

Newcomer here, I hope this isn’t off-topic, but this seemed to be the most appropriate place to ask:

Are there plans to implement Parallelism-TS in libc++/Clang?  If so, what execution policies might be supported?

Besides multi-threaded CPU execution policies, are there plans, or would it even be feasible to implement a GPU execution policy in libc++/Clang, which targets the NVPTX backend, using the Clang frontend only (i.e., without NVCC)?

This would be extremely useful, since even the latest CUDA 7 release of NVCC remains slow, buggy, and memory-hungry compared to Clang. Compiling my Thrust-based code with Clang against Thrust's TBB backend takes just a minute or so and consumes a few gigabytes. Compiling the exact same code with NVCC against Thrust's CUDA backend consumes ~20 gigabytes of memory and takes well over an hour (on my 24-GB workstation; on my 16-GB laptop it never finishes). Obviating the need for NVCC when compiling code targeting NVIDIA GPUs, via a Parallelism TS implementation, would be extremely useful.



I can't speak for Parallelism-TS, but for the past year or so we've been steadily trickling more CUDA support into upstream Clang. Internally, we use a Clang-based compiler for CUDA->PTX (alongside nvcc), and as you mention, one of its strengths vs. nvcc is compilation time and resource consumption. For large template-metaprogramming code, Clang's frontend is an order of magnitude faster.
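To answer the question asked at the top of the thread: with Clang's CUDA support, emitting LLVM IR (or PTX) from a .cu file is an ordinary clang invocation. The flags below are from present-day clang's CUDA support; the in-progress 2015 tree may have spelled them differently, and the file name is just an example.

```shell
# Compile only the device side of a CUDA source to textual LLVM IR.
clang++ -x cuda axpy.cu --cuda-gpu-arch=sm_50 \
        --cuda-device-only -S -emit-llvm -o axpy.ll

# Or lower the same device code straight to PTX via the NVPTX backend.
clang++ -x cuda axpy.cu --cuda-gpu-arch=sm_50 \
        --cuda-device-only -S -o axpy.ptx
```

(No test is given here; running these requires a CUDA toolkit installation for the headers.)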


The pace of our upstreaming is picking up. Take a look at http://reviews.llvm.org/D8463, for example; and feel free to help out with reviews.
 

Finally, are there plans, or would it even be feasible, to target OpenCL/SYCL/SPIR(-V) via Parallelism-TS?  I am aware of existing OpenCL-based parallel algorithms libraries, but I am really hoping for a Parallelism TS execution policy for OpenCL devices, so that there is a single-source, fully-integrated approach to which one can pass C++ function objects directly, as opposed to being restricted to passing strings containing OpenCL C99 syntax, or having to pre-instantiate template functors with macro wrappers.


It is certainly *possible* to target something like SPIR(-V) from Clang for CUDA, since it just generates LLVM IR now. I'm not sure whether anyone is planning it at this time, though.
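As a purely hypothetical sketch of what "it just generates LLVM IR" enables: a separate IR-to-SPIR-V translator could in principle consume Clang's device-side output. The llvm-spirv tool below is from the Khronos SPIRV-LLVM-Translator project, which is not part of the toolchain being discussed in this thread; nobody in this thread reports having built such a pipeline.

```shell
# Hypothetical pipeline: emit device-side LLVM bitcode, then hand it
# to an external LLVM-IR-to-SPIR-V translator (assumed tool).
clang++ -x cuda kernel.cu --cuda-gpu-arch=sm_50 \
        --cuda-device-only -c -emit-llvm -o kernel.bc
llvm-spirv kernel.bc -o kernel.spv
```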


Eli


 

