[cfe-dev] Parallelism TS implementation and feasibility of GPU execution policies

Eli Bendersky eliben at google.com
Mon Apr 6 08:41:20 PDT 2015


On Thu, Apr 2, 2015 at 5:58 AM, Andrew Corrigan <
andrew.corrigan at nrl.navy.mil> wrote:

> Newcomer here, I hope this isn’t off-topic, but this seemed to be the most
> appropriate place to ask:
>
> Are there plans to implement Parallelism-TS in libc++/Clang?  If so, what
> execution policies might be supported?
>
> Besides multi-threaded CPU execution policies, are there plans, or would
> it even be feasible to implement a GPU execution policy in libc++/Clang,
> which targets the NVPTX backend, using the Clang frontend only (i.e.,
> without NVCC)?
>
> This would be extremely useful, since even with the latest CUDA 7 release
> of NVCC, it remains slow, buggy, and memory-hungry in comparison to Clang.
> If I compile my Thrust-based code with Clang against Thrust's TBB backend,
> it takes just a minute or so and consumes a few gigabytes. If I compile
> that exact same code against Thrust's CUDA backend with NVCC, it consumes
> ~20 gigabytes of memory and takes well over an hour to compile (on my
> 24-GB workstation; on my 16-GB laptop it never finishes).  Obviating the
> need for NVCC when compiling code for NVIDIA GPUs, via a Parallelism TS
> implementation, would be extremely useful.
>
>

I can't speak for Parallelism-TS, but for the past year or so we've been
steadily trickling more CUDA support into upstream Clang. Internally, we
use a Clang-based compiler for CUDA->PTX (alongside nvcc), and as you
mention, one of its strengths vs. nvcc is compilation time and resource
consumption. For large template-metaprogramming code, Clang's frontend is
an order of magnitude faster.

The pace of our upstreaming is picking up. Take a look at
http://reviews.llvm.org/D8463, for example; and feel free to help out with
reviews.


> Finally, are there plans, or would it even be feasible, to target
> OpenCL/SYCL/SPIR(-V) via Parallelism-TS?  I am aware of existing
> OpenCL-based parallel algorithm libraries, but I am really hoping for a
> Parallelism TS execution policy for OpenCL devices, so that it is a
> single-source, fully-integrated approach that one can pass C++ function
> objects to directly, as opposed to being restricted to passing strings
> containing OpenCL C99 syntax, or having to pre-instantiate template
> functors with macro wrappers.


It is certainly *possible* to target something like SPIR(-V) from Clang for
CUDA, since it now just generates LLVM IR. I'm not sure anyone is planning
that at this time, though.

Eli

