[cfe-dev] Parallelism TS implementation and feasibility of GPU execution policies
Andrew Corrigan
andrew.corrigan at nrl.navy.mil
Thu Apr 2 05:58:42 PDT 2015
Newcomer here, I hope this isn’t off-topic, but this seemed to be the most appropriate place to ask:
Are there plans to implement the Parallelism TS in libc++/Clang? If so, which execution policies might be supported?
Besides multi-threaded CPU execution policies, are there plans, or would it even be feasible, to implement a GPU execution policy in libc++/Clang that targets the NVPTX backend using the Clang frontend only (i.e., without NVCC)?
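For reference, this is the shape of code the TS enables; a minimal sketch using the std::experimental::parallel names from N4507, with a hypothetical vendor GPU policy shown in a comment (gpu_exec is an invented name, not part of the TS):

    #include <experimental/algorithm>
    #include <experimental/execution_policy>
    #include <vector>

    namespace parallel = std::experimental::parallel;

    int main() {
        std::vector<float> v(1 << 20, 1.0f);
        // The policy argument selects how the algorithm executes.
        parallel::for_each(parallel::par, v.begin(), v.end(),
                           [](float& x) { x *= 2.0f; });
        // A GPU policy would be used the same way, e.g. (hypothetical name):
        // parallel::for_each(gpu_exec, v.begin(), v.end(), ...);
    }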
This would be extremely useful: even with the latest CUDA 7 release, NVCC remains slow, buggy, and memory-hungry compared to Clang. Compiling my Thrust-based code with Clang against Thrust's TBB backend takes about a minute and consumes just a few gigabytes of memory. Compiling the exact same code against Thrust's CUDA backend with NVCC consumes ~20 gigabytes of memory and takes well over an hour on my 24-GB workstation; on my 16-GB laptop it never finishes. A Parallelism TS implementation that obviates the need for NVCC when targeting NVIDIA GPUs would remove that bottleneck entirely.
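To illustrate the retargeting involved, Thrust selects its device backend with a compile-time macro, so the same source can be built either way (a minimal sketch; compiler and include flags abbreviated):

    // Same source, two builds:
    //   clang++ -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB app.cpp -ltbb
    //   nvcc app.cu                          (default CUDA device backend)
    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/functional.h>

    int main() {
        thrust::device_vector<float> x(1 << 20, 1.0f), y(1 << 20);
        // Identical algorithm call; the backend macro decides where it runs.
        thrust::transform(x.begin(), x.end(), y.begin(),
                          thrust::negate<float>());
    }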
Finally, are there plans, or would it even be feasible, to target OpenCL/SYCL/SPIR(-V) via the Parallelism TS? I am aware of existing OpenCL-based parallel algorithms libraries, but I am really hoping for a Parallelism TS execution policy for OpenCL devices, so that it is a single-source, fully-integrated approach to which one can pass C++ function objects directly, as opposed to being restricted to passing strings containing OpenCL C99 syntax, or having to pre-instantiate template functors with macro wrappers.
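To make the contrast concrete, here is a sketch of the two styles; the opencl_exec policy in the second half is an invented name, not part of any current TS or library:

    // Today's style: the kernel body is a string of OpenCL C, compiled at
    // runtime (program build, argument setup, and enqueue boilerplate omitted).
    const char* source = R"(
        __kernel void add_one(__global float* v) {
            v[get_global_id(0)] += 1.0f;
        }
    )";

    // Hoped-for single-source style: an ordinary C++ lambda passed directly
    // to a parallel algorithm under a hypothetical OpenCL execution policy.
    // parallel::for_each(opencl_exec, v.begin(), v.end(),
    //                    [](float& x) { x += 1.0f; });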
Andrew Corrigan