<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Apr 2, 2015 at 5:58 AM, Andrew Corrigan <span dir="ltr">&lt;<a href="mailto:andrew.corrigan@nrl.navy.mil" target="_blank">andrew.corrigan@nrl.navy.mil</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Newcomer here, I hope this isn’t off-topic, but this seemed to be the most appropriate place to ask:<br>

<br>

Are there plans to implement Parallelism-TS in libc++/Clang?  If so, what execution policies might be supported?<br>

<br>

Besides multi-threaded CPU execution policies, are there plans, or would it even be feasible, to implement a GPU execution policy in libc++/Clang that targets the NVPTX backend using the Clang frontend only (i.e., without NVCC)?<br>

<br>

This would be extremely useful, since even in the latest CUDA 7 release, NVCC remains slow and buggy, and consumes massive amounts of memory in comparison to Clang.  If I compile my Thrust-based code with Clang against Thrust’s TBB backend, it takes just a minute or so and consumes just a few gigabytes. If I compile that exact same code with NVCC, only switching to Thrust's CUDA backend, it consumes ~20 gigabytes of memory and takes well over an hour to compile (on my 24-GB workstation; on my 16-GB laptop it never finishes).  Obviating the need for NVCC for compiling code targeting NVIDIA GPUs via a Parallelism TS implementation would be extremely useful.<br></blockquote><div><br></div><div>I can't speak for Parallelism-TS, but for the past year or so we've been steadily trickling more CUDA support into upstream Clang. Internally, we use a Clang-based compiler for CUDA-to-PTX compilation (alongside nvcc), and as you mention, one of its strengths vs. nvcc is compilation time and resource consumption. For large template-metaprogramming code, Clang's frontend is an order of magnitude faster.</div><div><br></div><div>The pace of our upstreaming is picking up. Take a look at <a href="http://reviews.llvm.org/D8463">http://reviews.llvm.org/D8463</a>, for example, and feel free to help out with reviews.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

Finally, are there plans, or would it even be feasible, to target OpenCL/SYCL/SPIR(-V) via Parallelism-TS?  I am aware of existing OpenCL-based parallel algorithms libraries, but I am really hoping for a Parallelism TS execution policy for OpenCL devices, so that it is a single-source, fully-integrated approach that one can pass C++ function objects to directly, as opposed to being restricted to passing strings containing OpenCL C99 syntax, or having to pre-instantiate template functors with macro wrappers.</blockquote><div><br></div><div>It is certainly *possible* to target something like SPIR(-V) from Clang for CUDA, since it just generates LLVM IR now. I'm not sure whether anyone is planning it at this time, though.</div><div><br></div><div>Eli</div><div><br></div><div> </div></div><br></div></div>