[cfe-dev] Parallelism TS implementation and feasibility of GPU execution policies

Eli Bendersky eliben at google.com
Mon Apr 6 14:47:08 PDT 2015


On Mon, Apr 6, 2015 at 11:48 AM, Régis Portalez <regis.portalez at altimesh.com
> wrote:

> Hi.
>
> Sorry to interrupt, but I understood there is a way to emit LLVM IR from
> CUDA code?
>
>
In general, the Clang frontend (-cc1) can generate LLVM IR for the nvptx
triples/targets when passed -fcuda-is-device. To use this in practice,
you'll need to supply a bunch of things in headers (definitions of
builtins, CUDA types, and such), and no such headers exist in the open yet.
Clang won't be able to parse the NVIDIA headers, as these collide with the
standard C++ headers in some ways.
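
As a rough sketch of what this looks like (the exact flags may change, and
a real setup would get the attribute macros from a header rather than
defining them by hand):

  // axpy.cu
  #define __global__ __attribute__((global))  // normally a header's job

  __global__ void axpy(float a, float *x, float *y) {
    y[0] = a * x[0] + y[0];
  }

compiled to LLVM IR with something like:

  clang -cc1 -triple nvptx64-nvidia-cuda -fcuda-is-device -x cuda \
      -emit-llvm axpy.cu -o axpy.ll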

Take a look at the code review I linked to earlier to follow the progress
on making Clang a viable compiler for CUDA.



> Is there any documentation on that?
>
>
Not that I know of, at this time.

Eli





> Thanks
> ------------------------------
> From: Eli Bendersky <eliben at google.com>
> Sent: 06/04/2015 17:43
> To: Andrew Corrigan <andrew.corrigan at nrl.navy.mil>
> Cc: clang-dev Developers <cfe-dev at cs.uiuc.edu>
> Subject: Re: [cfe-dev] Parallelism TS implementation and feasibility of
> GPU execution policies
>
> On Thu, Apr 2, 2015 at 5:58 AM, Andrew Corrigan <
> andrew.corrigan at nrl.navy.mil> wrote:
>
>> Newcomer here; I hope this isn’t off-topic, but this seemed to be the
>> most appropriate place to ask:
>>
>> Are there plans to implement Parallelism-TS in libc++/Clang?  If so, what
>> execution policies might be supported?
>>
>> Besides multi-threaded CPU execution policies, are there plans, or would
>> it even be feasible to implement a GPU execution policy in libc++/Clang,
>> which targets the NVPTX backend, using the Clang frontend only (i.e.,
>> without NVCC)?
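>>
>> For concreteness, here is the sort of single-source usage I have in
>> mind, written against the TS interface; the gpu policy named in the
>> comment is hypothetical, not part of the TS:
>>
>>   #include <experimental/algorithm>
>>   #include <experimental/execution_policy>
>>   #include <vector>
>>
>>   namespace pstl = std::experimental::parallel;
>>
>>   int main() {
>>     std::vector<float> v(1 << 20, 1.0f);
>>     // The TS's standard parallel policy; a GPU policy would slot in
>>     // the same way, e.g. pstl::for_each(gpu, ...).
>>     pstl::for_each(pstl::par, v.begin(), v.end(),
>>                    [](float &x) { x *= 2.0f; });
>>   }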
>>
>> This would be extremely useful, since even with the latest CUDA 7
>> release, NVCC remains slow and buggy, and consumes massive amounts of
>> memory in comparison to Clang.  If I compile my Thrust-based code with
>> Clang, against Thrust’s TBB backend, it takes just a minute or so and
>> consumes just a few gigabytes. If I compile that exact same code, only
>> using Thrust's CUDA backend with NVCC, it consumes ~20 gigabytes of
>> memory and takes well over an hour to compile (on my 24-GB workstation;
>> on my 16-GB laptop it never finishes).  Obviating the need for NVCC when
>> compiling code targeting NVIDIA GPUs, via a Parallelism TS
>> implementation, would therefore be extremely valuable.
>>
>
> I can't speak for Parallelism-TS, but for the past year or so we've been
> steadily trickling more CUDA support into upstream Clang. Internally, we
> use a Clang-based compiler for CUDA-->PTX (alongside nvcc), and as you
> mention, one of its strengths vs. nvcc is compilation time & resource
> consumption. For large template-metaprogramming code, Clang's frontend
> is an order of magnitude faster.
>
> The pace of our upstreaming is picking up. Take a look at
> http://reviews.llvm.org/D8463, for example; and feel free to help out
> with reviews.
>
>
>> Finally, are there plans, or would it even be feasible, to target
>> OpenCL/SYCL/SPIR(-V) via Parallelism-TS?  I am aware of existing
>> OpenCL-based parallel algorithms libraries, but I am really hoping for a
>> Parallelism TS execution policy for OpenCL devices, so that it is a
>> single-source, fully integrated approach that one can pass C++ function
>> objects to directly, as opposed to being restricted to passing strings
>> containing OpenCL C99 syntax, or having to pre-instantiate template
>> functors with macro wrappers.
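>>
>> (To illustrate the restriction: with a string-based interface the
>> kernel travels as OpenCL C99 source that the host compiler never
>> type-checks, e.g.
>>
>>   const char *src =
>>       "__kernel void scale(__global float *x) { "
>>       "  x[get_global_id(0)] *= 2.0f; "
>>       "}";
>>
>> whereas under an execution policy the same operation would be an
>> ordinary C++ lambda, fully visible to the host compiler.)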
>
>
> It is certainly *possible* to target something like SPIR(-V) from Clang
> for CUDA, since it just generates LLVM IR now. I'm not sure anyone is
> planning it at this time, though.
>
> Eli
>
>
>
>

