[cfe-dev] [RFC][OpenMP][CUDA] Unified Offloading Support in Clang Driver

Samuel F Antao via cfe-dev cfe-dev at lists.llvm.org
Thu Mar 3 11:40:02 PST 2016


Hi Ronan,

Thanks for the feedback!


2016-03-03 5:50 GMT-05:00 Ronan KERYELL via cfe-dev <cfe-dev at lists.llvm.org>:

> >>>>> On Wed, 24 Feb 2016 19:01:31 -0500, Samuel F Antao via cfe-dev <
> cfe-dev at lists.llvm.org> said:
>
>     Samuel> Hi all,
>
> Hi Samuel!
>
>     Samuel>  I’d like to propose a change in the Driver implementation
>     Samuel> to support programming models that require offloading with a
>     Samuel> unified infrastructure.  The goal is to have a design that
>     Samuel> is general enough to cover different programming models with
>     Samuel> as little as possible customization that is
>     Samuel> programming-model specific. Some of this discussion already
>     Samuel> took place in http://reviews.llvm.org/D9888 but I would like
>     Samuel> to continue that here on the mailing list and try to collect
>     Samuel> as much feedback as possible.
>
>     Samuel> Currently, there are two programming models supported by
>     Samuel> clang that require offloading - CUDA and OpenMP. Examples of
>     Samuel> other offloading models that could benefit from a unified
>     Samuel> driver design as they become supported in clang are
>     Samuel> SYCL (https://www.khronos.org/sycl) and OpenACC
>     Samuel> (http://www.openacc.org/).
>
> Great proposal!
>
> Very à propos since I am just thinking about implementing it with Clang
> in my SYCL implementation (see
> https://github.com/amd/triSYCL#possible-futures for a possible way I am
> thinking of).
>

It's great that what I am proposing is aligned with your project's needs!


>
>     Samuel> OpenMP (Host IR has to be read by the device to determine
>     Samuel> which declarations have to be emitted and the device binary
>     Samuel> is embedded in the host binary at link phase through a
>     Samuel> proper linker script):
>
>     Samuel> Src -> Host PP -> A
>
>     Samuel> A -> HostCompile -> B
>
>     Samuel> A,B -> DeviceCompile -> C
>
>     Samuel> C -> DeviceAssembler -> D
>
>     Samuel> D -> DeviceLinker -> F
>
>     Samuel> B -> HostAssembler -> G
>
>     Samuel> G,F -> HostLinker -> Out
>
> In SYCL it would be pretty close. Something like:
>
> Src -> Host PP -> A
>
> A -> HostCompile -> B
>
> B -> HostAssembler -> C
>
> Src -> Device PP -> D
>
> D -> DeviceCompile -> E
>
> E -> DeviceAssembler -> F
>
> F -> DeviceLinker -> G
>
> C,G -> HostLinker -> Out
>

The idea of the driver design is to let you use the DAG that is most
appropriate for your programming model without having to touch parts of the
infrastructure that do not necessarily relate to offloading. Your DAG
should be easily supported by the design.
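
As a rough illustration of what "use the DAG that fits your model" could
mean, the sketch below encodes the SYCL pipeline quoted above as plain data
and derives a valid execution order from it. This is not how the clang
driver represents actions; the SYCL_DAG table and the schedule() helper are
hypothetical names chosen for this example, and it assumes Python 3.9+ for
graphlib.

```python
from graphlib import TopologicalSorter

# output -> (tool, inputs), transcribed from the SYCL DAG quoted above.
# This table layout is an assumption made for illustration only.
SYCL_DAG = {
    "A": ("Host PP", ["Src"]),
    "B": ("HostCompile", ["A"]),
    "C": ("HostAssembler", ["B"]),
    "D": ("Device PP", ["Src"]),
    "E": ("DeviceCompile", ["D"]),
    "F": ("DeviceAssembler", ["E"]),
    "G": ("DeviceLinker", ["F"]),
    "Out": ("HostLinker", ["C", "G"]),
}


def schedule(dag):
    """Return (tool, inputs, output) steps in an order that respects dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {out: set(inputs) for out, (_tool, inputs) in dag.items()}
    steps = []
    for node in TopologicalSorter(graph).static_order():
        if node in dag:  # skip pure inputs such as "Src"
            tool, inputs = dag[node]
            steps.append((tool, list(inputs), node))
    return steps


for tool, inputs, output in schedule(SYCL_DAG):
    print(f"{','.join(inputs)} -> {tool} -> {output}")
```

A different programming model would only swap in a different table; the
scheduling code stays untouched, which is the property the proposal is
after.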


>
>     Samuel> As a hypothetical example, let's assume we wanted to compile
>     Samuel> code that uses both CUDA for a nvptx64 device, OpenMP for an
>     Samuel> x86_64 device, and a powerpc64le host, one could invoke the
>     Samuel> driver as:
>
>     Samuel> clang -target powerpc64le-ibm-linux-gnu <more host options>
>
>     Samuel> -target-offload=nvptx64-nvidia-cuda -fcuda -mcpu sm_35 <more
>     Samuel> options for the nvptx toolchain>
>
>     Samuel> -target-offload=x86_64-pc-linux-gnu -fopenmp <more options
>     Samuel> for the x86_64 toolchain>
>
> Just to be sure I understand: you are thinking about being able to
> outline several "languages" at once, such as CUDA *and* OpenMP, right?
>

That's correct. All the toolchains for each programming model would be
extracted from the input arguments before getting into command
generation.
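
To make the extraction step concrete, here is a minimal sketch of how the
hypothetical command line from the proposal could be split into one host
toolchain and several device toolchains. The option spelling
(-target-offload=, -fcuda, -fopenmp) is the proposed syntax quoted above,
not necessarily what clang accepts. The grouping rule (each
-target-offload= opens a new group that collects the options following it)
and the split_offload_args() helper are assumptions made for this example.

```python
def split_offload_args(argv):
    """Group driver arguments into a host toolchain and per-device toolchains."""
    host = {"triple": None, "options": []}
    devices = []
    current = host  # options seen before any -target-offload= belong to the host
    args = iter(argv)
    for arg in args:
        if arg == "-target":
            host["triple"] = next(args)
        elif arg.startswith("-target-offload="):
            # Assumed rule: a new offload target starts a new option group.
            current = {"triple": arg.split("=", 1)[1], "model": None, "options": []}
            devices.append(current)
        elif arg in ("-fcuda", "-fopenmp") and current is not host:
            current["model"] = arg[2:]  # "cuda" or "openmp"
        else:
            current["options"].append(arg)
    return host, devices


host, devices = split_offload_args([
    "-target", "powerpc64le-ibm-linux-gnu",
    "-target-offload=nvptx64-nvidia-cuda", "-fcuda", "-mcpu", "sm_35",
    "-target-offload=x86_64-pc-linux-gnu", "-fopenmp",
])
for dev in devices:
    print(dev["model"], dev["triple"], dev["options"])
```

Once the groups exist, command generation can iterate over them without
knowing which programming model each one came from.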


>
> I think it is required for serious applications. For example, in the HPC
> world, it is common to have hybrid multi-node heterogeneous applications
> that use MPI+OpenMP+OpenCL for example. Since MPI and OpenCL are just
> libraries, there is only OpenMP to off-load here. But if we move to
> OpenCL SYCL instead with MPI+OpenMP+SYCL then both OpenMP and SYCL have
> to be managed by the Clang off-loading infrastructure at the same time
> and be sure they combine gracefully...
>

Yes, SYCL could coexist with OpenMP, and OpenMP should coexist with CUDA.  I
assume you are talking about the clang-driver infrastructure, because for
the codegen infrastructure, each programming model will have its own
specifics.


>
> I think your second proposal about (un)bundling can already manage this.
>
> Otherwise, what about the code outlining itself used in the off-loading
> process? The code generation itself requires outlining the kernel code
> into external functions to be compiled by the kernel compiler. Do you
> think it is up to the programmer to re-use the recipes used by OpenMP
> and CUDA, for example, or would it be interesting to have a third
> proposal that abstracts the outliner further, so it can be configured
> to handle OpenMP, CUDA, SYCL... globally?
>

The code generation has to be implemented/tuned for the programming model.
The driver itself can only help in the outlining by forwarding the right
files to the right place.

In codegen there is some sort of filtering going on. The AST for host and
device is similar. In the device frontend, for CUDA (and OpenCL, I believe),
the declarations that don't have the device attribute are just ignored.
Basically, the outlining is already done and one only has to decide what is
device and what is host. In OpenMP it is more complicated because the model
allows offloading to fail, so the host and device codegen have to match.
That's why the host IR is passed to the device frontend: it lets the device
frontend make sure it matches the host implementation in terms of outlined
kernels (the host produces metadata to ease this process).
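
The difference between the two styles can be sketched with a toy model.
Nothing below reflects clang's actual data structures: declarations and
target regions are plain dictionaries, and both function names are
hypothetical. The first function mimics attribute-based filtering in the
device frontend. The second mimics the OpenMP case, where the device side
must emit exactly the kernels the host recorded in its metadata.

```python
def device_decls_explicit(decls):
    """Explicit outlining: keep only declarations carrying the device attribute."""
    return [d["name"] for d in decls if d.get("device")]


def device_kernels_implicit(host_metadata, target_regions):
    """Implicit outlining: emit exactly the kernels listed by the host, in host order."""
    available = {r["id"]: r for r in target_regions}
    missing = [k for k in host_metadata if k not in available]
    if missing:
        # The host would look for these kernels at run time, so a mismatch is an error.
        raise ValueError(f"device cannot match host kernels: {missing}")
    return [available[k]["name"] for k in host_metadata]


# Explicit style: the programmer already marked what runs on the device.
decls = [
    {"name": "host_only", "device": False},
    {"name": "kernel", "device": True},
]
print(device_decls_explicit(decls))

# Implicit style: the host metadata drives what the device emits.
regions = [{"id": 0, "name": "region0"}, {"id": 1, "name": "region1"}]
print(device_kernels_implicit([1, 0], regions))
```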

Having said that, I think that it would be hard to have something very
different from what we have already in clang, so I suspect different
programming models will fall into one of these two categories: explicit
outlining (CUDA, OpenCL) and implicit outlining (OpenMP). For both
categories, the way device attributes are specified in the frontend or the
metadata used to pass information from host to device can be adapted to
better suit multiple programming models. However, this is a separate
discussion from the driver support.

Thanks again,
Samuel

>
> Thanks a lot,
> --
>   Ronan KERYELL
>   Xilinx Research Labs, Dublin, Ireland