[cfe-dev] [RFC][OpenMP][CUDA] Unified Offloading Support in Clang Driver
Ronan KERYELL via cfe-dev
cfe-dev at lists.llvm.org
Thu Mar 3 02:50:04 PST 2016
>>>>> On Wed, 24 Feb 2016 19:01:31 -0500, Samuel F Antao via cfe-dev <cfe-dev at lists.llvm.org> said:
Samuel> Hi all,
Hi Samuel!
Samuel> I’d like to propose a change in the Driver implementation
Samuel> to support programming models that require offloading with a
Samuel> unified infrastructure. The goal is to have a design that
Samuel> is general enough to cover different programming models with
Samuel> as little programming-model-specific customization as
Samuel> possible. Some of this discussion already took place in
Samuel> http://reviews.llvm.org/D9888, but I would like to continue
Samuel> it here on the mailing list and try to collect as much
Samuel> feedback as possible.
Samuel> Currently, there are two programming models supported by
Samuel> clang that require offloading: CUDA and OpenMP. Other
Samuel> offloading models that could benefit from a unified driver
Samuel> design as they become supported in clang include SYCL
Samuel> (https://www.khronos.org/sycl) and OpenACC
Samuel> (http://www.openacc.org/).
Great proposal!
Very à propos, since I am just now thinking about implementing this with
Clang in my SYCL implementation (see
https://github.com/amd/triSYCL#possible-futures for the approach I have
in mind).
Samuel> OpenMP (Host IR has to be read by the device to determine
Samuel> which declarations have to be emitted and the device binary
Samuel> is embedded in the host binary at link phase through a
Samuel> proper linker script):
Samuel> Src -> Host PP -> A
Samuel> A -> HostCompile -> B
Samuel> A,B -> DeviceCompile -> C
Samuel> C -> DeviceAssembler -> D
Samuel> D -> DeviceLinker -> F
Samuel> B -> HostAssembler -> G
Samuel> G,F -> HostLinker -> Out
In SYCL it would be pretty close. Something like:
Src -> Host PP -> A
A -> HostCompile -> B
B -> HostAssembler -> C
Src -> Device PP -> D
D -> DeviceCompile -> E
E -> DeviceAssembler -> F
F -> DeviceLinker -> G
C,G -> HostLinker -> Out
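To make the comparison concrete, here is a small Python sketch (not driver code) that encodes the SYCL phases above as a dependency graph and derives one valid execution order. The phase names and letters come from the list above; the dictionary layout, the `phase_order` helper and the use of `graphlib` (Python 3.9 or later) are only my illustration.

```python
from graphlib import TopologicalSorter

# Each phase maps to (inputs, output), using the letters from the list above.
SYCL_PHASES = {
    "Host PP":         (["Src"], "A"),
    "HostCompile":     (["A"], "B"),
    "HostAssembler":   (["B"], "C"),
    "Device PP":       (["Src"], "D"),
    "DeviceCompile":   (["D"], "E"),
    "DeviceAssembler": (["E"], "F"),
    "DeviceLinker":    (["F"], "G"),
    "HostLinker":      (["C", "G"], "Out"),
}

def phase_order(phases):
    """Return the phases in an order where every input is produced first."""
    # Map each intermediate result to the phase that produces it.
    producer = {out: name for name, (_, out) in phases.items()}
    # A phase depends on the producers of its inputs ("Src" has no producer).
    graph = {
        name: {producer[i] for i in inputs if i in producer}
        for name, (inputs, _) in phases.items()
    }
    return list(TopologicalSorter(graph).static_order())

order = phase_order(SYCL_PHASES)
print(" -> ".join(order))
```

In this graph the host and device chains only meet at HostLinker, whereas in the OpenMP flow quoted above DeviceCompile also consumes the host output B.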
Samuel> As a hypothetical example, let's assume we wanted to compile
Samuel> code that uses both CUDA for a nvptx64 device, OpenMP for an
Samuel> x86_64 device, and a powerpc64le host, one could invoke the
Samuel> driver as:
Samuel> clang -target powerpc64le-ibm-linux-gnu <more host options>
Samuel> -target-offload=nvptx64-nvidia-cuda -fcuda -mcpu sm_35 <more
Samuel> options for the nvptx toolchain>
Samuel> -target-offload=x86_64-pc-linux-gnu -fopenmp <more options
Samuel> for the x86_64 toolchain>
Just to be sure I understand: you are thinking about being able to
outline several "languages" at once, such as CUDA *and* OpenMP, right?
I think it is required for serious applications. In the HPC world, for
example, it is common to have hybrid multi-node heterogeneous applications
that use MPI+OpenMP+OpenCL. Since MPI and OpenCL are just libraries, only
OpenMP needs off-loading support there. But if we move to OpenCL SYCL
instead, with MPI+OpenMP+SYCL, then both OpenMP and SYCL have to be
managed by the Clang off-loading infrastructure at the same time, and we
need to be sure they combine gracefully...
I think your second proposal about (un)bundling can already manage this.
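To check my reading of the command line quoted above, here is a hedged Python sketch that splits such an invocation into one option group per toolchain. The `-target-offload=` and `-fcuda` flags are the ones from your proposal, not existing Clang options, and the splitting rule (each option belongs to the most recent `-target` or `-target-offload=`) is only my assumption about the intended semantics. The `group_by_toolchain` helper is hypothetical.

```python
def group_by_toolchain(argv):
    """Split driver arguments into (kind, triple, options) groups.

    Assumption: options following -target <triple> or
    -target-offload=<triple> apply to that toolchain until the next
    such flag appears.
    """
    groups = []
    args = iter(argv)
    for arg in args:
        if arg == "-target":
            triple = next(args, None)
            if triple is None:
                raise ValueError("-target needs a triple")
            groups.append(("host", triple, []))
        elif arg.startswith("-target-offload="):
            groups.append(("offload", arg.split("=", 1)[1], []))
        elif groups:
            # Attach the option to the toolchain selected most recently.
            groups[-1][2].append(arg)
        else:
            raise ValueError(f"option {arg!r} appears before any target")
    return groups

argv = [
    "-target", "powerpc64le-ibm-linux-gnu",
    "-target-offload=nvptx64-nvidia-cuda", "-fcuda", "-mcpu", "sm_35",
    "-target-offload=x86_64-pc-linux-gnu", "-fopenmp",
]
groups = group_by_toolchain(argv)
for kind, triple, options in groups:
    print(kind, triple, options)
```

With this reading, the CUDA and OpenMP options never mix: each off-load toolchain only sees the options that follow its own triple.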
Otherwise, what about the code outlining used in the off-loading
process? Code generation requires outlining the kernel code into
separate functions to be compiled by the kernel compiler. Do you think
it is up to the programmer to re-use the recipes used by OpenMP and
CUDA, for example, or would it be interesting to have a third proposal
that makes the outliner more abstract and configurable, so that it can
handle OpenMP, CUDA, SYCL... uniformly?
Thanks a lot,
--
Ronan KERYELL
Xilinx Research Labs, Dublin, Ireland