[cfe-dev] [RFC][OpenMP] Usability improvement, allow dropping offload targets
Dmitriev, Serguei N via cfe-dev
cfe-dev at lists.llvm.org
Mon Jul 30 16:50:05 PDT 2018
The existing OpenMP offloading implementation in clang does not allow dropping
offload targets at link time. That is, if an object file is created with one set
of offload targets you must use exactly the same set of offload targets at the
link stage. Otherwise, linking fails:
$ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu,nvptx64-nvidia-cuda foo.c -c
$ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu foo.o
/tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x20): undefined reference to `.omp_offloading.img_start.nvptx64-nvidia-cuda'
/tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x28): undefined reference to `.omp_offloading.img_end.nvptx64-nvidia-cuda'
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
This limits OpenMP offload usability. So far, this has not been a high priority
issue but the importance of this problem will grow once clang offload starts
supporting static libraries with offload functionality. For instance, this
limitation won't allow creating general purpose static libraries targeting
multiple types of offload devices and later linking them into a program that
uses only one offload target.
Offload targets cannot be dropped at the link phase because object files
produced by the compiler for the host have dependencies on the offload targets
specified during compilation. These dependencies arise from the offload
initialization code.
The clang front-end adds offload initialization code to each host object in
addition to all necessary processing of OpenMP constructs. This initialization
code is intended to register target binaries for all offload targets in the
runtime library at program startup. This code consists of two compiler-generated
routines. One of these routines is added to the list of global constructors and
the other to the global destructors. The constructor routine calls a
libomptarget API which registers the target binaries and the destructor
correspondingly calls a similar API for unregistering target binaries.
Both these APIs accept a pointer to the target binary descriptor object which
specifies the number of offload target binaries to register and the start/end
addresses of target binary images. Since the start/end addresses of target
binaries are not available at compile time, the target binary descriptors are
initialized using link-time constants which reference (undefined) symbols
containing the start/end addresses of all target images. These symbols are
created by the dynamically-generated linker script which the clang driver
creates for the host link action.
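The registration scheme described above can be sketched in C as follows. This is only an illustration, not the code clang emits: the struct layouts are simplified stand-ins for the runtime's descriptor types, stub_register_lib and stub_unregister_lib are hypothetical stubs for the libomptarget registration APIs, and the image symbols are defined locally so the sketch links, whereas in the real scheme they are undefined symbols that the linker script must provide for every target named at compile time. The constructor and destructor attributes are GCC/Clang extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the runtime's descriptor types (assumed layout;
   the real descriptors carry more fields). */
typedef struct {
  const void *image_start;
  const void *image_end;
} device_image;

typedef struct {
  int32_t num_images;
  const device_image *images;
} bin_desc;

/* Hypothetical stubs for the runtime registration APIs, so that the sketch
   builds without libomptarget. They only count registered images. */
int registered_images = 0;

void stub_register_lib(const bin_desc *desc) {
  assert(desc != NULL && desc->num_images >= 0);
  registered_images += desc->num_images;
}

void stub_unregister_lib(const bin_desc *desc) {
  assert(desc != NULL);
  registered_images -= desc->num_images;
}

/* In the real scheme these would be references to undefined start/end
   symbols, one pair per offload target, resolved by the generated linker
   script. Placeholder data is defined here instead. */
static const char img_x86_64[] = "x86_64 image";
static const char img_nvptx64[] = "nvptx64 image";

/* The host object hard-codes one entry per target chosen at compile time.
   This is the dependency that prevents dropping a target at link time. */
static const device_image images[] = {
  { img_x86_64, img_x86_64 + sizeof img_x86_64 },
  { img_nvptx64, img_nvptx64 + sizeof img_nvptx64 },
};

static const bin_desc desc = {
  (int32_t)(sizeof images / sizeof images[0]),
  images,
};

/* Compiler-generated routines: one runs at startup, one at exit. */
__attribute__((constructor)) void offload_register(void) {
  stub_register_lib(&desc);
}

__attribute__((destructor)) void offload_unregister(void) {
  stub_unregister_lib(&desc);
}
```

Because the descriptor lives in the host object, removing nvptx64-nvidia-cuda at link time would leave the second entry pointing at symbols nobody defines, which matches the undefined-reference errors shown earlier.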
References to the target specific symbols from host objects make them dependent
on particular offload targets and prevent dropping offload targets at the link
step. Therefore, the OpenMP offload initialization needs to be redesigned to
make offload targets discardable.
Host objects should be independent of offload targets in order to allow dropping
code for offload targets. That can be achieved by removing offload
initialization code from host objects. The compiler should not inject this code
into host objects.
However, offload initialization should still be done, so it is proposed to move
the initialization code into a special dynamically generated object file
(referred to as the 'wrapper object' from here on), which, besides the
initialization code, will also contain embedded images for offload targets.
The wrapper object file will be generated by the clang driver with the help of
a new tool: clang-offload-wrapper. This tool will take offload target binaries
as input and produce bitcode files containing offload initialization code and
embedded target images. The output bitcode is then passed to the backend and
assembler tools from the host toolchain to produce the wrapper object which is
then added as an input to the linker for host linking.
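As a rough illustration of what a wrapper object might contain, the C sketch below embeds a single target image and registers it from a constructor. It is not the output of clang-offload-wrapper (which, per the proposal, produces bitcode): the struct layouts are simplified assumptions, stub_register_lib and stub_unregister_lib are hypothetical stubs for the libomptarget registration APIs, and the image bytes are placeholders. It models a link where only x86_64-pc-linux-gnu was kept and the nvptx64 target was dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the runtime's descriptor types (assumed layout). */
typedef struct {
  const void *image_start;
  const void *image_end;
} device_image;

typedef struct {
  int32_t num_images;
  const device_image *images;
} bin_desc;

/* Hypothetical stubs for the runtime registration APIs, so that the sketch
   builds without libomptarget. They only count registered images. */
int registered_images = 0;

void stub_register_lib(const bin_desc *desc) {
  assert(desc != NULL && desc->num_images >= 0);
  registered_images += desc->num_images;
}

void stub_unregister_lib(const bin_desc *desc) {
  assert(desc != NULL);
  registered_images -= desc->num_images;
}

/* ---- contents of the hypothetical wrapper object ---- */

/* Only the targets named at link time are embedded. The bytes below are a
   placeholder, not a real device binary. */
static const unsigned char image_x86_64[] = { 0x7f, 'E', 'L', 'F' };

/* The descriptor refers only to data inside the wrapper itself, so host
   objects carry no target-specific symbol references. */
static const device_image wrapper_images[] = {
  { image_x86_64, image_x86_64 + sizeof image_x86_64 },
};

static const bin_desc wrapper_desc = {
  (int32_t)(sizeof wrapper_images / sizeof wrapper_images[0]),
  wrapper_images,
};

/* Initialization code now lives in the wrapper, not in each host object. */
__attribute__((constructor)) void wrapper_register(void) {
  stub_register_lib(&wrapper_desc);
}

__attribute__((destructor)) void wrapper_unregister(void) {
  stub_unregister_lib(&wrapper_desc);
}
```

Since the wrapper is generated at link time from whichever target binaries the driver was given, dropping a target simply yields a shorter image table and leaves the host objects unchanged.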
The offload action builder in the clang driver needs to be changed to use this
tool while building the actions graph for OpenMP offload compilations.
A patch with an initial implementation of the proposed changes has been
uploaded to Phabricator for review: https://reviews.llvm.org/D49510.
Feedback on this proposal is welcome.