[cfe-dev] [RFC][OpenMP] Usability improvement, allow dropping offload targets
Hal Finkel via cfe-dev
cfe-dev at lists.llvm.org
Thu Aug 2 08:53:12 PDT 2018
On 08/02/2018 10:32 AM, Dmitriev, Serguei N wrote:
>
> Hi Hal,
>
>
>
> The offload initialization code is pretty simple, in pseudo code it
> would look like this
>
>
>
> // The device image information.
> struct __tgt_device_image {
>   void *ImageStart; // Pointer to the target code start
>   void *ImageEnd;   // Pointer to the target code end
>   ...
> };
>
> // Target binary descriptor.
> struct __tgt_bin_desc {
>   int32_t NumDeviceImages;          // Number of device images
>   __tgt_device_image *DeviceImages; // Array of device images
>   ...
> };
>
> // External symbols for start/end addresses for all N target images.
> // These symbols are defined by the linker script which is dynamically
> // generated by the clang driver for host link action.
> extern char ImageStart1[];
> extern char ImageEnd1[];
> ...
> extern char ImageStartN[];
> extern char ImageEndN[];
>
> static __tgt_device_image TargetImages[] = {
>   { ImageStart1, ImageEnd1, ... },
>   ...
>   { ImageStartN, ImageEndN, ... },
> };
>
> static __tgt_bin_desc BinaryDesc = {
>   sizeof(TargetImages)/sizeof(__tgt_device_image), // NumDeviceImages
>   TargetImages,                                     // DeviceImages
>   ...
> };
>
> // Constructor and destructor which register/unregister device binaries
> __attribute__((constructor(0))) static void registerBinaryDesc() {
>   __tgt_register_lib(&BinaryDesc);
> }
>
> __attribute__((destructor(0))) static void unregisterBinaryDesc() {
>   __tgt_unregister_lib(&BinaryDesc);
> }
>
>
>
> Front end adds such code to every host object containing omp offload
> constructs. It is sufficient to execute it once at program startup, so
> it is created in comdat for efficiency. And for the current
> implementation it is OK to have it in comdat because changing the list
> of offload targets between objects is not allowed now. Therefore all
> host objects are expected to have the same init code and thus it does
> not matter what instance is eventually linked in.
>
>
>
> Technically, making offload init code non-comdat together with
> Alexey’s fix which adds weak linkage to external references from init
> code would resolve this problem. That would force offload init code
> from all objects to be executed on startup (not just a single
> instance) and thus all target images will eventually be registered,
> but it would be less efficient than executing only one instance of
> init code.
>
Thanks for the details. Would my suggestion of hashing/encoding the list
of offloading targets into the comdat name/key also work? This seems
like it would be a simple solution and would limit the number of
initialization calls to one per unique target combination (which should
be no more than a few).
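
To make that concrete, here is a minimal sketch in C (not clang code) of
one way such a key could be formed. It assumes the key is a base name
followed by the target triples in sorted order; the helper name
make_comdat_key and the "." separator are hypothetical. Objects built
with the same set of targets, in any order, would then share one
initializer, while objects built with a different set keep their own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b) {
  return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: builds "<base>.<triple1>.<triple2>..." with the
 * triples sorted, so the key does not depend on command-line order.
 * Duplicate triples are not removed. Returns a malloc'ed string that the
 * caller frees, or NULL on allocation failure. */
char *make_comdat_key(const char *base, const char *const *triples, size_t n) {
  assert(base != NULL);
  const char **sorted = malloc((n ? n : 1) * sizeof *sorted);
  if (!sorted)
    return NULL;

  size_t len = strlen(base) + 1; /* base plus terminating NUL */
  for (size_t i = 0; i < n; ++i) {
    sorted[i] = triples[i];
    len += strlen(triples[i]) + 1; /* separator plus triple */
  }
  qsort(sorted, n, sizeof *sorted, cmp_str);

  char *key = malloc(len);
  if (key) {
    strcpy(key, base);
    for (size_t i = 0; i < n; ++i) {
      strcat(key, ".");
      strcat(key, sorted[i]);
    }
  }
  free(sorted);
  return key;
}
```

With the base name .omp_offloading.descriptor_reg and the two triples
from this thread, either order gives
.omp_offloading.descriptor_reg.nvptx64-nvidia-cuda.x86_64-pc-linux-gnu,
and the x86_64-only list gives a different key.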
-Hal
>
>
> What I suggested below is completely removing offload init code from
> host objects and moving it to a separate object (I called it wrapper
> object) which is dynamically created by the clang driver for host link
> action with the help of the new clang-offload-wrapper tool. This object
> will have all target images as data and offload init code which
> registers the images. Such change would make host objects completely
> independent from offloading targets that were specified at compile
> time. And offload initialization will still be efficient since it will
> be done only once.
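
For reference, a rough sketch of what such a wrapper object might amount
to, written as C for readability (the proposal has the tool emit
bitcode). The struct layouts are cut down to the fields shown in the
pseudo code above. The image bytes, register_lib_stub,
unregister_lib_stub and RegisteredImages are placeholders standing in
for real target binaries and for the libomptarget entry points, whose
actual signatures are not reproduced here. It relies on the GCC/Clang
constructor and destructor attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down versions of the descriptor types from the pseudo code. */
typedef struct {
  void *ImageStart; /* Pointer to the target code start */
  void *ImageEnd;   /* Pointer to the target code end */
} device_image;

typedef struct {
  int32_t NumDeviceImages;    /* Number of device images */
  device_image *DeviceImages; /* Array of device images */
} bin_desc;

/* Placeholder bytes; a real wrapper would embed the target binaries. */
static char Image0[] = {0x7f, 'E', 'L', 'F'};
static char Image1[] = {0x7f, 'E', 'L', 'F', 0x02};

/* Built only from the images given to the link step, so no host object
 * has to reference per-target start/end symbols. */
static device_image Images[] = {
  {Image0, Image0 + sizeof Image0},
  {Image1, Image1 + sizeof Image1},
};

static bin_desc Desc = {
  (int32_t)(sizeof Images / sizeof Images[0]), /* NumDeviceImages */
  Images,                                      /* DeviceImages */
};

/* Stand-ins for the runtime's register/unregister calls. */
static int32_t RegisteredImages = 0;
static void register_lib_stub(bin_desc *D) {
  RegisteredImages += D->NumDeviceImages;
}
static void unregister_lib_stub(bin_desc *D) {
  RegisteredImages -= D->NumDeviceImages;
}

/* Only the wrapper object carries these, so they run once per program. */
__attribute__((constructor)) static void register_images(void) {
  register_lib_stub(&Desc);
}

__attribute__((destructor)) static void unregister_images(void) {
  unregister_lib_stub(&Desc);
}
```

The image count here comes from the array that the link step populates,
not from whatever -fopenmp-targets list each host object was compiled
with.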
>
>
>
> Thanks,
>
> Serguei
>
>
>
> *From:*Hal Finkel [mailto:hfinkel at anl.gov]
> *Sent:* Wednesday, August 1, 2018 1:51 PM
> *To:* Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>; Alexey
> Bataev <a.bataev at outlook.com>
> *Cc:* cfe-dev at lists.llvm.org
> *Subject:* Re: [cfe-dev] [RFC][OpenMP] Usability improvement, allow
> dropping offload targets
>
>
>
>
>
> On 08/01/2018 03:32 PM, Dmitriev, Serguei N wrote:
>
> Hi Alexey,
>
>
>
> The empty object file produced by the bundler is one of the problems
> with that example, but I was talking about a different issue
> which is related to the offload initialization code.
>
>
>
> As part of generating the offload initialization code, the front
> end creates the target binary descriptor object which is passed to
> the libomptarget registration API. Besides the start/end addresses
> of all target images, the binary descriptor object contains the
> number of target images which need to be registered, and the
> compiler initializes this field to the number of offload targets
> that were specified on the command line. So, in the example below,
> both a.o and b.o will have initialization code for registering only
> one target image because only one offload target was specified on
> the command line. The initialization code is generated as a comdat
> group, so the linker will choose either a.o’s or b.o’s instance of
> this code at link stage, but in any case it will register only one
> target image instead of two. So that example will not work as
> expected even if the offload bundler problem is fixed. Delaying
> generation of the offload initialization code till link time would
> resolve this issue.
>
>
>
> I suspect there would also be problems with the offload entry
> table besides these two issues. The number of offload entries on
> the host side and target images won’t match, and I guess
> libomptarget cannot handle this correctly, so I assume runtime
> would require some changes as well.
>
>
> That certainly seems like a problem. We don't want varying lists of
> offloading targets between objects to create implicit ODR problems. Is
> the initialization code/tables large? Is it important that they're in
> comdat? Alternatively, we could hash/encode the list of targets in the
> comdat key, so we'll only get one copy of the init code per unique
> combination. That might be better than actually having each object
> contain an initializer?
>
> -Hal
>
>
>
>
> Thanks,
>
> Serguei
>
>
>
> *From:*Alexey Bataev [mailto:a.bataev at outlook.com]
> *Sent:* Wednesday, August 1, 2018 11:45 AM
> *To:* Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>
> *Cc:* cfe-dev at lists.llvm.org; Hal Finkel <hfinkel at anl.gov>
> *Subject:* Re: [cfe-dev] [RFC][OpenMP] Usability improvement,
> allow dropping offload targets
>
>
>
> Hi Serguei,
>
> I don't see a lot of problems with this example. As I can see
> there is only one problem: clang-offload-bundler generates an
> empty object file that cannot be recognized by the linker. If we
> teach clang-offload-bundler to generate correct empty object files
> (what we need to do anyway, because currently it may produce wrong
> output), your example will work without any changes.
>
> -------------
>
> Best regards,
>
> Alexey Bataev
>
> 31.07.2018 16:50, Dmitriev, Serguei N wrote:
>
> Hi Alexey,
>
>
>
> Such a change would fix the link issue, but I believe it would
> be a short-term solution that would still need to be revised in future.
>
>
>
> Let me explain my concerns. Current implementation has one
> more limitation which I assume would also be addressed in
> future – target binaries are expected to have entries for all
> OpenMP target regions in the program, though it seems to be
> too restrictive. I assume there would be use cases when you
> would want to tune target regions in your program for
> particular targets and offloading to other targets would not
> make much sense for those regions (or probably won’t even be
> possible due to limitations of a particular target). It seems
> reasonable to compile those regions only for the targets they
> were tuned for, thus I assume compiler will support the
> following usage model in future
>
>
>
> clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -c a.c
> clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu -c b.c
> clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda,x86_64-pc-linux-gnu a.o b.o
>
>
>
> Such a usage model would in any case require redesigning the way
> offload initialization code is generated. It has to be
> delayed till link time because the final set of offload
> targets that need to be registered at run time would be known
> only at the link step, and thus the compiler cannot create a correct
> target binary descriptor object (which is a part of the
> offload initialization code) at compile time as it is done now.
>
>
>
> Does that sound reasonable?
>
>
>
> Thanks,
>
> Serguei
>
>
>
> *From:*Alexey Bataev [mailto:a.bataev at outlook.com]
> *Sent:* Tuesday, July 31, 2018 9:55 AM
> *To:* Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>;
> cfe-dev at lists.llvm.org
> *Subject:* Re: [cfe-dev] [RFC][OpenMP] Usability improvement,
> allow dropping offload targets
>
>
>
> Hi Serguei,
>
> Actually your problem can be fixed easily with a simple patch
> that changes the linkage of the
> `.omp_offloading.img_[start|end]` symbols from external to
> external weak. After this change your example compiles and
> works perfectly without any additional changes. I'm going to
> commit this patch in a few minutes.
>
> -------------
>
> Best regards,
>
> Alexey Bataev
>
> 30.07.2018 19:50, Dmitriev, Serguei N via cfe-dev wrote:
>
> Motivation
>
>
>
> The existing OpenMP offloading implementation in clang does not allow
> dropping offload targets at link time. That is, if an object file is
> created with one set of offload targets you must use exactly the same
> set of offload targets at the link stage. Otherwise, linking will fail:
>
>
>
> $ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu,nvptx64-nvidia-cuda foo.c -c
> $ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu foo.o
> /tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x20): undefined reference to `.omp_offloading.img_start.nvptx64-nvidia-cuda'
> /tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x28): undefined reference to `.omp_offloading.img_end.nvptx64-nvidia-cuda'
> clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
> $
>
>
>
> This limits OpenMP offload usability. So far, this has not been a high
> priority issue but the importance of this problem will grow once clang
> offload starts supporting static libraries with offload functionality.
> For instance, this limitation won't allow creating general purpose
> static libraries targeting multiple types of offload devices and later
> linking them into a program that uses only one offload target.
>
>
>
> Problem description
>
>
>
> Offload targets cannot be dropped at the link phase because object
> files produced by the compiler for the host have dependencies on the
> offload targets specified during compilation. These dependencies arise
> from the offload initialization code.
>
>
>
> The clang front-end adds offload initialization code to each host
> object in addition to all necessary processing of OpenMP constructs.
> This initialization code is intended to register target binaries for
> all offload targets in the runtime library at program startup. This
> code consists of two compiler-generated routines. One of these routines
> is added to the list of global constructors and the other to the global
> destructors. The constructor routine calls a libomptarget API which
> registers the target binaries and the destructor correspondingly calls
> a similar API for unregistering target binaries.
>
>
>
> Both these APIs accept a pointer to the target binary descriptor object
> which specifies the number of offload target binaries to register and
> the start/end addresses of target binary images. Since the start/end
> addresses of target binaries are not available at compile time, the
> target binary descriptors are initialized using link-time constants
> which reference (undefined) symbols containing the start/end addresses
> of all target images. These symbols are created by the
> dynamically-generated linker script which the clang driver creates for
> the host link action.
>
>
>
> References to the target specific symbols from host objects make them
> dependent on particular offload targets and prevent dropping offload
> targets at the link step. Therefore, the OpenMP offload initialization
> needs to be redesigned to make offload targets discardable.
>
>
>
> Proposed change
>
>
>
> Host objects should be independent of offload targets in order to allow
> dropping code for offload targets. That can be achieved by removing
> offload initialization code from host objects. The compiler should not
> inject this code into host objects.
>
>
>
> However, offload initialization should still be done, so it is proposed
> to move the initialization code into a special dynamically generated
> object file (referred to as 'wrapper object' here onwards), which,
> besides the initialization code, will also contain embedded images for
> offload targets.
>
>
>
> The wrapper object file will be generated by the clang driver with the
> help of a new tool: clang-offload-wrapper. This tool will take offload
> target binaries as input and produce bitcode files containing offload
> initialization code and embedded target images. The output bitcode is
> then passed to the backend and assembler tools from the host toolchain
> to produce the wrapper object which is then added as an input to the
> linker for host linking.
>
>
>
> The offload action builder in the clang driver needs to be changed to
> use this tool while building the actions graph for OpenMP offload
> compilations.
>
>
>
> Current status
>
>
>
> A patch with an initial implementation of the proposed changes has been
> uploaded to phabricator for review - https://reviews.llvm.org/D49510.
>
>
>
> Looking for feedback on this proposal.
>
>
>
> Thanks,
>
> Sergey
>
>
>
>
>
>
>
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory