[cfe-dev] [RFC][OpenMP] Usability improvement, allow dropping offload targets

Hal Finkel via cfe-dev cfe-dev at lists.llvm.org
Thu Aug 2 08:53:12 PDT 2018


On 08/02/2018 10:32 AM, Dmitriev, Serguei N wrote:
>
> Hi Hal,
>
>  
>
> The offload initialization code is pretty simple; in pseudo code it
> would look like this:
>
>  
>
> // The device image information.
> struct __tgt_device_image {
>   void *ImageStart;                  // Pointer to the target code start
>   void *ImageEnd;                    // Pointer to the target code end
>   ...
> };
>
> // Target binary descriptor.
> struct __tgt_bin_desc {
>   int32_t NumDeviceImages;           // Number of device images
>   __tgt_device_image *DeviceImages;  // Array of device images
>   ...
> };
>
> // External symbols for start/end addresses for all N target images.
> // These symbols are defined by the linker script which is dynamically
> // generated by the clang driver for the host link action.
> extern char ImageStart1[];
> extern char ImageEnd1[];
> ...
> extern char ImageStartN[];
> extern char ImageEndN[];
>
> static __tgt_device_image TargetImages[] = {
>   { ImageStart1, ImageEnd1, ...},
>   ...
>   { ImageStartN, ImageEndN, ...},
> };
>
> // Note: the descriptor has type __tgt_bin_desc, not __tgt_device_image.
> static __tgt_bin_desc BinaryDesc = {
>   sizeof(TargetImages)/sizeof(__tgt_device_image),  // NumDeviceImages
>   TargetImages,                                     // DeviceImages
>   ...
> };
>
> // Constructor and destructor which register/unregister device binaries
> __attribute__((constructor(0))) static void registerBinaryDesc() {
>   __tgt_register_lib(&BinaryDesc);
> }
>
> __attribute__((destructor(0))) static void unregisterBinaryDesc() {
>   __tgt_unregister_lib(&BinaryDesc);
> }
>
>  
>
> The front end adds such code to every host object containing OpenMP offload
> constructs. It is sufficient to execute it once at program startup, so
> it is created in a comdat for efficiency. For the current
> implementation it is OK to have it in a comdat because changing the list
> of offload targets between objects is not allowed. Therefore all
> host objects are expected to have the same init code, and thus it does
> not matter which instance is eventually linked in.
>
>  
>
> Technically, making offload init code non-comdat together with
> Alexey’s fix which adds weak linkage to external references from init
> code would resolve this problem. That would force offload init code
> from all objects to be executed on startup (not just a single
> instance) and thus all target images will eventually be registered,
> but it would be less efficient than executing only one instance of
> init code.
>

Thanks for the details. Would my suggestion of hashing/encoding the list
of offloading targets into the comdat name/key also work? This seems
like it would be a simple solution and would limit the number of
initialization calls to one per unique target combination (which should
be no more than a few).
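
To make the idea concrete, here is a rough sketch in C of how such a key could be built. The function name make_comdat_key and the key layout (a prefix followed by the sorted target triples) are made up for illustration; this is not what clang does today. Sorting the triples first means that two objects compiled with the same targets in a different order would still share one comdat group.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int compare_triples(const void *a, const void *b) {
  return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Writes a key of the form "<prefix>.<triple>.<triple>..." into `out`.
 * The triples are sorted in place first, so the key does not depend on
 * the order in which the targets were listed.
 * Returns 0 on success and -1 if `out` is too small.
 * The key layout is hypothetical. */
int make_comdat_key(const char **triples, size_t count,
                    char *out, size_t out_size) {
  static const char prefix[] = ".omp_offloading.descriptor_reg";
  assert(out != NULL && out_size > 0);

  if (count > 1)
    qsort(triples, count, sizeof(*triples), compare_triples);

  int n = snprintf(out, out_size, "%s", prefix);
  if (n < 0 || (size_t)n >= out_size)
    return -1;
  size_t used = (size_t)n;

  for (size_t i = 0; i < count; ++i) {
    n = snprintf(out + used, out_size - used, ".%s", triples[i]);
    if (n < 0 || (size_t)n >= out_size - used)
      return -1;
    used += (size_t)n;
  }
  return 0;
}
```

With this scheme the linker would keep one init routine per distinct key, so an object built for nvptx64-nvidia-cuda only and an object built for x86_64-pc-linux-gnu only would each contribute their own registration call.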

 -Hal

>  
>
> What I suggested below is completely removing offload init code from
> host objects and moving it to a separate object (I called it the wrapper
> object) which is dynamically created by the clang driver for the host link
> action with the help of the new clang-offload-wrapper tool. This object
> will have all target images as data and offload init code which
> registers the images. Such a change would make host objects completely
> independent of the offloading targets that were specified at compile
> time. And offload initialization will still be efficient since it will
> be done only once.
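
As I read it, the wrapper object would look roughly like the following sketch. Everything here is simplified and assumed for illustration: the struct names, the placeholder image bytes, and fake_register_lib, which merely stands in for __tgt_register_lib. In a real wrapper the registration routine would be a global constructor rather than a function called by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the structures in the pseudo code; the real
 * libomptarget structures have more fields. */
struct device_image {
  const void *ImageStart;
  const void *ImageEnd;
};

struct bin_desc {
  int32_t NumDeviceImages;
  const struct device_image *DeviceImages;
};

/* Placeholder bytes standing in for two embedded target binaries. */
static const unsigned char Image0[] = {0x7f, 'E', 'L', 'F'};
static const unsigned char Image1[] = {0x7f, 'E', 'L', 'F', 0x02};

/* The image table is built from data that lives in the wrapper itself,
 * so no host object has to reference per-target symbols. */
static const struct device_image Images[] = {
  {Image0, Image0 + sizeof(Image0)},
  {Image1, Image1 + sizeof(Image1)},
};

static const struct bin_desc Desc = {
  (int32_t)(sizeof(Images) / sizeof(Images[0])),
  Images,
};

/* Hypothetical stand-in for the runtime call; it only counts images. */
static int32_t RegisteredImages = 0;

static void fake_register_lib(const struct bin_desc *desc) {
  assert(desc != NULL);
  RegisteredImages += desc->NumDeviceImages;
}

/* The single registration the wrapper object would perform at startup. */
void register_wrapper_images(void) { fake_register_lib(&Desc); }

int32_t registered_image_count(void) { return RegisteredImages; }
```

The point of the sketch is that the image count is fixed when the wrapper is generated at link time, not when each host object is compiled.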
>
>  
>
> Thanks,
>
> Serguei
>
>  
>
> *From:*Hal Finkel [mailto:hfinkel at anl.gov]
> *Sent:* Wednesday, August 1, 2018 1:51 PM
> *To:* Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>; Alexey
> Bataev <a.bataev at outlook.com>
> *Cc:* cfe-dev at lists.llvm.org
> *Subject:* Re: [cfe-dev] [RFC][OpenMP] Usability improvement, allow
> dropping offload targets
>
>  
>
>  
>
> On 08/01/2018 03:32 PM, Dmitriev, Serguei N wrote:
>
>     Hi Alexey,
>
>      
>
>     The empty object file produced by the bundler is one of the problems
>     with that example, but I was talking about a different issue
>     which is related to the offload initialization code.
>
>      
>
>     The front end, as part of generating the offload initialization code,
>     creates the target binary descriptor object which is passed to the
>     libomptarget registration API. Besides the start/end addresses of
>     all target images, the binary descriptor object contains the
>     number of target images which need to be registered, and the compiler
>     initializes this field to the number of offload targets that were
>     specified on the command line. So, in the example below, both a.o and
>     b.o will have initialization code for registering only one target
>     image because only one offload target was specified on the command
>     line. The initialization code is generated as a comdat group, so the
>     linker will choose either a.o’s or b.o’s instance of this code at the
>     link stage, but in any case it will register only one target image
>     instead of two. So that example will not work as expected even if the
>     offload bundler problem is fixed. Delaying generation of the
>     offload initialization code until link time would resolve this issue.
>
>      
>
>     I suspect there would also be problems with the offload entry
>     table besides these two issues. The number of offload entries on
>     the host side and in the target images won’t match, and I guess
>     libomptarget cannot handle this correctly, so I assume the runtime
>     would require some changes as well.
>
>
> That certainly seems like a problem. We don't want varying lists of
> offloading targets between objects to create implicit ODR problems. Is
> the initialization code/tables large? Is it important that they're in
> comdat? Alternatively, we could hash/encode the list of targets in the
> comdat key, so we'll only get one copy of the init code per unique
> combination. That might be better than actually having each object
> contain an initializer?
>
>  -Hal
>
>
>      
>
>     Thanks,
>
>     Serguei
>
>      
>
>     *From:*Alexey Bataev [mailto:a.bataev at outlook.com]
>     *Sent:* Wednesday, August 1, 2018 11:45 AM
>     *To:* Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>
>     <mailto:serguei.n.dmitriev at intel.com>
>     *Cc:* cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>; Hal
>     Finkel <hfinkel at anl.gov> <mailto:hfinkel at anl.gov>
>     *Subject:* Re: [cfe-dev] [RFC][OpenMP] Usability improvement,
>     allow dropping offload targets
>
>      
>
>     Hi Serguei,
>
>     I don't see a lot of problems with this example. As far as I can see
>     there is only one problem: clang-offload-bundler generates an
>     empty object file that cannot be recognized by the linker. If we
>     teach clang-offload-bundler to generate correct empty object files
>     (which we need to do anyway, because currently it may produce wrong
>     output), your example will work without any changes.
>
>     -------------
>
>     Best regards,
>
>     Alexey Bataev
>
>     31.07.2018 16:50, Dmitriev, Serguei N wrote:
>
>         Hi Alexey,
>
>          
>
>         Such a change would fix the link issue, but I believe it would
>         be a short-term solution that will still be revised in the future.
>
>          
>
>         Let me explain my concerns. The current implementation has one
>         more limitation which I assume would also be addressed in the
>         future: target binaries are expected to have entries for all
>         OpenMP target regions in the program, which seems to be
>         too restrictive. I assume there would be use cases where you
>         would want to tune target regions in your program for
>         particular targets, and offloading to other targets would not
>         make much sense for those regions (or probably won’t even be
>         possible due to limitations of a particular target). It seems
>         reasonable to compile those regions only for the targets they
>         were tuned for, thus I assume the compiler will support the
>         following usage model in the future:
>
>          
>
>         clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -c a.c
>
>         clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu -c b.c
>
>         clang -fopenmp
>         -fopenmp-targets=nvptx64-nvidia-cuda,x86_64-pc-linux-gnu a.o b.o
>
>          
>
>         Such a usage model would in any case require redesigning the way
>         offload initialization code is generated. It has to be
>         delayed until link time because the final set of offload
>         targets that need to be registered at run time would be known
>         only at the link step, and thus the compiler cannot create a correct
>         target binary descriptor object (which is a part of the
>         offload initialization code) at compile time as it is done now.
>
>          
>
>         Does that sound reasonable?
>
>          
>
>         Thanks,
>
>         Serguei
>
>          
>
>         *From:*Alexey Bataev [mailto:a.bataev at outlook.com]
>         *Sent:* Tuesday, July 31, 2018 9:55 AM
>         *To:* Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>
>         <mailto:serguei.n.dmitriev at intel.com>; cfe-dev at lists.llvm.org
>         <mailto:cfe-dev at lists.llvm.org>
>         *Subject:* Re: [cfe-dev] [RFC][OpenMP] Usability improvement,
>         allow dropping offload targets
>
>          
>
>         Hi Serguei,
>
>         Actually your problem can be fixed easily with a simple patch
>         that changes the linkage of the
>
>         `.omp_offloading.img_[start|end]` symbols from external to
>         external weak. After this change your example compiles and
>         works perfectly without any additional changes. I'm going to
>         commit this patch in a few minutes.
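
For readers following along: on ELF targets an undefined weak symbol resolves to address 0 instead of causing a link error, which is why the link succeeds after this change. The sketch below illustrates one consequence under an assumption that is mine and not stated in the thread: that whoever consumes the image table then has to skip entries whose start address is null. The names weak_image and keep_linked_images are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified image record; a null ImageStart models a weak symbol that
 * was left undefined because its target was dropped at link time. */
struct weak_image {
  const void *ImageStart;
  const void *ImageEnd;
};

/* Copies to `out` only the images whose start address is non-null.
 * Returns the number of images kept. `out` must have room for `count`
 * entries. */
size_t keep_linked_images(const struct weak_image *all, size_t count,
                          struct weak_image *out) {
  assert(all != NULL && out != NULL);
  size_t kept = 0;
  for (size_t i = 0; i < count; ++i) {
    if (all[i].ImageStart != NULL)
      out[kept++] = all[i];
  }
  return kept;
}
```

Whether libomptarget tolerates null image addresses without such filtering is not something this thread settles.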
>
>         -------------
>
>         Best regards,
>
>         Alexey Bataev
>
>         30.07.2018 19:50, Dmitriev, Serguei N via cfe-dev wrote:
>
>             Motivation
>
>              
>
>             The existing OpenMP offloading implementation in clang
>             does not allow dropping
>
>             offload targets at link time. That is, if an object file
>             is created with one set
>
>             of offload targets you must use exactly the same set of
>             offload targets at the
>
>             link stage. Otherwise, linking will fail
>
>              
>
>             $ clang -fopenmp
>             -fopenmp-targets=x86_64-pc-linux-gnu,nvptx64-nvidia-cuda
>             foo.c -c
>
>             $ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu foo.o
>
>             /tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x20):
>             undefined reference to
>             `.omp_offloading.img_start.nvptx64-nvidia-cuda'
>
>             /tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x28):
>             undefined reference to
>             `.omp_offloading.img_end.nvptx64-nvidia-cuda'
>
>             clang-7: error: linker command failed with exit code 1
>             (use -v to see invocation)
>
>             $
>
>              
>
>             This limits OpenMP offload usability. So far, this has not
>             been a high priority
>
>             issue but the importance of this problem will grow once
>             clang offload starts
>
>             supporting static libraries with offload functionality.
>             For instance, this
>
>             limitation won't allow creating general purpose static
>             libraries targeting
>
>             multiple types of offload devices and later linking them
>             into a program that
>
>             uses only one offload target.
>
>              
>
>             Problem description
>
>              
>
>             Offload targets cannot be dropped at the link phase
>             because object files
>
>             produced by the compiler for the host have dependencies on
>             the offload targets
>
>             specified during compilation. These dependencies arise
>             from the offload
>
>             initialization code.
>
>              
>
>             The clang front-end adds offload initialization code to
>             each host object in
>
>             addition to all necessary processing of OpenMP constructs.
>             This initialization
>
>             code is intended to register target binaries for all
>             offload targets in the
>
>             runtime library at program startup. This code consists of
>             two compiler-generated
>
>             routines. One of these routines is added to the list of
>             global constructors and
>
>             the other to the global destructors. The constructor
>             routine calls a
>
>             libomptarget API which registers the target binaries and
>             the destructor
>
>             correspondingly calls a similar API for unregistering
>             target binaries.
>
>              
>
>             Both these APIs accept a pointer to the target binary
>             descriptor object which
>
>             specifies the number of offload target binaries to
>             register and the start/end
>
>             addresses of target binary images. Since the start/end
>             addresses of target
>
>             binaries are not available at compile time, the target
>             binary descriptors are
>
>             initialized using link-time constants which reference
>             (undefined) symbols
>
>             containing the start/end addresses of all target images.
>             These symbols are
>
>             created by the dynamically-generated linker script which
>             the clang driver
>
>             creates for the host link action.
>
>              
>
>             References to the target specific symbols from host
>             objects make them dependent
>
>             on particular offload targets and prevent dropping
>             offload targets at the link
>
>             step. Therefore, the OpenMP offload initialization needs
>             to be redesigned to
>
>             make offload targets discardable.
>
>              
>
>             Proposed change
>
>              
>
>             Host objects should be independent of offload targets in
>             order to allow dropping
>
>             code for offload targets. That can be achieved by removing
>             offload
>
>             initialization code from host objects. The compiler should
>             not inject this code
>
>             into host objects.
>
>              
>
>             However, offload initialization should still be done, so
>             it is proposed to move
>
>             the initialization code into a special dynamically
>             generated object file
>
>             (referred to as 'wrapper object' here onwards), which,
>             besides the
>
>             initialization code, will also contain embedded images for
>             offload targets.
>
>              
>
>             The wrapper object file will be generated by the clang
>             driver with the help of
>
>             a new tool: clang-offload-wrapper. This tool will take
>             offload target binaries
>
>             as input and produce bitcode files containing offload
>             initialization code and
>
>             embedded target images. The output bitcode is then passed
>             to the backend and
>
>             assembler tools from the host toolchain to produce the
>             wrapper object which is
>
>             then added as an input to the linker for host linking.
>
>              
>
>             The offload action builder in the clang driver needs to be
>             changed to use this
>
>             tool while building the actions graph for OpenMP offload
>             compilations.
>
>              
>
>             Current status
>
>              
>
>             A patch with an initial implementation of the proposed
>             changes has been uploaded to
>
>             phabricator for review - https://reviews.llvm.org/D49510.
>
>              
>
>             Looking for feedback on this proposal.
>
>              
>
>             Thanks,
>
>             Sergey
>
>              
>
>          
>
>      
>
>
>
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
