[cfe-dev] [RFC][OpenMP] Usability improvement, allow dropping offload targets

Hal Finkel via cfe-dev cfe-dev at lists.llvm.org
Wed Aug 1 13:51:16 PDT 2018


On 08/01/2018 03:32 PM, Dmitriev, Serguei N wrote:
>
> Hi Alexey,
>
>  
>
> The empty object file produced by the bundler is one of the problems
> with that example, but I was talking about a different issue, which is
> related to the offload initialization code.
>
>  
>
> The front end, as part of generating the offload initialization code,
> creates the target binary descriptor object, which is passed to the
> libomptarget registration API. Besides the start/end addresses of all
> target images, the binary descriptor contains the number of target
> images to register, and the compiler initializes this field to the
> number of offload targets specified on the command line. So, in the
> example below, both a.o and b.o will have initialization code that
> registers only one target image, because only one offload target was
> specified on each command line. The initialization code is generated
> as a comdat group, so the linker will choose either a.o’s or b.o’s
> instance of this code at link time, but either way it will register
> only one target image instead of two. So that example will not work
> as expected even if the offload bundler problem is fixed. Delaying
> generation of the offload initialization code until link time would
> resolve this issue.
>
>  
>
> Besides these two issues, I suspect there would also be problems with
> the offload entry table. The number of offload entries will not match
> between the host side and the target images, and I guess libomptarget
> cannot handle this correctly, so I assume the runtime would require
> some changes as well.
>
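
For reference, a minimal sketch of the failure mode described above. The
types and names here (device_image, bin_desc, pick_comdat_instance,
images_registered) are hypothetical simplifications for illustration and
are not the actual libomptarget definitions; the linker's comdat choice is
modeled by an ordinary function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the runtime descriptor types. */
typedef struct {
  const void *image_start;
  const void *image_end;
} device_image;

typedef struct {
  int32_t num_device_images;        /* fixed at compile time */
  const device_image *device_images;
} bin_desc;

/* Images present in the final link: one from a.o, one from b.o. */
static const char nvptx_image[] = "nvptx64 image bytes";
static const char x86_image[] = "x86_64 image bytes";

static const device_image linked_images[] = {
  { nvptx_image, nvptx_image + sizeof nvptx_image },
  { x86_image, x86_image + sizeof x86_image },
};

/* Descriptor as emitted into a.o (compiled for a single offload target). */
static const bin_desc desc_from_a = { 1, &linked_images[0] };
/* Descriptor as emitted into b.o (also a single offload target). */
static const bin_desc desc_from_b = { 1, &linked_images[1] };

/* Models the linker keeping exactly one instance of the comdat group. */
static const bin_desc *pick_comdat_instance(int keep_a) {
  return keep_a ? &desc_from_a : &desc_from_b;
}

/* Models the runtime walking the descriptor: it trusts the count field. */
static int images_registered(const bin_desc *desc) {
  int n = 0;
  for (int32_t i = 0; i < desc->num_device_images; ++i)
    if (desc->device_images[i].image_start != NULL)
      ++n;
  return n;
}
```

Whichever instance survives, images_registered() reports one image even
though two were linked, which is the mismatch in question.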

That certainly seems like a problem. We don't want varying lists of
offloading targets between objects to create implicit ODR problems. Is
the initialization code/tables large? Is it important that they're in
comdat? Alternatively, we could hash/encode the list of targets in the
comdat key, so we'll only get one copy of the init code per unique
combination. That might be better than actually having each object
contain an initializer?
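
As a rough sketch of the comdat-key idea, assuming the key is simply the
existing group name with the sorted target triples appended: the
`.omp_offloading.descriptor_reg` prefix is the name that shows up in the
link error quoted below, while the suffix scheme, make_comdat_key, and
MAX_TARGETS are assumptions made up for this illustration. A real change
would live in clang's code generation, not in a standalone helper.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TARGETS 16 /* arbitrary limit for this sketch */

/* qsort comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b) {
  return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Writes a key built from the target list into `out`. Targets are sorted
   first, so the key does not depend on command-line order.
   Returns 0 on success, -1 if there are too many targets or the buffer
   is too small. */
static int make_comdat_key(const char *const *targets, size_t n,
                           char *out, size_t out_size) {
  const char *sorted[MAX_TARGETS];
  if (n > MAX_TARGETS)
    return -1;
  for (size_t i = 0; i < n; ++i)
    sorted[i] = targets[i];
  qsort(sorted, n, sizeof sorted[0], cmp_str);

  int len = snprintf(out, out_size, "%s", ".omp_offloading.descriptor_reg");
  if (len < 0 || (size_t)len >= out_size)
    return -1;
  size_t used = (size_t)len;
  for (size_t i = 0; i < n; ++i) {
    len = snprintf(out + used, out_size - used, ".%s", sorted[i]);
    if (len < 0 || (size_t)len >= out_size - used)
      return -1;
    used += (size_t)len;
  }
  return 0;
}
```

With a scheme like this, objects built with the same set of targets share
one initializer, and objects built with different sets keep separate ones.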

 -Hal

>  
>
> Thanks,
>
> Serguei
>
>  
>
> *From:*Alexey Bataev [mailto:a.bataev at outlook.com]
> *Sent:* Wednesday, August 1, 2018 11:45 AM
> *To:* Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>
> *Cc:* cfe-dev at lists.llvm.org; Hal Finkel <hfinkel at anl.gov>
> *Subject:* Re: [cfe-dev] [RFC][OpenMP] Usability improvement, allow
> dropping offload targets
>
>  
>
> Hi Serguei,
>
> I don't see a lot of problems with this example. As far as I can see,
> there is only one problem: clang-offload-bundler generates an empty
> object file that cannot be recognized by the linker. If we teach
> clang-offload-bundler to generate correct empty object files (which we
> need to do anyway, because currently it may produce wrong output), your
> example will work without any changes.
>
> -------------
> Best regards,
> Alexey Bataev
>
> 31.07.2018 16:50, Dmitriev, Serguei N wrote:
>
>     Hi Alexey,
>
>      
>
>     Such a change would fix the link issue, but I believe it would be a
>     short-term solution that would still need to be revised in the future.
>
>      
>
>     Let me explain my concerns. The current implementation has one more
>     limitation which I assume would also be addressed in the future:
>     target binaries are expected to have entries for all OpenMP target
>     regions in the program, which seems too restrictive. I assume there
>     will be use cases where you want to tune target regions in your
>     program for particular targets, and offloading those regions to
>     other targets would not make much sense (or probably won’t even be
>     possible due to limitations of a particular target). It seems
>     reasonable to compile those regions only for the targets they were
>     tuned for, so I assume the compiler will support the following
>     usage model in the future:
>
>      
>
>     clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -c a.c
>
>     clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu -c b.c
>
>     clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda,x86_64-pc-linux-gnu a.o b.o
>
>      
>
>     Such a usage model would in any case require redesigning how the
>     offload initialization code is generated. Generation has to be
>     delayed until link time, because the final set of offload targets
>     that need to be registered with the runtime is known only at the
>     link step. The compiler therefore cannot create a correct target
>     binary descriptor object (which is part of the offload
>     initialization code) at compile time, as is done now.
>
>      
>
>     Does that sound reasonable?
>
>      
>
>     Thanks,
>
>     Serguei
>
>      
>
>     *From:*Alexey Bataev [mailto:a.bataev at outlook.com]
>     *Sent:* Tuesday, July 31, 2018 9:55 AM
>     *To:* Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>
>     <mailto:serguei.n.dmitriev at intel.com>; cfe-dev at lists.llvm.org
>     <mailto:cfe-dev at lists.llvm.org>
>     *Subject:* Re: [cfe-dev] [RFC][OpenMP] Usability improvement,
>     allow dropping offload targets
>
>      
>
>     Hi Serguei,
>
>     Actually your problem can be fixed easily with a simple patch that
>     changes the linkage of the `.omp_offloading.img_[start|end]` symbols
>     from external to external weak. After this change your example
>     compiles and works perfectly without any additional changes. I'm
>     going to commit this patch in a few minutes.
>
>     -------------
>
>     Best regards,
>
>     Alexey Bataev
>
>     30.07.2018 19:50, Dmitriev, Serguei N via cfe-dev wrote:
>
>         Motivation
>
>          
>
>         The existing OpenMP offloading implementation in clang does not
>         allow dropping offload targets at link time. That is, if an object
>         file is created with one set of offload targets, you must use
>         exactly the same set of offload targets at the link stage.
>         Otherwise, linking will fail:
>
>          
>
>         $ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu,nvptx64-nvidia-cuda foo.c -c
>         $ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu foo.o
>         /tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x20): undefined reference to `.omp_offloading.img_start.nvptx64-nvidia-cuda'
>         /tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x28): undefined reference to `.omp_offloading.img_end.nvptx64-nvidia-cuda'
>         clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
>
>         $
>
>          
>
>         This limits OpenMP offload usability. So far, this has not been a
>         high priority issue but the importance of this problem will grow
>         once clang offload starts supporting static libraries with offload
>         functionality. For instance, this limitation won't allow creating
>         general purpose static libraries targeting multiple types of
>         offload devices and later linking them into a program that uses
>         only one offload target.
>
>          
>
>         Problem description
>
>          
>
>         Offload targets cannot be dropped at the link phase because object
>         files produced by the compiler for the host have dependencies on
>         the offload targets specified during compilation. These
>         dependencies arise from the offload initialization code.
>
>          
>
>         The clang front-end adds offload initialization code to each host
>         object in addition to all necessary processing of OpenMP
>         constructs. This initialization code is intended to register
>         target binaries for all offload targets in the runtime library at
>         program startup. This code consists of two compiler-generated
>         routines. One of these routines is added to the list of global
>         constructors and the other to the global destructors. The
>         constructor routine calls a libomptarget API which registers the
>         target binaries and the destructor correspondingly calls a similar
>         API for unregistering target binaries.
>
>          
>
>         Both these APIs accept a pointer to the target binary descriptor
>         object which specifies the number of offload target binaries to
>         register and the start/end addresses of target binary images.
>         Since the start/end addresses of target binaries are not available
>         at compile time, the target binary descriptors are initialized
>         using link-time constants which reference (undefined) symbols
>         containing the start/end addresses of all target images. These
>         symbols are created by the dynamically-generated linker script
>         which the clang driver creates for the host link action.
>
>          
>
>         References to the target specific symbols from host objects make
>         them dependent on particular offload targets and prevent dropping
>         offload targets at the link step. Therefore, the OpenMP offload
>         initialization needs to be redesigned to make offload targets
>         discardable.
>
>          
>
>         Proposed change
>
>          
>
>         Host objects should be independent of offload targets in order to
>         allow dropping code for offload targets. That can be achieved by
>         removing offload initialization code from host objects. The
>         compiler should not inject this code into host objects.
>
>          
>
>         However, offload initialization should still be done, so it is
>         proposed to move the initialization code into a special
>         dynamically generated object file (referred to as 'wrapper object'
>         here onwards), which, besides the initialization code, will also
>         contain embedded images for offload targets.
>
>          
>
>         The wrapper object file will be generated by the clang driver with
>         the help of a new tool: clang-offload-wrapper. This tool will take
>         offload target binaries as input and produce bitcode files
>         containing offload initialization code and embedded target images.
>         The output bitcode is then passed to the backend and assembler
>         tools from the host toolchain to produce the wrapper object which
>         is then added as an input to the linker for host linking.
>
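
To make the wrapper-object idea concrete, here is a small sketch of what
such an object might contain if it were written by hand in C. Everything
in it is an assumption for illustration: the wrapped_image/wrapped_desc
types, the mock_register_lib/mock_unregister_lib functions standing in for
the runtime entry points, and the placeholder image bytes. It relies on
the GCC/Clang constructor and destructor attributes rather than on
anything the proposed tool is known to emit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor types for this sketch. */
typedef struct {
  const void *start;
  const void *end;
} wrapped_image;

typedef struct {
  int32_t num_images;
  const wrapped_image *images;
} wrapped_desc;

/* Mock stand-ins for the runtime's register/unregister entry points. */
static int registered_images = 0;
static void mock_register_lib(const wrapped_desc *d) {
  registered_images += d->num_images;
}
static void mock_unregister_lib(const wrapped_desc *d) {
  registered_images -= d->num_images;
}

/* Embedded target binary: placeholder bytes instead of a real image. */
static const unsigned char image_x86_64[] = { 0x7f, 'E', 'L', 'F' };

/* Only the images handed to the wrapper at link time appear here. */
static const wrapped_image images[] = {
  { image_x86_64, image_x86_64 + sizeof image_x86_64 },
};

/* The count is derived from the embedded images, not from compile flags. */
static const wrapped_desc desc = {
  (int32_t)(sizeof images / sizeof images[0]), images
};

/* Runs before main(): registers the embedded images. */
__attribute__((constructor))
static void offload_init(void) { mock_register_lib(&desc); }

/* Runs at exit: unregisters them again. */
__attribute__((destructor))
static void offload_fini(void) { mock_unregister_lib(&desc); }
```

Because the descriptor is built from whatever images are present at link
time, dropping a target amounts to leaving its image out of the array; the
host objects themselves carry no reference to it.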
>          
>
>         The offload action builder in the clang driver needs to be changed
>         to use this tool while building the actions graph for OpenMP
>         offload compilations.
>
>          
>
>         Current status
>
>          
>
>         A patch with an initial implementation of the proposed changes has
>         been uploaded to Phabricator for review:
>         https://reviews.llvm.org/D49510.
>
>          
>
>         Looking for feedback on this proposal.
>
>          
>
>         Thanks,
>
>         Sergey
>
>          
>
>      
>
>  
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
