<font size=2 face="sans-serif">Hi Serguei,</font><br><br><font size=2 face="sans-serif">Thanks a lot for the proposal.</font><br><br><font size=2 face="sans-serif">My proposal slightly reworks the

way the OpenMP-NVPTX toolchain creates device object files: the device

specific part of the object is "wrapped" in an NVLINK-friendly

C++ structure that is then compiled for the host. The result is a host

object file with a device part which NVLINK can detect (D.o). The D.o object

file is then partially linked against the host object file H.o and thus

we obtain HD.o. This is needed because compilation must produce

a single output object file (when doing "-c -o" for example).

HD.o can now be passed to NVLINK directly or put in a static library and

then passed to NVLINK. Either way, NVLINK will be able to detect the device

part (due to the special wrapping that we did previously) without the need

to "unbundle" the object file (prior to passing it to NVLINK).</font><br><br><font size=2 face="sans-serif">The clang-offload-bundler

is not involved here because we are using the standard object format

for the object file that the OpenMP-NVPTX toolchain outputs so there's

no need for a custom format in this case. The partial linking step is required

to put together the host and device object files and to ensure that only

one object file is produced even if we actually invoked two toolchains

(one for host and one for the device).</font><br><br><br><font size=2 face="sans-serif">Regarding your proposal, from your slides

I understand that you perform a partial linking step as the first action

for all object files and/or static libraries given as input. So this clang

invocation:</font><br><font size=2 face="sans-serif">clang++ -L. -labc test.cpp -o test</font><br><font size=2 face="sans-serif">would result in the same compilation

steps as the current Clang version performs because the initial stage of

partial linking would have no work to do (since there are no object files

present to be partially linked).</font><br><br><br><font size=2 face="sans-serif">Another question I have is regarding

the "ld -r" box in your slides.</font><br><font size=2 face="sans-serif">How does ld -r work with "bundled"

objects? Your diagram seems to imply that ld -r does the concatenation

of all device images out of the box. Is this accurate?</font><br><br><br><font size=2 face="sans-serif">Thanks a lot,</font><br><br><font size=2 face="sans-serif">--Doru</font><br><br><font size=2 face="sans-serif"><br></font><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">"Dmitriev, Serguei

N" <serguei.n.dmitriev@intel.com></font><br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">Jonas Hahnfeld <hahnjo@hahnjo.de></font><br><font size=1 color=#5f5f5f face="sans-serif">Cc:      

 </font><font size=1 face="sans-serif">"'cfe-dev@lists.llvm.org'"

<cfe-dev@lists.llvm.org>, Doru Bercea <gheorghe-teod.bercea@ibm.com></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">08/15/2018 04:27 PM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">RE: [cfe-dev]

[OpenMP] offload support for static libraries</font><br><hr noshade><br><br><br><tt><font size=2>Hi Jonas,<br><br>I guess this patch implements the proposal which Doru presented on the

"OpenMP / HPC in Clang / LLVM Multi-company" meeting. As I remember

he suggested eliminating the use of the clang-offload-bundler tool when the offload

target is NVPTX by replacing the bundling operation with partial linking of

host and device objects, and then relying on the NVPTX linker to perform

the unbundling operation at link phase. Based on Doru's explanations NVPTX

linker "knows" how to extract device parts from such objects,

so the explicit unbundling operation is not required. Doru, please correct

me if my understanding is not fully accurate. Doru's proposal definitely

achieves the same goal for NVPTX offloading target (i.e. enables offload

in static libraries), but it is NVPTX specific and cannot be extended to

other offloading targets (at least that is how it looked when Doru

described it).<br><br>I propose a slightly different solution which I think should work for any

generic OpenMP offload target (it was also discussed at the OpenMP multi-company

meeting). In the general case we have to use clang-offload-bundler because

we cannot assume that device object(s) can be bundled with the host object

by performing partial linking of host and device objects. So bundling and

unbundling operations will still be done by the clang-offload-bundler tool.

The main part of my suggestion is adding partial linking of fat objects

(created by offload bundler tool) and static libraries (which are composed

of fat objects) and only after that do the unbundling operation on the

partially linked object (followed by the appropriate link actions for all

offloading devices and then for the host). This would guarantee that device

parts of fat objects from static libraries will participate in the device

link actions, and thus would enable offloading for static libraries.<br><br>Thanks,<br>Serguei<br><br>-----Original Message-----<br>From: Jonas Hahnfeld [</font></tt><a href="mailto:hahnjo@hahnjo.de"><tt><font size=2>mailto:hahnjo@hahnjo.de</font></tt></a><tt><font size=2>]

<br>Sent: Tuesday, August 14, 2018 1:52 PM<br>To: Dmitriev, Serguei N <serguei.n.dmitriev@intel.com><br>Cc: 'cfe-dev@lists.llvm.org' <cfe-dev@lists.llvm.org>; Doru Bercea

<gheorghe-teod.bercea@ibm.com><br>Subject: Re: [cfe-dev] [OpenMP] offload support for static libraries<br><br>This has already been proposed for NVPTX in </font></tt><a href="https://reviews.llvm.org/D47394"><tt><font size=2>https://reviews.llvm.org/D47394</font></tt></a><tt><font size=2>,

adding Doru.<br><br>Cheers,<br>Jonas<br><br>On 2018-08-14 18:43, Dmitriev, Serguei N via cfe-dev wrote:<br>> PROBLEM OVERVIEW<br>> <br>> OpenMP offload functionality is currently not supported in static

<br>> libraries. Because of that an attempt to use offloading in static

<br>> libraries ends up with a fallback execution of target regions on the

<br>> host. This limitation clearly has significant impact on OpenMP offload

<br>> usability.<br>> <br>> An output object file that is created by the compiler for offload

<br>> compilation is a fat object. Besides the code for the host

<br>> architecture, such object files also contain code for the offloading targets

<br>> which is stored as data in ELF sections with predefined names. Thus,

a <br>> static library that is created from object files produced by offload

<br>> compilation would be an archive of fat objects.<br>> <br>> Clang driver currently never passes fat objects directly to any <br>> toolchain. Instead it performs an unbundling operation for each fat

<br>> object which extracts the host and device parts from the object. These

<br>> parts are then independently processed by the corresponding target

<br>> toolchains. However, the current implementation does not account for <br>> static archives that are composed of fat objects. No unbundling

<br>> is done for static archives (they are passed to linker as is) and

thus <br>> device parts of objects from such archives get ignored.<br>> <br>> SUGGESTED SOLUTION<br>> <br>> It seems feasible to resolve this problem by changing the offload

link <br>> process - adding an extra step to the link flow which will do a <br>> partial linking (ld -r) of fat objects and static libraries as shown

<br>> on this diagram<br>> <br>> [Fat objects] \                

                / [Target1 link]

\<br>> <br>>                [Partial linking]

- [Unbundling] - [TargetN link] - <br>> [Host link]<br>> <br>> [Static libs] /                

                \--- Host part

--/<br>> <br>> (You can also look at the .pdf file on this link <br>> </font></tt><a href="https://drive.google.com/file/d/1ZTNoB-Ghin1BTaiZ312FMSRS6rISDtlr/view"><tt><font size=2>https://drive.google.com/file/d/1ZTNoB-Ghin1BTaiZ312FMSRS6rISDtlr/view</font></tt></a><tt><font size=2><br>> ?usp=sharing [1] for illustrations for the suggested change)<br>> <br>> Linker will pull in all necessary dependencies from static libraries

<br>> while performing partial linking, so the result of partial linking

<br>> would be a fat object with concatenated device parts from input fat

<br>> objects and required dependencies from static libraries. These <br>> concatenated device objects will be stored in the corresponding ELF

<br>> sections of the partially linked object.<br>> <br>> Unbundling operation on the partially linked object will create one

or <br>> more device objects for each offloading target, and these objects

will <br>> be linked by corresponding target toolchains the same way as it is

<br>> done now. Offload bundler tool would require enhancements to support

<br>> unbundling of multiple concatenated device objects for each offloading

<br>> target.<br>> <br>> Host link action can be changed to use host part of the partially

<br>> linked object while linking the final image.<br>> <br>> Do you see any potential problems in the proposed change?<br>> <br>> Links:<br>> ------<br>> [1]<br>> </font></tt><a href="https://drive.google.com/file/d/1ZTNoB-Ghin1BTaiZ312FMSRS6rISDtlr/view"><tt><font size=2>https://drive.google.com/file/d/1ZTNoB-Ghin1BTaiZ312FMSRS6rISDtlr/view</font></tt></a><tt><font size=2><br>> ?usp=sharing<br><br></font></tt><br><br><BR>