<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 08/02/2018 10:32 AM, Dmitriev,
Serguei N wrote:<br>
</div>
<blockquote type="cite"
cite="mid:1112AE43C04F2E428633A4D42126DB32ACB894E0@ORSMSX111.amr.corp.intel.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;
color:black;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman",serif;
color:black;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";
color:black;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;
color:black;}
span.EmailStyle20
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle21
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle22
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle23
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle24
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle25
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Hi Hal,<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">The offload initialization code is pretty
simple; in pseudocode it would look like this:<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">// The device image information.<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">struct __tgt_device_image {<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"> void *ImageStart; // Pointer
to the target code start<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"> void *ImageEnd; // Pointer
to the target code end<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">...<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">};<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">// Target binary descriptor.<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">struct __tgt_bin_desc {<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"> int32_t NumDeviceImages; // Number
of device images<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"> __tgt_device_image *DeviceImages; // Array
of device images<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">...<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">};<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">// External symbols for start/end addresses for
all N target images.<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">// These symbols are defined by the linker
script which is dynamically<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">// generated by the clang driver for host link
action.<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">extern char ImageStart1[];<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">extern char ImageEnd1[];<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">...<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">extern char ImageStartN[];<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">extern char ImageEndN[];<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">static __tgt_device_image TargetImages[] = {<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"> { ImageStart1, ImageEnd1, ...},<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">...<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"> { ImageStartN, ImageEndN, ...},<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">};<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">static __tgt_bin_desc BinaryDesc = {<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">
sizeof(TargetImages)/sizeof(__tgt_device_image), //
NumDeviceImages<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"> TargetImages,
// DeviceImages<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">...<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">};<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">// Constructor and destructor that
register/unregister the device binaries<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">static void registerBinaryDesc()
__attribute__((constructor(0))) {<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"> __tgt_register_lib(&BinaryDesc);<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">static void unregisterBinaryDesc()
__attribute__((destructor(0))) {<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US"> __tgt_unregister_lib(&BinaryDesc);<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span
style="font-size:9.5pt;font-family:Consolas;background:white;mso-highlight:white"
lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">The front end adds such code to every host object
containing OpenMP offload constructs. It is sufficient to
execute it once at program startup, so it is created in a
comdat group for efficiency. For the current implementation it
is OK to have it in comdat because changing the list of
offload targets between objects is not allowed.
Therefore all host objects are expected to have the same
init code, and it does not matter which instance is
eventually linked in.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Technically, making offload init code
non-comdat together with Alexey’s fix which adds weak
linkage to external references from init code would resolve
this problem. That would force offload init code from all
objects to be executed on startup (not just a single
instance) and thus all target images will eventually be
registered, but it would be less efficient than executing
only one instance of init code.</span></p>
</div>
</blockquote>
<br>
Thanks for the details. Would my suggestion of hashing/encoding the
list of offloading targets into the comdat name/key also work? This
seems like it would be a simple solution and would limit the number
of initialization calls to one per unique target combination (which
should be no more than a few).<br>
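<br>
A minimal sketch of how such a key could be derived, assuming the target list is available as an array of triple strings. The function names, the FNV-1a hash, and the key format are illustrative assumptions; only the `.omp_offloading.descriptor_reg` prefix is taken from the linker output quoted further down in this thread.<br>
<pre>
```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int compare_triples(const void *a, const void *b) {
  return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* 64-bit FNV-1a over the sorted triples. A ',' is mixed in after each
   triple so that {"ab","c"} and {"a","bc"} hash differently.
   Note: sorts the caller's array in place. */
uint64_t hash_target_list(const char **triples, size_t count) {
  uint64_t hash = 14695981039346656037ULL; /* FNV offset basis */
  qsort(triples, count, sizeof(triples[0]), compare_triples);
  for (size_t i = 0; i < count; ++i) {
    for (const char *p = triples[i]; *p; ++p) {
      hash ^= (unsigned char)*p;
      hash *= 1099511628211ULL; /* FNV prime */
    }
    hash ^= (unsigned char)',';
    hash *= 1099511628211ULL;
  }
  return hash;
}

/* Writes a hypothetical comdat key of the form
   ".omp_offloading.descriptor_reg.<16 hex digits>" into out. */
void make_comdat_key(const char **triples, size_t count,
                     char *out, size_t out_size) {
  snprintf(out, out_size, ".omp_offloading.descriptor_reg.%016llx",
           (unsigned long long)hash_target_list(triples, count));
}
```
</pre>
Because the list is sorted before hashing, objects compiled with the same targets in a different order would share one comdat group, while objects compiled with different target sets would get distinct keys and therefore distinct copies of the init code.<br>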
<br>
-Hal<br>
<br>
<blockquote type="cite"
cite="mid:1112AE43C04F2E428633A4D42126DB32ACB894E0@ORSMSX111.amr.corp.intel.com">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">What I suggested below is completely removing the
offload init code from host objects and moving it to a
separate object (I called it the wrapper object) which is
dynamically created by the clang driver for the host link action
with the help of the new clang-offload-wrapper tool. This
object will have all target images as data, plus the offload init
code which registers the images. Such a change would make host
objects completely independent of the offloading targets that
were specified at compile time. Offload initialization
will still be efficient since it will be done only once.<o:p></o:p></span></p>
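<p class="MsoNormal"><span lang="EN-US">As a rough sketch of what such a wrapper object might contain, assuming two target images are known at link time: the struct names below are simplified stand-ins for the structures in the pseudocode above, the image bytes are placeholders, and fake_register_lib is a hypothetical substitute for the runtime registration call so that the sketch builds without libomptarget.</span></p>
<pre>
```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the descriptor structures in the pseudocode. */
struct tgt_device_image {
  void *ImageStart; /* pointer to the target code start */
  void *ImageEnd;   /* pointer to the target code end */
};
struct tgt_bin_desc {
  int32_t NumDeviceImages;               /* number of device images */
  struct tgt_device_image *DeviceImages; /* array of device images */
};

/* Placeholder bytes; a real wrapper would embed the actual target binaries. */
static char cuda_image[] = "nvptx64 image bytes (placeholder)";
static char x86_image[] = "x86_64 image bytes (placeholder)";

/* Built once, at link time, from the final list of offload targets. */
static struct tgt_device_image wrapper_images[] = {
  { cuda_image, cuda_image + sizeof(cuda_image) },
  { x86_image, x86_image + sizeof(x86_image) },
};
struct tgt_bin_desc wrapper_desc = {
  (int32_t)(sizeof(wrapper_images) / sizeof(wrapper_images[0])),
  wrapper_images,
};

/* Hypothetical stand-in for the runtime registration entry point. */
int32_t registered_images = 0;
void fake_register_lib(const struct tgt_bin_desc *desc) {
  registered_images += desc->NumDeviceImages;
}

/* In a real wrapper object this would be a global constructor. */
void register_wrapper(void) {
  fake_register_lib(&wrapper_desc);
}
```
</pre>
<p class="MsoNormal"><span lang="EN-US">The point of the sketch is that the image count comes from the array built at link time, so host objects no longer need to carry any target-specific descriptor.</span></p>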
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Thanks,<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"
lang="EN-US">Serguei<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><a name="_____replyseparator"
moz-do-not-send="true"></a><b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US"> Hal Finkel [<a class="moz-txt-link-freetext" href="mailto:hfinkel@anl.gov">mailto:hfinkel@anl.gov</a>] <br>
<b>Sent:</b> Wednesday, August 1, 2018 1:51 PM<br>
<b>To:</b> Dmitriev, Serguei N
<a class="moz-txt-link-rfc2396E" href="mailto:serguei.n.dmitriev@intel.com"><serguei.n.dmitriev@intel.com></a>; Alexey Bataev
<a class="moz-txt-link-rfc2396E" href="mailto:a.bataev@outlook.com"><a.bataev@outlook.com></a><br>
<b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a><br>
<b>Subject:</b> Re: [cfe-dev] [RFC][OpenMP] Usability
improvement, allow dropping offload targets<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span lang="EN-US">On 08/01/2018 03:32
PM, Dmitriev, Serguei N wrote:<o:p></o:p></span></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Hi Alexey,</span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">The empty object file produced by the bundler is
one of the problems with that example, but I was talking
about a different issue, which is related to the offload
initialization code.</span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">The front end, as part of generating the offload
initialization code, creates the target binary descriptor
object which is passed to the libomptarget registration
API. Besides the start/end addresses of all target images,
the binary descriptor object contains the number of
target images which need to be registered, and the compiler
initializes this field to the number of offload targets
that were specified on the command line. So, in the example
below, both a.o and b.o will have initialization code for
registering only one target image because only one offload
target was specified on the command line. The initialization
code is generated as a comdat group, so the linker will choose
either a.o’s or b.o’s instance of this code at link stage,
but in any case it will register only one target image
instead of two. So that example will not work as expected
even if the offload bundler problem is fixed. Delaying
generation of the offload initialization code until link
time would resolve this issue.</span><span lang="EN-US"><o:p></o:p></span></p>
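<p class="MsoNormal"><span lang="EN-US">The mismatch can be illustrated with a small model. This is only a sketch under simplifying assumptions: struct bin_desc is a reduced stand-in for the real descriptor, and "the linker keeps the first comdat copy" is an arbitrary choice made for the illustration.</span></p>
<pre>
```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for the target binary descriptor of one host object. */
struct bin_desc {
  int32_t num_images;         /* number of targets the object was compiled for */
  const char *const *targets; /* target triples, standing in for the images */
};

/* Comdat model: every object carries a copy of the init code, the linker
   keeps exactly one (here: the first), so only its images are registered. */
int32_t images_registered_with_comdat(const struct bin_desc *objects,
                                      size_t count) {
  return count ? objects[0].num_images : 0;
}

/* Non-comdat model: the init code of every object runs at startup, so each
   descriptor is registered in turn. */
int32_t images_registered_per_object(const struct bin_desc *objects,
                                     size_t count) {
  int32_t total = 0;
  for (size_t i = 0; i < count; ++i)
    total += objects[i].num_images;
  return total;
}
```
</pre>
<p class="MsoNormal"><span lang="EN-US">With a.o compiled for one target and b.o for another, the comdat model registers one image while two are needed; the per-object model registers both, at the cost of one registration call per object.</span></p>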
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">I suspect there would also be problems with
the offload entry table besides these two issues. The
number of offload entries on the host side and target
images won’t match, and I guess libomptarget cannot handle
this correctly, so I assume runtime would require some
changes as well.</span><span lang="EN-US"><o:p></o:p></span></p>
</blockquote>
<p class="MsoNormal"><span lang="EN-US"><br>
That certainly seems like a problem. We don't want varying
lists of offloading targets between objects to create
implicit ODR problems. Is the initialization code/tables
large? Is it important that they're in comdat?
Alternatively, we could hash/encode the list of targets in
the comdat key, so we'll only get one copy of the init code
per unique combination. That might be better than actually
having each object contain an initializer?<br>
<br>
-Hal<br>
<br>
<br>
<o:p></o:p></span></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Thanks,</span><span lang="EN-US"><o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"
lang="EN-US">Serguei</span><span lang="EN-US"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US"> Alexey Bataev [</span><a
href="mailto:a.bataev@outlook.com"
moz-do-not-send="true"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US">mailto:a.bataev@outlook.com</span></a><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US">]
<br>
<b>Sent:</b> Wednesday, August 1, 2018 11:45 AM<br>
<b>To:</b> Dmitriev, Serguei N </span><a
href="mailto:serguei.n.dmitriev@intel.com"
moz-do-not-send="true"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US"><serguei.n.dmitriev@intel.com></span></a><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US"><br>
<b>Cc:</b> </span><a
href="mailto:cfe-dev@lists.llvm.org"
moz-do-not-send="true"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US">cfe-dev@lists.llvm.org</span></a><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US">; Hal Finkel </span><a
href="mailto:hfinkel@anl.gov" moz-do-not-send="true"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US"><hfinkel@anl.gov></span></a><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US"><br>
<b>Subject:</b> Re: [cfe-dev] [RFC][OpenMP] Usability
improvement, allow dropping offload targets</span><span
lang="EN-US"><o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p><span lang="EN-US">Hi Serguei,<o:p></o:p></span></p>
<p><span lang="EN-US">I don't see a lot of problems with this
example. As far as I can see, there is only one problem:
clang-offload-bundler generates an empty object file that
cannot be recognized by the linker. If we teach
clang-offload-bundler to generate correct empty object
files (which we need to do anyway, because currently it may
produce wrong output), your example will work without any
changes.
<o:p></o:p></span></p>
<pre><span lang="EN-US">-------------<o:p></o:p></span></pre>
<pre><span lang="EN-US">Best regards,<o:p></o:p></span></pre>
<pre><span lang="EN-US">Alexey Bataev<o:p></o:p></span></pre>
<div>
<p class="MsoNormal"><span lang="EN-US">31.07.2018 16:50,
Dmitriev, Serguei N </span>
wrote<span lang="EN-US">:<o:p></o:p></span></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Hi Alexey,</span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Such a change would fix the link issue, but I
believe it would be a short-term solution that will
still need to be revised in the future.</span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Let me explain my concerns. Current
implementation has one more limitation which I assume
would also be addressed in future – target binaries are
expected to have entries for all OpenMP target regions
in the program, though it seems to be too restrictive. I
assume there would be use cases when you would want to
tune target regions in your program for particular
targets and offloading to other targets would not make
much sense for those regions (or probably won’t even be
possible due to limitations of a particular target). It
seems reasonable to compile those regions only for the
targets they were tuned for, so I assume the compiler will
support the following usage model in the future:</span><span
lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">clang -fopenmp
-fopenmp-targets=nvptx64-nvidia-cuda -c a.c</span><span
lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">clang -fopenmp
-fopenmp-targets=x86_64-pc-linux-gnu -c b.c</span><span
lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">clang -fopenmp
-fopenmp-targets=nvptx64-nvidia-cuda,x86_64-pc-linux-gnu
a.o b.o</span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Such a usage model would in any case require
redesigning how the offload initialization code is
generated. Generation has to be delayed until link time because
the final set of offload targets that need to be
registered at run time is known only at the link step,
so the compiler cannot create a correct target binary
descriptor object (which is part of the offload
initialization code) at compile time, as it does now.</span><span
lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US">Does that sound reasonable?</span><span
lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"
lang="EN-US">Thanks,</span><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"
lang="EN-US">Serguei</span><span lang="EN-US"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"
lang="EN-US"> </span><span lang="EN-US"><o:p></o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US"> Alexey Bataev [</span><a
href="mailto:a.bataev@outlook.com"
moz-do-not-send="true"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US">mailto:a.bataev@outlook.com</span></a><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US">]
<br>
<b>Sent:</b> Tuesday, July 31, 2018 9:55 AM<br>
<b>To:</b> Dmitriev, Serguei N </span><a
href="mailto:serguei.n.dmitriev@intel.com"
moz-do-not-send="true"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US"><serguei.n.dmitriev@intel.com></span></a><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US">;
</span><a href="mailto:cfe-dev@lists.llvm.org"
moz-do-not-send="true"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US">cfe-dev@lists.llvm.org</span></a><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:windowtext"
lang="EN-US"><br>
<b>Subject:</b> Re: [cfe-dev] [RFC][OpenMP]
Usability improvement, allow dropping offload
targets</span><span lang="EN-US"><o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
<p><span lang="EN-US">Hi Serguei,<o:p></o:p></span></p>
<p><span lang="EN-US">Actually your problem can be fixed
easily with a simple patch that changes the linkage of
the
<o:p></o:p></span></p>
<pre><span lang="EN-US">`.omp_offloading.img_[start|end]` symbols from external to external weak. After this change your example compiles and works perfectly without any additional changes. I'm going to commit this patch in a few minutes.<o:p></o:p></span></pre>
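<p><span lang="EN-US">For readers unfamiliar with weak references, here is a small sketch of the effect on an ELF target with GCC or clang. The symbol names are hypothetical C-friendly stand-ins for the `.omp_offloading.img_[start|end]` symbols, and the null check is only an illustration of how an undefined weak symbol behaves; the thread does not state that the runtime performs such a check.</span></p>
<pre>
```c
#include <stddef.h>

/* Hypothetical image symbols. Declared weak, so the link succeeds even if
   no definition is provided; an undefined weak symbol resolves to address 0. */
extern char omp_img_start_cuda[] __attribute__((weak));
extern char omp_img_end_cuda[] __attribute__((weak));
extern char omp_img_start_x86[] __attribute__((weak));
extern char omp_img_end_x86[] __attribute__((weak));

struct image_range {
  char *start;
  char *end;
};

/* Counts the images whose start symbol was actually defined at link time. */
int count_linked_images(void) {
  struct image_range ranges[] = {
    { omp_img_start_cuda, omp_img_end_cuda },
    { omp_img_start_x86, omp_img_end_x86 },
  };
  int present = 0;
  for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); ++i)
    if (ranges[i].start != NULL)
      ++present;
  return present;
}
```
</pre>
<p><span lang="EN-US">With ordinary external linkage the same declarations would produce the "undefined reference" errors shown in the original report whenever a target is dropped at link time.</span></p>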
<pre><span lang="EN-US">-------------<o:p></o:p></span></pre>
<pre><span lang="EN-US">Best regards,<o:p></o:p></span></pre>
<pre><span lang="EN-US">Alexey Bataev<o:p></o:p></span></pre>
<div>
<p class="MsoNormal"><span lang="EN-US">30.07.2018 19:50,
Dmitriev, Serguei N via cfe-dev
</span>wrote<span lang="EN-US">:<o:p></o:p></span></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre><span lang="EN-US">Motivation<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">The existing OpenMP offloading implementation in clang does not allow dropping<o:p></o:p></span></pre>
<pre><span lang="EN-US">offload targets at link time. That is, if an object file is created with one set<o:p></o:p></span></pre>
<pre><span lang="EN-US">of offload targets you must use exactly the same set of offload targets at the<o:p></o:p></span></pre>
<pre><span lang="EN-US">link stage. Otherwise, linking will fail<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">$ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu,nvptx64-nvidia-cuda foo.c -c<o:p></o:p></span></pre>
<pre><span lang="EN-US">$ clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu foo.o<o:p></o:p></span></pre>
<pre><span lang="EN-US">/tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x20): undefined reference to `.omp_offloading.img_start.nvptx64-nvidia-cuda'<o:p></o:p></span></pre>
<pre><span lang="EN-US">/tmp/foo-dd79f7.o:(.rodata..omp_offloading.device_images[.omp_offloading.descriptor_reg]+0x28): undefined reference to `.omp_offloading.img_end.nvptx64-nvidia-cuda'<o:p></o:p></span></pre>
<pre><span lang="EN-US">clang-7: error: linker command failed with exit code 1 (use -v to see invocation)<o:p></o:p></span></pre>
<pre><span lang="EN-US">$ <o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">This limits OpenMP offload usability. So far, this has not been a high-priority<o:p></o:p></span></pre>
<pre><span lang="EN-US">issue, but the importance of this problem will grow once clang offload starts<o:p></o:p></span></pre>
<pre><span lang="EN-US">supporting static libraries with offload functionality. For instance, this<o:p></o:p></span></pre>
<pre><span lang="EN-US">limitation won't allow creating general purpose static libraries targeting<o:p></o:p></span></pre>
<pre><span lang="EN-US">multiple types of offload devices and later linking them into a program that<o:p></o:p></span></pre>
<pre><span lang="EN-US">uses only one offload target.<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">Problem description<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">Offload targets cannot be dropped at the link phase because object files<o:p></o:p></span></pre>
<pre><span lang="EN-US">produced by the compiler for the host have dependencies on the offload targets<o:p></o:p></span></pre>
<pre><span lang="EN-US">specified during compilation. These dependencies arise from the offload<o:p></o:p></span></pre>
<pre><span lang="EN-US">initialization code.<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">The clang front-end adds offload initialization code to each host object in<o:p></o:p></span></pre>
<pre><span lang="EN-US">addition to all necessary processing of OpenMP constructs. This initialization<o:p></o:p></span></pre>
<pre><span lang="EN-US">code is intended to register target binaries for all offload targets in the<o:p></o:p></span></pre>
<pre><span lang="EN-US">runtime library at program startup. This code consists of two compiler-generated<o:p></o:p></span></pre>
<pre><span lang="EN-US">routines. One of these routines is added to the list of global constructors and<o:p></o:p></span></pre>
<pre><span lang="EN-US">the other to the global destructors. The constructor routine calls a<o:p></o:p></span></pre>
<pre><span lang="EN-US">libomptarget API which registers the target binaries and the destructor<o:p></o:p></span></pre>
<pre><span lang="EN-US">correspondingly calls a similar API for unregistering target binaries.<o:p></o:p></span></pre>
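<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">As a rough illustration only (this is not the code clang emits), the<o:p></o:p></span></pre>
<pre><span lang="EN-US">constructor/destructor pair described above could be sketched in C as follows.<o:p></o:p></span></pre>
<pre><span lang="EN-US">The names mock_register_lib, mock_unregister_lib and struct mock_desc are<o:p></o:p></span></pre>
<pre><span lang="EN-US">hypothetical stand-ins for the libomptarget registration API and its<o:p></o:p></span></pre>
<pre><span lang="EN-US">descriptor type; the constructor/destructor attributes are GCC/Clang extensions.<o:p></o:p></span></pre>
<pre>
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the target binary descriptor. */
struct mock_desc {
    int num_images;
};

/* Number of descriptors currently registered with the mock runtime. */
static int registered = 0;

/* Mock runtime entry points; in a real program the offload runtime
   library would provide the registration and unregistration calls. */
static void mock_register_lib(struct mock_desc *desc) {
    assert(desc != NULL);
    registered++;
}

static void mock_unregister_lib(struct mock_desc *desc) {
    assert(desc != NULL && registered > 0);
    registered--;
}

static struct mock_desc ctor_desc = { 2 };

/* Runs before main(): plays the role of the generated global constructor. */
__attribute__((constructor)) static void offload_ctor(void) {
    mock_register_lib(&ctor_desc);
}

/* Runs after main() returns: plays the role of the global destructor. */
__attribute__((destructor)) static void offload_dtor(void) {
    mock_unregister_lib(&ctor_desc);
}

/* Lets callers observe how many descriptors are registered right now. */
int offload_registered_count(void) {
    return registered;
}
```
</pre>
<pre><span lang="EN-US">While main() is running, offload_registered_count() would return 1, since the<o:p></o:p></span></pre>
<pre><span lang="EN-US">constructor has already executed and the destructor has not yet run.<o:p></o:p></span></pre>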
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">Both of these APIs accept a pointer to the target binary descriptor object, which<o:p></o:p></span></pre>
<pre><span lang="EN-US">specifies the number of offload target binaries to register and the start/end<o:p></o:p></span></pre>
<pre><span lang="EN-US">addresses of target binary images. Since the start/end addresses of target<o:p></o:p></span></pre>
<pre><span lang="EN-US">binaries are not available at compile time, the target binary descriptors are<o:p></o:p></span></pre>
<pre><span lang="EN-US">initialized using link-time constants which reference (undefined) symbols<o:p></o:p></span></pre>
<pre><span lang="EN-US">containing the start/end addresses of all target images. These symbols are<o:p></o:p></span></pre>
<pre><span lang="EN-US">created by the dynamically-generated linker script which the clang driver<o:p></o:p></span></pre>
<pre><span lang="EN-US">creates for the host link action.<o:p></o:p></span></pre>
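<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">A simplified sketch of such a descriptor, using hypothetical type names<o:p></o:p></span></pre>
<pre><span lang="EN-US">(struct image_range, struct bin_desc) rather than the real libomptarget layout.<o:p></o:p></span></pre>
<pre><span lang="EN-US">Here the images are ordinary arrays so that the example links on its own; in<o:p></o:p></span></pre>
<pre><span lang="EN-US">the scheme described above the start/end values would instead come from symbols<o:p></o:p></span></pre>
<pre><span lang="EN-US">defined by the linker script, which is what leaves them undefined in the host<o:p></o:p></span></pre>
<pre><span lang="EN-US">object.<o:p></o:p></span></pre>
<pre>
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified descriptor types. */
struct image_range {
    const char *start; /* address of the first byte of the image */
    const char *end;   /* address one past the last byte of the image */
};

struct bin_desc {
    int num_images;                   /* number of target images */
    const struct image_range *images; /* one start/end pair per image */
};

/* Stand-ins for two target images (assumption: sizes chosen arbitrarily). */
static const char image_a[16] = { 0 };
static const char image_b[32] = { 0 };

static const struct image_range ranges[] = {
    { image_a, image_a + sizeof image_a },
    { image_b, image_b + sizeof image_b },
};

static const struct bin_desc demo_desc = { 2, ranges };

/* Total number of image bytes described by a descriptor. */
size_t total_image_bytes(const struct bin_desc *d) {
    size_t total = 0;
    for (int i = 0; i < d->num_images; i++) {
        assert(d->images[i].end >= d->images[i].start);
        total += (size_t)(d->images[i].end - d->images[i].start);
    }
    return total;
}

/* Accessor for the example descriptor. */
const struct bin_desc *demo_descriptor(void) {
    return &demo_desc;
}
```
</pre>
<pre><span lang="EN-US">With the two stand-in images above, total_image_bytes(demo_descriptor())<o:p></o:p></span></pre>
<pre><span lang="EN-US">yields 48.<o:p></o:p></span></pre>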
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">References to the target specific symbols from host objects make them dependent<o:p></o:p></span></pre>
<pre><span lang="EN-US">on particular offload targets and prevent dropping offload targets at the link<o:p></o:p></span></pre>
<pre><span lang="EN-US">step. Therefore, the OpenMP offload initialization needs to be redesigned to<o:p></o:p></span></pre>
<pre><span lang="EN-US">make offload targets discardable.<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">Proposed change<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">Host objects should be independent of offload targets in order to allow dropping<o:p></o:p></span></pre>
<pre><span lang="EN-US">code for offload targets. That can be achieved by removing offload<o:p></o:p></span></pre>
<pre><span lang="EN-US">initialization code from host objects. The compiler should not inject this code<o:p></o:p></span></pre>
<pre><span lang="EN-US">into host objects.<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">However, offload initialization should still be done, so it is proposed to move<o:p></o:p></span></pre>
<pre><span lang="EN-US">the initialization code into a special dynamically generated object file<o:p></o:p></span></pre>
<pre><span lang="EN-US">(referred to below as the 'wrapper object'), which, besides the<o:p></o:p></span></pre>
<pre><span lang="EN-US">initialization code, will also contain embedded images for offload targets.<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">The wrapper object file will be generated by the clang driver with the help of<o:p></o:p></span></pre>
<pre><span lang="EN-US">a new tool: clang-offload-wrapper. This tool will take offload target binaries<o:p></o:p></span></pre>
<pre><span lang="EN-US">as input and produce bitcode files containing offload initialization code and<o:p></o:p></span></pre>
<pre><span lang="EN-US">embedded target images. The output bitcode is then passed to the backend and<o:p></o:p></span></pre>
<pre><span lang="EN-US">assembler tools from the host toolchain to produce the wrapper object which is<o:p></o:p></span></pre>
<pre><span lang="EN-US">then added as an input to the linker for host linking.<o:p></o:p></span></pre>
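<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">To illustrate why embedding removes the dependency: a wrapper object could<o:p></o:p></span></pre>
<pre><span lang="EN-US">carry an image as an ordinary array, so that both bounds are resolved inside<o:p></o:p></span></pre>
<pre><span lang="EN-US">that object and no host object has to reference a target-specific symbol. The<o:p></o:p></span></pre>
<pre><span lang="EN-US">sketch below is written in C only for readability; the array contents and the<o:p></o:p></span></pre>
<pre><span lang="EN-US">names wrapped_image and wrapped_image_bounds are placeholders, and the actual<o:p></o:p></span></pre>
<pre><span lang="EN-US">wrapper would be generated as bitcode by the tool.<o:p></o:p></span></pre>
<pre>
```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bytes standing in for an embedded offload target image. */
static const unsigned char wrapped_image[] = { 0x7f, 'E', 'L', 'F', 0x02, 0x01 };

/* Reports the image bounds. Both addresses are known within this object,
   so no linker-script symbol is needed to locate the image. */
void wrapped_image_bounds(const unsigned char **start,
                          const unsigned char **end) {
    assert(start != NULL && end != NULL);
    *start = wrapped_image;
    *end = wrapped_image + sizeof wrapped_image;
}
```
</pre>
<pre><span lang="EN-US">Initialization code placed in the same object can pass these bounds to the<o:p></o:p></span></pre>
<pre><span lang="EN-US">runtime, and leaving the wrapper object out of the link drops the target.<o:p></o:p></span></pre>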
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">The offload action builder in the clang driver needs to be changed to use this<o:p></o:p></span></pre>
<pre><span lang="EN-US">tool while building the actions graph for OpenMP offload compilations.<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">Current status<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">A patch with an initial implementation of the proposed changes has been uploaded to<o:p></o:p></span></pre>
<pre><span lang="EN-US">Phabricator for review: </span><a href="https://reviews.llvm.org/D49510" moz-do-not-send="true"><span lang="EN-US">https://reviews.llvm.org/D49510</span></a><span lang="EN-US">.<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">Looking for feedback on this proposal.<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
<pre><span lang="EN-US">Thanks,<o:p></o:p></span></pre>
<pre><span lang="EN-US">Sergey<o:p></o:p></span></pre>
<pre><span lang="EN-US"> <o:p></o:p></span></pre>
</blockquote>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
</blockquote>
<p class="MsoNormal"><span lang="EN-US"> <o:p></o:p></span></p>
</blockquote>
<p class="MsoNormal"><span lang="EN-US"><br>
<br>
<o:p></o:p></span></p>
<pre><span lang="EN-US">-- <o:p></o:p></span></pre>
<pre><span lang="EN-US">Hal Finkel<o:p></o:p></span></pre>
<pre><span lang="EN-US">Lead, Compiler Technology and Programming Languages<o:p></o:p></span></pre>
<pre><span lang="EN-US">Leadership Computing Facility<o:p></o:p></span></pre>
<pre><span lang="EN-US">Argonne National Laboratory<o:p></o:p></span></pre>
</div>
</blockquote>
<br>
</body>
</html>