<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Verdana",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
.MsoPapDefault
{mso-style-type:export-only;
margin-bottom:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:979074691;
mso-list-type:hybrid;
mso-list-template-ids:2004933710 134807553 134807555 134807557 134807553 134807555 134807557 134807553 134807555 134807557;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Jason,
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">It’s great that Google are interested in contributing to the development of LLVM in this area, and that you have code to support offload.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">However, I’m not sure that all of it is needed, since LLVM already has the offload library which has been being developed in the context of OpenMP, but actually
provides a general facility. It has been a part of LLVM since April 2014, and is already being used to offload to both Intel Xeon Phi and (at least NVidia) GPUs. (The IBM folks can tell you more about that!)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">The main difference I see (at a very first glance!) is that your StreamExecutor interfaces seem to be aimed more at end user code, whereas the interface to the
existing offload library has not been designed for the user, but to be an interface from the compiler. That has advantages and disadvantages
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Advantages:
<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-size:10.0pt;font-family:Symbol;color:#1F497D;mso-fareast-language:EN-US"><span style="mso-list:Ignore">·<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">It is a C level interface, so is callable from C,C++ and Fortran<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Disadvantages:<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-size:10.0pt;font-family:Symbol;color:#1F497D;mso-fareast-language:EN-US"><span style="mso-list:Ignore">·<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Using it directly from C++ user code may be harder than using StreamExecutor.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">However, there is nothing in the interface that prevents it from being used with CUDA or OpenCL, and it already seems to support the low level features you cited
as StreamExecutor’s advantages, though not the “looks just like CUDA” aspects, since it’s explicitly vendor neutral.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> StreamExecutor:</span><span style="font-size:10.0pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">>
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> * abstracts the underlying accelerator platform (avoids locking you into a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> single vendor, and lets you write code without thinking about which<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> platform you'll be running on).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Liboffload does this (and has a specific design for how to abstract new devices and support them using device specific libraries).</span><span style="font-size:10.0pt;font-family:"Courier New";color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> * provides an open-source alternative to the CUDA runtime library.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">I am not a CUDA expert, so I can’t comment on this! As before, IBM should comment.</span><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> * gives users a stream management model whose terminology matches that of the CUDA programming model.</span><span style="font-size:10.0pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">This is not abstract, but seems CUDA target specific, which is, if anything, worrying for a supposedly vendor-neutral interface!</span><span style="font-size:10.0pt;font-family:"Courier New";color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> * makes use of modern C++ to create a safe, efficient, easy-to-use programming interface.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">No, because liboffload is an implementation layer, not intended to be user-visible.</span><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> StreamExecutor makes it easy to:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">>
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> * move data between host and accelerator (and also between peer accelerators).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Liboffload supports this.</span><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> * execute data-parallel kernels written in the OpenCL or CUDA kernel</span><span style="font-size:10.0pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB">
</span><span style="font-size:10.0pt;font-family:"Courier New";color:black">languages.</span><span style="font-size:10.0pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">I believe this should be easy; IBM can comment better, since they have been working on GPU support.</span><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> * inspect the capabilities of a GPU-like device at runtime.</span><span style="font-size:10.0pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:black">> * manage multiple devices.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Liboffload supports this.</span><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">We’d therefore be very interested in seeing an approach that implemented a C++ specific user-friendly interface on top of the existing liboffload functionality,
but we don’t see a reason to rework the OpenMP implementation to use StreamExecutor (since what LLVM already has is working fine, and supporting offload to both GPUs and Xeon Phi).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">-- Jim<br>
<br>
James Cownie <<a href="mailto:james.h.cownie@intel.com">james.h.cownie@intel.com</a>><br>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)</span><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Tel: +44 117 9071438</span><span style="color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p>---------------------------------------------------------------------<br>
Intel Corporation (UK) Limited<br>
Registered No. 1134945 (England)<br>
Registered Office: Pipers Way, Swindon SN3 1RJ<br>
VAT No: 860 2173 47</p>
<p>This e-mail and any attachments may contain confidential material for<br>
the sole use of the intended recipient(s). Any review or distribution<br>
by others is strictly prohibited. If you are not the intended<br>
recipient, please contact the sender and delete all copies.</p></body>
</html>