<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.hoenzb
{mso-style-name:hoenzb;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
.MsoPapDefault
{mso-style-type:export-only;
margin-left:46.2pt;
text-indent:-17.85pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:799957587;
mso-list-type:hybrid;
mso-list-template-ids:-1214880770 44585902 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-text:-;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-font-family:"Times New Roman";}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><a name="_MailEndCompose"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">We may also want to implement strided memory access on X86; masking allows us to do this safely.<o:p></o:p></span></a></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">One day we’ll need to mask FP operations as part of FP exception mode.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Arithmetic operations with saturation are another candidate.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt;text-indent:-18.0pt;mso-list:l0 level1 lfo1">
<![if !supportLists]><span style="font-family:"Calibri",sans-serif;color:#2F5496"><span style="mso-list:Ignore">-<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span dir="LTR"></span><b><i><span style="color:#2F5496"> Elena<o:p></o:p></span></i></b></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><a name="_____replyseparator"></a><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Michael Kuperstein [mailto:mkuper@google.com]
<br>
<b>Sent:</b> Monday, September 26, 2016 10:32<br>
<b>To:</b> Demikhovsky, Elena <elena.demikhovsky@intel.com><br>
<b>Cc:</b> Hal Finkel <hfinkel@anl.gov>; Zaks, Ayal <ayal.zaks@intel.com>; Adam Nemet (anemet@apple.com) <anemet@apple.com>; Sanjay Patel (spatel@rotateright.com) <spatel@rotateright.com>; Nadav Rotem <nadav.rotem@me.com>; llvm-dev <llvm-dev@lists.llvm.org><br>
<b>Subject:</b> Re: RFC: New intrinsics masked.expandload and masked.compressstore<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt">In theory, we could offload several things to such a target plug-in; I'm just not entirely sure we want to.<o:p></o:p></p>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt">Two examples I can think of:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt">1) This could be a better interface for masked loads/stores and gathers.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt">2) Horizontal reductions. I tried writing a yet-another-horizontals-as-first-class-citizens proposal a couple of months ago, and the main problem raised in the previous discussions was
that there's no good common representation. For example, should a horizontal add return a vector or a scalar? Should it return the base type of the vector (which assumes saturation) or a wider integer type? With a plugin, we could have the vectorizer emit the right
target intrinsic, instead of the crazy backend pattern-matching we have now.<o:p></o:p></p>
</div>
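<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt">As a rough illustration of the representation question, the two result conventions for a horizontal add over 8-bit signed lanes could be sketched in scalar C as follows. This is only a sketch: the helper names are hypothetical (they are not LLVM APIs), and clamping after every step is an assumption about what "saturation" would mean here; clamping once at the end would be another possible reading.<o:p></o:p></p>
<pre>
```c
/* Hypothetical helpers, not LLVM APIs: two possible result conventions
   for a horizontal add over a vector of 8-bit signed lanes. */

/* Convention A: the result has the element type, so the sum must saturate.
   Assumption: the running sum is clamped after every step. */
signed char hadd_saturating_i8(const signed char *v, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += v[i];
        if (acc > 127) acc = 127;    /* clamp to the i8 maximum */
        if (acc < -128) acc = -128;  /* clamp to the i8 minimum */
    }
    return (signed char)acc;
}

/* Convention B: the result is a wider integer type, so nothing is lost. */
int hadd_widening_i8(const signed char *v, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}
```
</pre>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt">For the lanes {100, 100, 100, 100}, the first convention yields 127 and the second yields 400, so the two are not interchangeable.<o:p></o:p></p>
</div>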
</div>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt"><o:p> </o:p></p>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt">On Sun, Sep 25, 2016 at 9:28 PM, Demikhovsky, Elena <<a href="mailto:elena.demikhovsky@intel.com" target="_blank">elena.demikhovsky@intel.com</a>> wrote:<o:p></o:p></p>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt"><br>
|<br>
|Hi Elena,<br>
|<br>
|Technically speaking, this seems straightforward.<br>
|<br>
|I wonder, however, how target-independent this is in a practical<br>
|sense; will there be an efficient lowering when targeting any other<br>
|ISA? I don't want to get into the territory where, because the<br>
|vectorizer is supposed to be architecture independent, we need to<br>
|add target-independent intrinsics for all potentially-side-effect-<br>
|carrying idioms (or just complicated idioms) we want the vectorizer to<br>
|support on any target. Is there a way we can design the vectorizer so<br>
|that the targets can plug in their own idiom recognition for these<br>
|kinds of things, and then, via that interface, let the vectorizer produce<br>
|the relevant target-dependent intrinsics?<br>
<br>
Introducing a target-specific plug-in into the vectorizer may be a good idea. We need target-specific pattern recognition and a target-specific implementation of “vectorizeMemoryInstruction”. (More functionality may be needed in the future.)<br>
TTI->checkAdditionalVectorizationOpportunities() - detects target-specific patterns; X86 will find compress/expand and maybe others<br>
TTI->vectorizeMemoryInstruction() - handles only exotic target-specific cases<br>
<br>
Pros:<br>
It will allow us to implement all X86-specific solutions.<br>
The expandload and compressstore intrinsics could then be x86-specific and polymorphic:<br>
llvm.x86.masked.expandload()<br>
llvm.x86.masked.compressstore()<br>
<br>
Cons:<br>
<br>
TTI will need to deal with Loop Info, SCEVs and other loop analysis info that it does not have today. (I do not like this approach.)<br>
Or we'll need to introduce a TLV (Target Loop Vectorizer), a new class that handles all target-specific cases. This solution seems more reasonable, but it is too heavy just for compress/expand.<br>
Do you see any other target plug-in solution?<br>
<span style="color:#888888"><br>
<span class="hoenzb">-Elena</span></span><o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt"><br>
|<br>
|Thanks again,<br>
|Hal<br>
|<br>
|----- Original Message -----<br>
|> From: "Elena Demikhovsky" <<a href="mailto:elena.demikhovsky@intel.com">elena.demikhovsky@intel.com</a>><br>
|> To: "llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>
|> Cc: "Ayal Zaks" <<a href="mailto:ayal.zaks@intel.com">ayal.zaks@intel.com</a>>, "Michael Kuperstein"<br>
|<<a href="mailto:mkuper@google.com">mkuper@google.com</a>>, "Adam Nemet (<a href="mailto:anemet@apple.com">anemet@apple.com</a>)"<br>
|> <<a href="mailto:anemet@apple.com">anemet@apple.com</a>>, "Hal Finkel (<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>)"<br>
|<<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>>, "Sanjay Patel (<a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a>)"<br>
|> <<a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a>>, "Nadav Rotem"<br>
|<<a href="mailto:nadav.rotem@me.com">nadav.rotem@me.com</a>><br>
|> Sent: Monday, September 19, 2016 1:37:02 AM<br>
|> Subject: RFC: New intrinsics masked.expandload and<br>
|> masked.compressstore<br>
|><br>
|><br>
|> Hi all,<br>
|><br>
|> The AVX-512 ISA introduces new vector instructions, VCOMPRESS and<br>
|> VEXPAND, to allow vectorization of the following loops with two<br>
|> specific types of cross-iteration dependencies:<br>
|><br>
|> Compress:<br>
|> for (int i = 0; i < N; ++i)<br>
|>   if (t[i])<br>
|>     *A++ = expr;<br>
|><br>
|> Expand:<br>
|> for (int i = 0; i < N; ++i)<br>
|>   if (t[i])<br>
|>     X[i] = *A++;<br>
|>   else<br>
|>     X[i] = PassThruV[i];<br>
|><br>
|> The “compress” and “expand” patterns are depicted on this poster:<br>
|> <a href="http://llvm.org/devmtg/2013-11/slides/Demikhovsky-Poster.pdf" target="_blank">http://llvm.org/devmtg/2013-11/slides/Demikhovsky-Poster.pdf</a><br>
|><br>
|> The RFC proposes to support this functionality by introducing two<br>
|> intrinsics to LLVM IR:<br>
|> llvm.masked.expandload.*<br>
|> llvm.masked.compressstore.*<br>
|><br>
|> The syntax of these two intrinsics is similar to the syntax of<br>
|> llvm.masked.load.* and masked.store.*, respectively, but the<br>
|> semantics are different, matching the above patterns.<br>
|><br>
|> %res = call <16 x float> @llvm.masked.expandload.v16f32.p0f32 (float* %ptr, <16 x i1> %mask, <16 x float> %passthru)<br>
|> void @llvm.masked.compressstore.v16f32.p0f32 (<16 x float> <value>, float* <ptr>, <16 x i1> <mask>)<br>
|><br>
|> The arguments %mask, %value and %passthru all have the same vector<br>
|> length.<br>
|> The underlying type of %ptr corresponds to the scalar type of the<br>
|> vector value.<br>
|> (This is a brief summary; the full syntax description will be provided in<br>
|> the subsequent documentation.)<br>
|><br>
|> The intrinsics are planned to be target independent, similar to<br>
|> masked.load/store/gather/scatter. They will be lowered efficiently on<br>
|> AVX-512 and scalarized on other targets, also akin to the masked.*<br>
|> intrinsics.<br>
|> The loop vectorizer will query TTI about the existence of efficient support<br>
|> for these intrinsics and, if it is provided, will be able to handle loops<br>
|> with such cross-iteration dependences.<br>
|><br>
|> The first step will include the full documentation and the implementation<br>
|> of the CodeGen part.<br>
|><br>
|> Additional information about expand load (<br>
|> <a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=expandload&techs=AVX_512" target="_blank">https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=expandload&techs=AVX_512</a><br>
|> ) and compress store (<br>
|> <a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=compressstore&techs=AVX_512" target="_blank">https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=compressstore&techs=AVX_512</a><br>
|> ) can also be found in the Intel Intrinsics Guide.<br>
|><br>
|><br>
|> * Elena<br>
|><br>
|> ---------------------------------------------------------------------<br>
|> Intel Israel (74) Limited<br>
|><br>
|> This e-mail and any attachments may contain confidential material<br>
|for<br>
|> the sole use of the intended recipient(s). Any review or distribution<br>
|> by others is strictly prohibited. If you are not the intended<br>
|> recipient, please contact the sender and delete all copies.<br>
|<br>
|--<br>
|Hal Finkel<br>
|Lead, Compiler Technology and Programming Languages Leadership<br>
|Computing Facility Argonne National Laboratory<br>
<o:p></o:p></p>
</div>
</div>
</blockquote>
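<div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt">For reference, the Compress and Expand loops quoted in the RFC above can be read as the following scalar semantics, which is roughly what a scalarized lowering on other targets would have to reproduce. This is a minimal sketch under stated assumptions: the function names are hypothetical (they are not LLVM APIs), the mask is modelled as one byte per lane, and the returned element count is an addition for illustration only; the proposed intrinsics return a vector and void, respectively.<o:p></o:p></p>
<pre>
```c
/* Scalar reference semantics; the names are illustrative, not LLVM APIs. */

/* Expand: read consecutive elements from ptr into the enabled lanes of res;
   disabled lanes take the corresponding passthru value.
   Returns the number of elements consumed from ptr (illustration only). */
int expandload_ref(float *res, const float *ptr, const unsigned char *mask,
                   const float *passthru, int lanes) {
    int next = 0;
    for (int i = 0; i < lanes; ++i)
        res[i] = mask[i] ? ptr[next++] : passthru[i];
    return next;
}

/* Compress: write the enabled lanes of value to consecutive locations at ptr.
   Returns the number of elements stored (illustration only). */
int compressstore_ref(const float *value, float *ptr,
                      const unsigned char *mask, int lanes) {
    int next = 0;
    for (int i = 0; i < lanes; ++i)
        if (mask[i])
            ptr[next++] = value[i];
    return next;
}
```
</pre>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt">In both directions the memory side is accessed contiguously, and only the vector side is sparse; the pointer advances by the number of set mask bits, which is the cross-iteration dependence the RFC refers to.<o:p></o:p></p>
</div>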
</div>
<p class="MsoNormal" style="margin-left:46.2pt;text-indent:-17.85pt"><o:p> </o:p></p>
</div>
</div>
</div>
</body>
</html>