<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:"Courier New";}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri",sans-serif;}
span.EmailStyle23
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle24
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle25
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle26
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle27
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle28
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle29
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle30
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:630597658;
mso-list-type:hybrid;
mso-list-template-ids:-2036853824 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.25in;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.75in;
text-indent:-.25in;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:1.25in;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:1.75in;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:2.25in;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:2.75in;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:3.25in;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:3.75in;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:4.25in;
text-indent:-9.0pt;}
@list l1
{mso-list-id:796216097;
mso-list-type:hybrid;
mso-list-template-ids:1016515872 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l1:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.25in;
text-indent:-.25in;}
@list l1:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.75in;
text-indent:-.25in;}
@list l1:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:1.25in;
text-indent:-9.0pt;}
@list l1:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:1.75in;
text-indent:-.25in;}
@list l1:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:2.25in;
text-indent:-.25in;}
@list l1:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:2.75in;
text-indent:-9.0pt;}
@list l1:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:3.25in;
text-indent:-.25in;}
@list l1:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:3.75in;
text-indent:-.25in;}
@list l1:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:4.25in;
text-indent:-9.0pt;}
@list l2
{mso-list-id:908001806;
mso-list-type:hybrid;
mso-list-template-ids:-1471117948 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l2:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.25in;
text-indent:-.25in;}
@list l2:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.75in;
text-indent:-.25in;}
@list l2:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:1.25in;
text-indent:-9.0pt;}
@list l2:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:1.75in;
text-indent:-.25in;}
@list l2:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:2.25in;
text-indent:-.25in;}
@list l2:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:2.75in;
text-indent:-9.0pt;}
@list l2:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:3.25in;
text-indent:-.25in;}
@list l2:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:3.75in;
text-indent:-.25in;}
@list l2:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:4.25in;
text-indent:-9.0pt;}
@list l3
{mso-list-id:1045060171;
mso-list-type:hybrid;
mso-list-template-ids:-2036853824 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l3:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.25in;
text-indent:-.25in;}
@list l3:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.75in;
text-indent:-.25in;}
@list l3:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:1.25in;
text-indent:-9.0pt;}
@list l3:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:1.75in;
text-indent:-.25in;}
@list l3:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:2.25in;
text-indent:-.25in;}
@list l3:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:2.75in;
text-indent:-9.0pt;}
@list l3:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:3.25in;
text-indent:-.25in;}
@list l3:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:3.75in;
text-indent:-.25in;}
@list l3:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:4.25in;
text-indent:-9.0pt;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">Thx Ayal.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Zaks, Ayal [mailto:ayal.zaks@intel.com] <br>
<b>Sent:</b> Thursday, September 22, 2016 4:25 PM<br>
<b>To:</b> Das, Dibyendu <Dibyendu.Das@amd.com><br>
<b>Cc:</b> llvm-dev@lists.llvm.org<br>
<b>Subject:</b> RE: Extending LV to vectorize outerloops<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="color:#1F497D">Sure Dibyendu. The modelling of the control-flow inside the vectorized loop is designed to support both cases where this control-flow is mandatory and cases where it’s the result of optimization. The former include
<a name="_MailEndCompose">predicated scalarized instructions, and inner loops when vectorizing an outer loop. The latter include various static and dynamic techniques you referred to, and possibly others such as our “Predicate Vectors If You Must”. Indeed as
you point out these are subject to cost/benefit considerations, similar to many other optimizations; these should fit within the proposed “Plan Optimize Step 2.b” and “Cost Step 3”.</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><a name="_____replyseparator"></a><b>From:</b> Das, Dibyendu [<a href="mailto:Dibyendu.Das@amd.com">mailto:Dibyendu.Das@amd.com</a>]
<br>
<b>Sent:</b> Thursday, September 22, 2016 09:31<br>
<b>To:</b> Zaks, Ayal <<a href="mailto:ayal.zaks@intel.com">ayal.zaks@intel.com</a>><br>
<b>Cc:</b> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>
<b>Subject:</b> RE: Extending LV to vectorize outerloops<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="color:#1F497D">Hi Ayal-<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="color:#1F497D">Can you elaborate a bit more on the modelling of the control flow for innermost loop vectorization ? Especially I am interested in knowing whether you are tending to a full mask-based
approach as suggested by Karrenberg et al or you will evaluate ( as a cost/benefit ) models like BOSCC ( Shin et al ) or kind-of-BOSCC models by El Hajj (</span><span style="font-size:10.0pt">DYNAMIC LOOP VECTORIZATION FOR EXECUTING OPENCL KERNELS ON CPUS<span style="color:#1F497D">
</span></span><span style="color:#1F497D">). Because different vectorization factors, ISA overheads and control flow probabilities may dictate which one may turn out to be the best for a particular situation.<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="color:#1F497D">-Thx<o:p></o:p></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="color:#1F497D">Dibyendu</span><span style="font-size:10.0pt"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> llvm-dev [<a href="mailto:llvm-dev-bounces@lists.llvm.org">mailto:llvm-dev-bounces@lists.llvm.org</a>]
<b>On Behalf Of </b>Zaks, Ayal via llvm-dev<br>
<b>Sent:</b> Wednesday, September 21, 2016 11:21 PM<br>
<b>To:</b> LLVM Dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>
<b>Subject:</b> [llvm-dev] RFC: Extending LV to vectorize outerloops<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">Proposal for extending the Loop Vectorizer to handle Outer Loops<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">================================================================<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">Goal:<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">-----<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">We propose to extend the innermost Loop Vectorizer to also handle outerloops (cf.[1]). Our aim is to best leverage the efforts already invested in the existing innermost Loop
Vectorizer rather than introduce a separate pass dedicated to outerloop vectorization. This proposal will support explicit vector programming of loops and functions [2]. It also facilitates evaluating multiple vectorization candidates, including both outer
and innermost loops, to choose the most profitable ones.<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New"">Background:<o:p></o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New"">-----------<o:p></o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New"">The current Loop Vectorizer is confined to handle only innermost loops. In order to extend it to also handle outerloops the following design decisions need to be reworked:<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l1 level1 lfo2">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">The resulting vectorized loop is inherently designed to contain a *<b>single</b>* basic block. This poses an issue today, as innermost loops may benefit from retaining
some internal branches when vectorized. For outerloops this clearly cannot hold – the resulting vectorized loop will contain more than a single basic block as it will contain innerloops.<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l1 level1 lfo2">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">There is inherently a single vectorization candidate with a single dimension of optimization – namely the Vectorization Factor and/or Unrolling Factor of the innermost
loop. When dealing with outerloops it is important to evaluate multiple vectorization candidates – including both outer and inner loops, and their respective VF/UF’s.<o:p></o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New"">The current Loop Vectorizer works in the following three main steps:<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l0 level1 lfo4">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Legal Step: check if loop can be legally vectorized; encode constraints and artifacts if so.<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l0 level1 lfo4">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Cost Step: compute the relative cost of vectorizing it along possible vectorization and unroll factors.<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l0 level1 lfo4">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">3.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Transform Step: vectorize the loop according to best vectorization and unroll factors.<o:p></o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New"">This design has some implications:<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l2 level1 lfo6">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Cost Step tries to predict what the vectorized loop will look like and how much it will cost, independently of what the Transform Step will eventually do. It’s hard
to keep the two in sync, enforcing the comment placed at the beginning of Transform Step:<br>
// Notice: any optimization or new instruction that go<br>
// into the code below should be also be implemented in<br>
// the cost-model.<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l2 level1 lfo6">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Legal Step does more than check for vectorizability; e.g., it records auxiliary artifacts such as collectLoopUniforms() and InterleaveInfo.<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l2 level1 lfo6">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">3.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Transform Step first populates the single basic block of the vectorized loop and later revisits scalarized instructions to predicate them one by one, as needed.<o:p></o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New"">Proposal: introduce the Vectorization Plan as an explicit model of a vectorization candidate and update the overall flow:<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l3 level1 lfo8">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Legal Step: check if loop can be legally vectorized, encode constraints by initiating Vectorization Plan(s) if so.<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l3 level1 lfo8">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Plan Step:<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.75in;text-indent:-.25in;mso-list:l3 level2 lfo8">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">a.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Build initial vectorization plans complying with all legal constraints.<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.75in;text-indent:-.25in;mso-list:l3 level2 lfo8">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">b.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Optimize vectorization plans.
<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l3 level1 lfo8">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">3.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Cost Step: compute the relative cost of each plan. This step can be applied repeatedly by Plan Optimize Step 2.b.<o:p></o:p></span></i></p>
<p class="MsoListParagraph" style="margin-left:.25in;text-indent:-.25in;mso-list:l3 level1 lfo8">
<![if !supportLists]><i><span style="font-size:10.0pt;font-family:"Courier New""><span style="mso-list:Ignore">4.<span style="font:7.0pt "Times New Roman"">
</span></span></span></i><![endif]><i><span style="font-size:10.0pt;font-family:"Courier New"">Transform Step: materialize the best plan. Note that only this step modifies the IR, as in the current Loop Vectorizer.<o:p></o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New"">The Vectorization Plan is a recipe describing the vectorization candidate, including for instance its internal control-flow. It serves for both estimating the cost and for performing
the translation, and facilitates dealing with multiple vectorization candidates. One can compare with LLVM’s existing SLP vectorizer, where TSLP [3] adds step 2.b.<o:p></o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New"">As the scope of vectorization grows from innermost to outer loops, so do the uncertainty and complexity of each step. One way to mitigate the shortcomings of the Legal and Cost
steps is to rely on programmers to indicate which loops can and/or should be vectorized. This is implicit for certain loops in data-parallel languages such as OpenCL [4,5] and explicit in others such as OpenMP [6]. This proposal to extend the Loop Vectorizer
to outer loops supports and raises the importance of explicit vectorization beyond the current capabilities of Clang and LLVM. Namely, from currently forcing the vectorization of innermost loops according to prescribed width and/or interleaving count, to supporting
OpenMP’s “#pragma omp simd” construct and associated clauses, including our earlier proposal for vectorizing across function boundaries [2].<o:p></o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New"">Comments on this overall approach are welcome. The first patches we’re working on are designed to have the innermost Loop Vectorizer explicitly model the control flow of its vectorized
loop. More will be presented in our technical talk at the upcoming LLVM Developers' Meeting.<o:p></o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">References:<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">-----------<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">[1] “Outer-loop vectorization: revisited for short SIMD architectures”, Dorit Nuzman and Ayal Zaks, PACT 2008.<o:p></o:p></span></i></p>
<p class="MsoPlainText" style="margin-right:1.5pt"><i><span style="font-size:10.0pt;font-family:"Courier New"">[2] “Proposal for function vectorization and loop vectorization with function calls”, Xinmin Tian, [cfe-dev] March 2, 2016 (</span></i><a href="http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html"><i><span style="font-size:10.0pt;font-family:"Courier New";color:windowtext;text-decoration:none">http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html</span></i></a><i><span style="font-size:10.0pt;font-family:"Courier New"">.
See also </span></i><a href="https://reviews.llvm.org/D22792"><i><span style="font-size:10.0pt;font-family:"Courier New";color:windowtext;text-decoration:none">https://reviews.llvm.org/D22792</span></i></a><i><span style="font-size:10.0pt;font-family:"Courier New"">).<o:p></o:p></span></i></p>
<p class="MsoPlainText" style="margin-right:1.5pt"><i><span style="font-size:10.0pt;font-family:"Courier New"">[3] “Throttling Automatic Vectorization: When Less is More”, Vasileios Porpodas and Tim Jones, PACT 2015 and LLVM Developers' Meeting 2015.<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">[4] “Intel OpenCL SDK Vectorizer”, Nadav Rotem, LLVM Developers' Meeting 2011.<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">[5] “Automatic SIMD Vectorization of SSA-based Control Flow Graphs”, Ralf Karrenberg, Springer 2015. See also “Improving Performance of OpenCL on CPUs”, LLVM Developers' Meeting
2012.<o:p></o:p></span></i></p>
<p class="MsoPlainText"><i><span style="font-size:10.0pt;font-family:"Courier New"">[6] “Compiling C/C++ SIMD Extensions for Function and Loop Vectorization on Multicore-SIMD Processors”, Xinmin Tian and Hideki Saito et al., IPDPSW 2012.<o:p></o:p></span></i></p>
<p>---------------------------------------------------------------------<br>
Intel Israel (74) Limited<o:p></o:p></p>
<p>This e-mail and any attachments may contain confidential material for<br>
the sole use of the intended recipient(s). Any review or distribution<br>
by others is strictly prohibited. If you are not the intended<br>
recipient, please contact the sender and delete all copies.<o:p></o:p></p>
<p>---------------------------------------------------------------------<br>
Intel Israel (74) Limited<o:p></o:p></p>
<p>This e-mail and any attachments may contain confidential material for<br>
the sole use of the intended recipient(s). Any review or distribution<br>
by others is strictly prohibited. If you are not the intended<br>
recipient, please contact the sender and delete all copies.<o:p></o:p></p>
</div>
</body>
</html>