<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:SimSun;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@SimSun";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Current loop vectorizer uses a range of vectorization factors computed by MaxVF. For each VF, it setups unform and scalar info before building VPlan and the final best VF selection. The best VF is also selected within the VF range.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {<o:p></o:p></p>
<p class="MsoNormal"> // Collect Uniform and Scalar instructions after vectorization with VF.<o:p></o:p></p>
<p class="MsoNormal"> CM.collectUniformsAndScalars(VF);<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> // Collect the instructions (and their associated costs) that will be more<o:p></o:p></p>
<p class="MsoNormal"> // profitable to scalarize.<o:p></o:p></p>
<p class="MsoNormal"> if (VF > 1)<o:p></o:p></p>
<p class="MsoNormal"> CM.collectInstsToScalarize(VF);<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">It looks when force vectorization is not given, it is not necessary to setup uniform and scalar info for every VF. For a VF, we can do a check before collectUniformsAndScalars() and collectInstsToScalarize() to see if the types used in
the code can actually yield any vector types(after type legalization) or not. If not, there is no point for this VF to participate in VPlan and VF selection. As the (scalar) types can be collected once for all VFs, I guess it is cheap enough. As both collectUniformsAndScalars()
and collectInstsToScalarize() don’t look cheap, doing such check can speed up vectorization, in particular, for large MaxVFs.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Another minor thing is when force vectorization is enabled and MaxVF > 1, expected cost of VF=2 is computed twice at the moment.
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;<o:p></o:p></p>
<p class="MsoNormal"> // Ignore scalar width, because the user explicitly wants vectorization.<o:p></o:p></p>
<p class="MsoNormal"> if (ForceVectorization && MaxVF > 1) {<o:p></o:p></p>
<p class="MsoNormal"> Width = 2;<o:p></o:p></p>
<p class="MsoNormal"> Cost = expectedCost(Width).first / (float)Width;<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> for (unsigned i = 2; i <= MaxVF; i *= 2) {<o:p></o:p></p>
<p class="MsoNormal"> // Notice that the vector loop needs to be executed less times, so<o:p></o:p></p>
<p class="MsoNormal"> // we need to divide the cost of the vector loops by the width of<o:p></o:p></p>
<p class="MsoNormal"> // the vector elements.<o:p></o:p></p>
<p class="MsoNormal"> VectorizationCostTy C = expectedCost(i);<o:p></o:p></p>
<p class="MsoNormal"> float VectorCost = C.first / (float)i;<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-IE">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-IE"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-IE">Shixiong (Jason) Xu<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>