<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal">I am trying to get a small loop to *<b>not vectorize</b>* for cases where it doesn’t make sense. For instance, this loop:<o:p></o:p></p>
<p class="MsoNormal">void foo(int a[4][8], int n)<o:p></o:p></p>
<p class="MsoNormal">{ <o:p></o:p></p>
<p class="MsoNormal"> int b[4][8];<o:p></o:p></p>
<p class="MsoNormal"> for(int i = 0; i < 4; i++) {<o:p></o:p></p>
<p class="MsoNormal"> for(int j = 0; j < n; j++) {<o:p></o:p></p>
<p class="MsoNormal"> a[i][j] = b[i][j]; <o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal">} <o:p></o:p></p>
<p class="MsoNormal">* Has maximum of 8ints copy. LLVM tries to use Memcpy for the inner loop. It is not helpful to perform memcpy for such small moves, especially when the outer loop is unrolled since the trip count is constant (4). The 4 calls to memcpy is
not efficient. <o:p></o:p></p>
<p class="MsoNormal">* Therefore, I disabled the memcpy optimization for such cases, and found that LLVM LoopVectorizer successfully vectorizes and unrolls the inner loop. However, in order to take the fast path (vmovups) it must copy at least 32 ints, where
as in this case we only do an 8int copy. <o:p></o:p></p>
<p class="MsoNormal">** Upon closer look, LoopVectorizer obtains the TripCount for the innerloop using getSmallConstantTripCount(Loop,…). This value is 0 for the loop with unknown trip count. Loop unrolling is disabled when TC > 0. Should this be changed to
TC >= 0 (which does the job for this testcase)? Or is there a better way to disable loop unrolling for such trivial loops, at least the ones with known array size?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks for your feedback<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Sriram<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">--<o:p></o:p></p>
<p class="MsoNormal">Sriram Murali<o:p></o:p></p>
<p class="MsoNormal">SSG/DPD/ECDL/DMP<o:p></o:p></p>
<p class="MsoNormal">+1 (519) 772 – 2579<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>