<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
I would strongly suggest we not try to be overly general at this
time. Let's focus on getting a clearly profitably use case for
each, and then worry about generalizing once we have more practical
experience and code has actually landed in tree. General loop
optimization is a *hard* problem. <br>
<br>
As a near term goal, having the profitability heuristics well
separated from the actual transforms seems like a reasonable goal.
Even this should happen after we have tests and working examples in
tree though. <br>
<br>
Philip<br>
<br>
<div class="moz-cite-prefix">On 01/16/2015 01:14 AM, Kristof Beyls
wrote:<br>
</div>
<blockquote cite="mid:000001d0316c$da758bf0$8f60a3d0$@arm.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<meta name="Generator" content="Microsoft Word 14 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Hi
Ramshankar,<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">This
looks really interesting.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Interestingly,
a patch for loop distribution has been posted for review
recently.<br>
On a conceptual level loop fusion and loop distribution are
each other’s inverse, so I would assume that<br>
they should ideally somehow be combined and use a single
cost function to decide, out of all legal<br>
fusions/distributions, which one is most profitable.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Does
that make sense, or is there some reason why these
fundamentally couldn’t be combined into a single<br>
loop fuse-and-or-distribute transformation?<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I
haven’t looked closely at either the code of the
distribution or the fusion patches, but would it be hard to<br>
combine them into a single transformation – assuming my
assumptions above do make sense?<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">If
it would prove sensible to somehow combine the distribution
and fusion transformations – could this<br>
be the beginning of a more general high-level loop
transformation framework in LLVM?<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Kristof<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm
0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif""
lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif""
lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:llvmdev-bounces@cs.uiuc.edu">llvmdev-bounces@cs.uiuc.edu</a>
[<a class="moz-txt-link-freetext" href="mailto:llvmdev-bounces@cs.uiuc.edu">mailto:llvmdev-bounces@cs.uiuc.edu</a>] <b>On Behalf Of
</b>Ramanarayanan, Ramshankar<br>
<b>Sent:</b> 16 January 2015 00:22<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:llvmdev@cs.uiuc.edu">llvmdev@cs.uiuc.edu</a><br>
<b>Subject:</b> [LLVMdev] proof of concept for a loop
fusion pass<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">We are proposing a loop fusion
pass that tries to proactive fuse loops across
function call boundaries and arbitrary control flow.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"><a moz-do-not-send="true"
href="http://reviews.llvm.org/D7008" target="_blank">http://reviews.llvm.org/D7008</a><o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">With this pass, we get 103 loop
fusions in SPECCPU INT 2006 462.libquantum with rate
performance improving close to 2.5X in x86 (results
from AMD A10-6700).<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">I took some liberties in patching
up some of the code in
ScalarEvolution/DependenceAnalysis/GlobalsModRef/Clang
options/ and also adjusted the IPO/LTO pass managers.
I would need to do a better job there but I would like
to put this out as WIP for high level reviews. At
this time standard loop fusion test cases may not work
(cases where control dep are same or loops are truly
adjacent etc). I would be doing that as well.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">Appreciate suggestions and other
help. <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">The pass initially attempts to
inline calls which have loops in them. Following
inlining, a separate scalar "main" pass tries to fuse
loops. It is not necessary for loops to be adjacent or
have the same control dependence. Specifically a
control dependence closure is computed and loops that
share a control dep closure prefix are taken as
candidates for fusion, in case there are no other
loops in a certain path between them. A loop graph is
built with edges between loops which share the control
dependence closure prefix. A recursive application of
fusion follows on the graph from "bottom-up".<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">During fusion, program slices are
taken based on the control flow closures of the two
loops being fused. Example: Suppose two loops A and B
are going to be fused. They share the same control
dependence prefix, but their suffices vary. The
suffices are used to duplicate Control flow paths that
pass through both loops. Specifically paths from the
first block in the control-dep closure suffix of the
first loop, through the first loop's exit and
following into the suffix of the second loop through
the second loop's exit on to the common post-dominator
of either loop's exit. The number of paths grows
exponentially. At this time some heuristics are used
when exploring for paths. Using profile based
heuristics is a better idea.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">Also function unswitching helps by
cleaning up control flow.<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">Example:<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">if (a & b) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (a & c) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (t & 0xF) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> for (i=0; i < size; i++)
{<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (A[i] & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> A[i] |= (1 << t);<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">}<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">if (b & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (a & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (t & 0xA) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> for (i=0; i < size; i++)
{<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (A[i] & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> A[i] |= (1 << t);<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">}<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">After fusion we will have:<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">if (a&b && a&c
&& t&0xF && b&d &&
a&d t&0xA) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> for (i=0; i < size; i++) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (A[i] & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> A[i] |= (1 << t);<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (A[i] & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> A[i] |= (1 << t);<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">} else {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">// original code<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (a & b) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (a & c) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (t & 0xF) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> for (i=0; i < size;
i++) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (A[i] & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> A[i] |= (1 <<
t);<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (b & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (a & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (t & 0xA) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> for (i=0; i < size;
i++) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> if (A[i] & d) {<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> A[i] |= (1 <<
t);<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> }<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">}<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black"> <o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"
style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="color:black">Ramshankar<o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>