<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    I would strongly suggest we not try to be overly general at this
    time.  Let's focus on getting a clearly profitably use case for
    each, and then worry about generalizing once we have more practical
    experience and code has actually landed in tree.  General loop
    optimization is a *hard* problem.  <br>
    <br>
    As a near term goal, having the profitability heuristics well
    separated from the actual transforms seems like a reasonable goal. 
    Even this should happen after we have tests and working examples in
    tree though.  <br>
    <br>
    Philip<br>
    <br>
    <div class="moz-cite-prefix">On 01/16/2015 01:14 AM, Kristof Beyls
      wrote:<br>
    </div>
    <blockquote cite="mid:000001d0316c$da758bf0$8f60a3d0$@arm.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html;
        charset=windows-1252">
      <meta name="Generator" content="Microsoft Word 14 (filtered
        medium)">
      <style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
      <div class="WordSection1">
        <p class="MsoNormal"
          style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Hi
            Ramshankar,<o:p></o:p></span></p>
        <p class="MsoNormal"
          style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">This
            looks really interesting.<o:p></o:p></span></p>
        <p class="MsoNormal"
          style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Interestingly,
            a patch for loop distribution has been posted for review
            recently.<br>
            On a conceptual level loop fusion and loop distribution are
            each other’s inverse, so I would assume that<br>
            they should ideally somehow be combined and use a single
            cost function to decide, out of all legal<br>
            fusions/distributions, which one is most profitable.<o:p></o:p></span></p>
        <p class="MsoNormal"
          style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Does
            that make sense, or is there some reason why these
            fundamentally couldn’t be combined into a single<br>
            loop fuse-and-or-distribute transformation?<o:p></o:p></span></p>
        <p class="MsoNormal"
          style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I
            haven’t looked closely at either the code of the
            distribution or the fusion patches, but would it be hard to<br>
            combine them into a single transformation – assuming my
            assumptions above do make sense?<o:p></o:p></span></p>
        <p class="MsoNormal"
          style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">If
            it would prove sensible to somehow combine the distribution
            and fusion transformations – could this<br>
            be the beginning of a more general high-level loop
            transformation framework in LLVM?<o:p></o:p></span></p>
        <p class="MsoNormal"
          style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thanks,<o:p></o:p></span></p>
        <p class="MsoNormal"
          style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Kristof<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
        <div style="border:none;border-left:solid blue 1.5pt;padding:0cm
          0cm 0cm 4.0pt">
          <div>
            <div style="border:none;border-top:solid #B5C4DF
              1.0pt;padding:3.0pt 0cm 0cm 0cm">
              <p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif""
                    lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif""
                  lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:llvmdev-bounces@cs.uiuc.edu">llvmdev-bounces@cs.uiuc.edu</a>
                  [<a class="moz-txt-link-freetext" href="mailto:llvmdev-bounces@cs.uiuc.edu">mailto:llvmdev-bounces@cs.uiuc.edu</a>] <b>On Behalf Of
                  </b>Ramanarayanan, Ramshankar<br>
                  <b>Sent:</b> 16 January 2015 00:22<br>
                  <b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:llvmdev@cs.uiuc.edu">llvmdev@cs.uiuc.edu</a><br>
                  <b>Subject:</b> [LLVMdev] proof of concept for a loop
                  fusion pass<o:p></o:p></span></p>
            </div>
          </div>
          <p class="MsoNormal"><o:p> </o:p></p>
          <div>
            <div>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">Hi,<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">We are proposing a loop fusion
                  pass that tries to proactive fuse loops across
                  function call boundaries and arbitrary control flow.<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"><a moz-do-not-send="true"
                    href="http://reviews.llvm.org/D7008" target="_blank">http://reviews.llvm.org/D7008</a><o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">With this pass, we get 103 loop
                  fusions in SPECCPU INT 2006 462.libquantum with rate
                  performance improving close to 2.5X in x86 (results
                  from AMD A10-6700).<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">I took some liberties in patching
                  up some of the code in
                  ScalarEvolution/DependenceAnalysis/GlobalsModRef/Clang
                  options/ and also adjusted the IPO/LTO pass managers.
                  I would need to do a better job there but I would like
                  to put this out as WIP for high level reviews.  At
                  this time standard loop fusion test cases may not work
                  (cases where control dep are same or loops are truly
                  adjacent etc). I would be doing that as well.<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">Appreciate suggestions and other
                  help. <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">The pass initially attempts to
                  inline calls which have loops in them.  Following
                  inlining, a separate scalar "main" pass tries to fuse
                  loops. It is not necessary for loops to be adjacent or
                  have the same control dependence. Specifically a
                  control dependence closure is computed and loops that
                  share a control dep closure prefix are taken as
                  candidates for fusion, in case there are no other
                  loops in a certain path between them.  A loop graph is
                  built with edges between loops which share the control
                  dependence closure prefix. A recursive application of
                  fusion follows on the graph from "bottom-up".<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">During fusion, program slices are
                  taken based on the control flow closures of the two
                  loops being fused.  Example: Suppose two loops A and B
                  are going to be fused.  They share the same control
                  dependence prefix, but their suffices vary.  The
                  suffices are used to duplicate Control flow paths that
                  pass through both loops. Specifically paths from the
                  first block in the control-dep closure suffix of the
                  first loop, through the first loop's exit and
                  following into the suffix of the second loop through
                  the second loop's exit on to the common post-dominator
                  of either loop's exit. The number of paths grows
                  exponentially. At this time some heuristics are used
                  when exploring for paths.  Using profile based
                  heuristics is a better idea.<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">Also function unswitching helps by
                  cleaning up control flow.<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">Example:<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">if (a & b) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  if (a & c) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    if (t & 0xF) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      for (i=0; i < size; i++)
                  {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">        if (A[i] & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">          A[i] |= (1 << t);<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">        }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">}<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">if (b & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  if (a & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    if (t & 0xA) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      for (i=0; i < size; i++)
                  {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">        if (A[i] & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">          A[i] |= (1 << t);<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">        }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">}<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">After fusion we will have:<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">if (a&b && a&c
                  && t&0xF && b&d &&
                  a&d t&0xA) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  for (i=0; i < size; i++) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    if (A[i] & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      A[i] |= (1 << t);<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    if (A[i] & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      A[i] |= (1 << t);<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">} else {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">// original code<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  if (a & b) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    if (a & c) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      if (t & 0xF) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">        for (i=0; i < size;
                  i++) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">          if (A[i] & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">            A[i] |= (1 <<
                  t);<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">          }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">        }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  if (b & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    if (a & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      if (t & 0xA) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">        for (i=0; i < size;
                  i++) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">          if (A[i] & d) {<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">            A[i] |= (1 <<
                  t);<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">          }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">        }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">      }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">    }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">  }<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">}<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black"> <o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">Thanks,<o:p></o:p></span></p>
              <p class="MsoNormal"
                style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span
                  style="color:black">Ramshankar<o:p></o:p></span></p>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>