<div dir="ltr">It is essentially block layout algorithm 2 here, with limited non-greedy lookahead. (The triangle detection)<div><a href="https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=p16-pettis.pdf">https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=p16-pettis.pdf</a><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 14, 2017 at 7:24 PM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Is this an existing published algorithm? Do you have a link to a paper?<div><br></div><div>-- Sean Silva</div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Thu, Sep 14, 2017 at 6:53 PM, Kyle Butt via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="ltr"><div style="font-size:12.8px">I plan on rewriting the block placement algorithm to proceed by traces.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">A trace is a chain of blocks where each block in the chain may fall through to</div><div style="font-size:12.8px">the successor in the chain.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">The overall algorithm would be to first produce traces for a function, and then</div><div style="font-size:12.8px">order those traces to try and get cache locality.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Currently block placement uses a greedy single step approach to layout. It</div><div style="font-size:12.8px">produces chains working from inner to outer loops. Unlike a trace, a chain may</div><div style="font-size:12.8px">contain non-fallthrough edges. This causes problems with loop layout. The main</div><div style="font-size:12.8px">problems with loop layout are: loop rotation and cold blocks in a loop.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Overview of proposed solution:</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Phase 1:</div><div style="font-size:12.8px">Greedily produce a set of traces through the function. A trace is a list of</div><div style="font-size:12.8px">blocks with each block in the list falling through (possibly conditionally) to</div><div style="font-size:12.8px">the next block in the list. Loop rotation will occur naturally in this phase via</div><div style="font-size:12.8px">the triangle replacement algorithm below. Handling single trace loops requires a</div><div style="font-size:12.8px">tweak, see the detailed design.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Phase 2:</div><div style="font-size:12.8px">After producing what we believe are the best traces, they need to be ordered.</div><div style="font-size:12.8px">They will be ordered topologically, except that traces that are cold enough (As</div><div style="font-size:12.8px">measured by their warmest block) will be floated later, This may push them out</div><div style="font-size:12.8px">of a loop or to the end of the function.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Detailed Design</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Note whenever an edge is used as a number, I am referring to the edge frequency.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Phase 1: Producing traces</div><div style="font-size:12.8px">Traces are produced according to the following algorithm:</div><div style="font-size:12.8px"> * Sort the edges according to weight, stable-sorting them according the incoming</div><div style="font-size:12.8px">block and edge ordering.</div><div style="font-size:12.8px"> * Place each block in a trace of length 1.</div><div style="font-size:12.8px"> * For each edge in order:</div><div style="font-size:12.8px"> * If the source is at the end of a trace, and the target is at the beginning</div><div style="font-size:12.8px"> of a trace, glue those 2 traces into 1 longer trace.</div><div style="font-size:12.8px"> * If an edge has a target or source in the middle of another trace, consider</div><div style="font-size:12.8px"> tail duplication. The benefit calculation is the same as the existing</div><div style="font-size:12.8px"> code.</div><div style="font-size:12.8px"> * If an edge has a source or target in the middle, check them to see if they</div><div style="font-size:12.8px"> can be replaced as a triangle. (Triangle replacement described below)</div><div style="font-size:12.8px"> * Compare the benefit of choosing the edge, along with any triangles</div><div style="font-size:12.8px"> found, with the cost of breaking the existing edges.</div><div style="font-size:12.8px"> * If it is a net benefit, perform the switch.</div><div style="font-size:12.8px"> * Triangle checking:</div><div style="font-size:12.8px"> Consider a trace in 2 parts: A1->A2, and the current edge under consideration</div><div style="font-size:12.8px"> is A1->B (the case for C->A2 is mirror, and both may need to be done)</div><div style="font-size:12.8px"> * First find the best alternative C->B</div><div style="font-size:12.8px"> * Check for an alternative for A2: D->A2</div><div style="font-size:12.8px"> * Find D's best Alternative: D->E</div><div style="font-size:12.8px"> * Compare the frequencies: A1->A2 + C->B + D->E vs A1->B + D->A2</div><div style="font-size:12.8px"> * If the 2nd sum is bigger, do the switch.</div><div style="font-size:12.8px"> * Loop Rotation Tweak:</div><div style="font-size:12.8px"> If A contains a backedge A2->A1, then when considering A1->B or C->A2, we</div><div style="font-size:12.8px"> can include that backedge in the gain:</div><div style="font-size:12.8px"> A1->A2 + C->D + E->B vs A1->B + C->A2 + A2->A</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Phase 2: Order traces.</div><div style="font-size:12.8px">First we compute the frequency of a trace by finding the max frequency of any of</div><div style="font-size:12.8px">its blocks.</div><div style="font-size:12.8px">Then we attempt to place the traces topologically. When a trace cannot be placed</div><div style="font-size:12.8px">topologically, we prefer warmer traces first.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Questions and comments welcome.</div></div>
<br></div></div>______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br></blockquote></div><br></div>
</blockquote></div><br></div>