<div dir="ltr"><div>I plan on rewriting the block placement algorithm to proceed by traces.</div><div><br></div><div>A trace is a chain of blocks where each block in the chain may fall through to</div><div>its successor in the chain.</div><div><br></div><div>The overall algorithm would first produce traces for a function, and then</div><div>order those traces to try to improve cache locality.</div><div><br></div><div>Currently block placement uses a greedy, single-step approach to layout. It</div><div>produces chains working from inner to outer loops. Unlike a trace, a chain may</div><div>contain non-fallthrough edges. This causes problems with loop layout; the main</div><div>ones are loop rotation and cold blocks in a loop.</div><div><br></div><div>Overview of proposed solution:</div><div><br></div><div>Phase 1:</div><div>Greedily produce a set of traces through the function. A trace is a list of</div><div>blocks with each block in the list falling through (possibly conditionally) to</div><div>the next block in the list. Loop rotation will occur naturally in this phase via</div><div>the triangle replacement algorithm below. 
Handling single-trace loops requires a</div><div>tweak; see the detailed design.</div><div><br></div><div>Phase 2:</div><div>After producing what we believe are the best traces, they need to be ordered.</div><div>They will be ordered topologically, except that traces that are cold enough (as</div><div>measured by their warmest block) will be floated later. This may push them out</div><div>of a loop or to the end of the function.</div><div><br></div><div>Detailed Design</div><div><br></div><div>Note: whenever an edge is used as a number, I am referring to the edge frequency.</div><div><br></div><div>Phase 1: Producing traces</div><div>Traces are produced according to the following algorithm:</div><div> * Sort the edges by weight, heaviest first, using a stable sort so that ties</div><div>keep the incoming block and edge ordering.</div><div> * Place each block in a trace of length 1.</div><div> * For each edge in order:</div><div>    * If the source is at the end of a trace, and the target is at the beginning</div><div>      of another trace, glue those 2 traces into 1 longer trace.</div><div>    * If an edge has a target or source in the middle of a trace, consider</div><div>      tail duplication. The benefit calculation is the same as in the existing</div><div>      code.</div><div>    * If an edge has a source or target in the middle, check whether the edges</div><div>      there can be replaced as a triangle. 
(Triangle replacement is described below.)</div><div>      * Compare the benefit of choosing the edge, along with any triangles</div><div>        found, with the cost of breaking the existing edges.</div><div>        * If it is a net benefit, perform the switch.</div><div> * Triangle checking:</div><div>    Consider a trace in 2 parts, A1->A2, where the current edge under consideration</div><div>    is A1->B (the case for C->A2 is the mirror image, and both may need to be done).</div><div>    * First find the best alternative edge into B: C->B</div><div>    * Check for an alternative for A2: D->A2</div><div>    * Find D's best alternative: D->E</div><div>    * Compare the frequencies: A1->A2 + C->B + D->E vs A1->B + D->A2</div><div>    * If the 2nd sum is bigger, do the switch.</div><div> * Loop Rotation Tweak:</div><div>    If A contains a backedge A2->A1, then when considering A1->B or C->A2, we</div><div>    can include that backedge in the gain:</div><div>    A1->A2 + C->D + E->B vs A1->B + C->A2 + A2->A1</div><div><br></div><div>Phase 2: Ordering traces</div><div>First we compute the frequency of a trace as the maximum frequency of any of</div><div>its blocks.</div><div>Then we attempt to place the traces topologically. When a trace cannot be placed</div><div>topologically, we prefer warmer traces first.</div><div><br></div><div>Questions and comments welcome.</div><div><br></div></div>
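<div>As a rough illustration of the simplest part of Phase 1 (the gluing step only), here is a minimal Python sketch. It is not the planned implementation: it skips tail duplication and triangle replacement entirely. The names build_traces and trace_frequency, the tuple representation of edges, and the block names and frequencies in the usage note are all hypothetical, chosen only for this example. It also assumes that edges are sorted heaviest first and that a trace is never glued onto itself.</div>

```python
from typing import Dict, List, Tuple

# Hypothetical edge representation: (source block, target block, edge frequency).
Edge = Tuple[str, str, int]


def build_traces(blocks: List[str], edges: List[Edge]) -> List[List[str]]:
    """Gluing step only: no tail duplication, no triangle replacement."""
    # Place each block in a trace of length 1.
    trace_of: Dict[str, List[str]] = {b: [b] for b in blocks}

    # sorted() is stable, so equal-weight edges keep their input order.
    # Assumption: edges are visited heaviest first.
    for src, dst, _freq in sorted(edges, key=lambda e: -e[2]):
        head, tail = trace_of[src], trace_of[dst]
        if head is tail:
            # Assumption: never glue a trace onto itself (e.g. a backedge).
            continue
        if head[-1] != src or tail[0] != dst:
            # Source or target sits mid-trace; the proposal handles this
            # case with tail duplication or triangle replacement.
            continue
        head.extend(tail)
        for b in tail:
            trace_of[b] = head

    # Collect each distinct trace once, ordered by its first listed block.
    seen = set()
    traces: List[List[str]] = []
    for b in blocks:
        t = trace_of[b]
        if id(t) not in seen:
            seen.add(id(t))
            traces.append(t)
    return traces


def trace_frequency(trace: List[str], block_freq: Dict[str, int]) -> int:
    """Phase 2 input: a trace is as warm as its warmest block."""
    return max(block_freq[b] for b in trace)
```

<div>For a diamond with edges A->B (90), A->C (10), B->D (90) and C->D (10), build_traces glues A, B and D into one trace and leaves C in a trace of its own, because both edges touching C have an endpoint in the middle of the hot trace. With block frequencies A=100, B=90, C=10, D=100, trace_frequency gives 100 for the hot trace and 10 for the cold one, which is the value Phase 2 would use when deciding what to float later.</div>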