<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
<div id="divtagdefaultwrapper" style="font-size:11pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p>Hi River,</p>
<p><br>
</p>
<p>Thank you for information. Do you have your code in Phabricator?</p>
<p><br>
</p>
<p>As I understand your pass always runs after the Inliner. <span style="font-size: 14.6667px;">At the moment the Inliner is too aggressive at Os/Oz. I am trying to tune Inliner heuristics for Oz. </span><span style="font-size: 11pt; font-family: Calibri, Helvetica, sans-serif, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">It
 would be interesting to see if there are any code size </span><span style="font-size: 11pt;">improvements when the Inliner is disabled. This will demonstrate whether the Outliner could improve code size when
</span>duplicated code is <span style="font-size: 11pt;">not due to the Inliner. Another problem I have is that C++ programs get code size regressions due my changes in the inline heuristics. Have  you seen this problem?</span></p>
<p><span style="font-size: 11pt;"><br>
</span></p>
<p>Thanks,</p>
<p>Evgeny Astigeevich</p>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> River Riddle <riddleriver@gmail.com><br>
<b>Sent:</b> Friday, July 21, 2017 6:01:56 PM<br>
<b>To:</b> Evgeny Astigeevich<br>
<b>Cc:</b> llvm-dev@lists.llvm.org; nd; Kristof Beyls; gaborb@inf.u-szeged.hu<br>
<b>Subject:</b> Re: [llvm-dev] [RFC] Add IR level interprocedural outliner for code size.</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi Evgeny,
<div> I know of the current machine outliner in LLVM. If you look in the "More detailed performance data"  in the end section it includes performance comparisons to the machine outliner. <br>
 As for the algorithmic approach they are kind of similar. </div>
<div>Machine Outliner:</div>
<div>  - Builds a suffix tree based on identical equivalence between machine instrs. </div>
<div>  - Uses target specific cost model for benefit estimation.</div>
<div>  - Uses a greedy algorithm for candidate pruning.</div>
<div>  - Currently supports X86, AArch64 targets.</div>
<div>  - Requires NoRedZone on supported targets.</div>
<div><br>
</div>
<div>IR Outliner:</div>
<div>  - Builds a suffix array based upon relaxed equivalence between ir instrs.</div>
<div>  - Allows for inputs/outputs from an outlined function.</div>
<div>  - Complex verification of outlining candidates due to above two points(relaxed equivalency, inputs/outputs).</div>
<div>  - Tunable generic target independent cost model (estimates constant folded inputs, output condensing, etc.).</div>
<div>       - Note it uses the target info available (TargetTransformInfo) </div>
<div>  - Uses a greedy interval based algorithm for candidate pruning.</div>
<div>  - Supports profile data.</div>
<div>  - Supports emitting opt remarks.</div>
<div>  - Target independent, thus no required ABI constraints or hooks.</div>
<div>Though some parts of the approach are similar the implementation is very different. Also to note, the IR outliner can be expanded to also outline from multiple basic blocks. I've already prototyped the infrastructure for outlining regions, so its possible.<br>
  </div>
<div>Thanks,</div>
<div>  River Riddle</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Jul 21, 2017 at 2:24 AM, Evgeny Astigeevich <span dir="ltr">
<<a href="mailto:Evgeny.Astigeevich@arm.com" target="_blank">Evgeny.Astigeevich@arm.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div id="m_-1908490964173508319divtagdefaultwrapper" style="font-size:11pt;color:#000000;font-family:Calibri,Helvetica,sans-serif" dir="ltr">
<p>Hi River,</p>
<p><br>
</p>
<p>Do you know LLVM has the MachineOutliner pass (<span>lib/CodeGen/MachineOutliner.<wbr>cpp</span>)? </p>
<p>It would be interesting to compare your approach with it.</p>
<p><br>
</p>
<p>Thanks,</p>
<p>Evgeny Astigeevich</p>
</div>
<hr style="display:inline-block;width:98%">
<div id="m_-1908490964173508319divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> llvm-dev <<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.<wbr>org</a>>
 on behalf of River Riddle via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>><br>
<b>Sent:</b> Thursday, July 20, 2017 11:47:57 PM<br>
<b>To:</b> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<b>Subject:</b> [llvm-dev] [RFC] Add IR level interprocedural outliner for code size.</font>
<div> </div>
</div>
<div>
<div class="h5">
<div>
<div dir="ltr"><span id="m_-1908490964173508319gmail-docs-internal-guid-128cceb0-622c-6ca0-9998-bc07500bb3bc">
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap"> I’m River and I’m a compiler engineer at PlayStation. Recently, I’ve been working on an interprocedural
 outlining (code folding) pass for code size improvement at the IR level. We hit a couple of use cases that the current code size solutions didn’t handle well enough. Outlining is one of the avenues that seemed potentially beneficial.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">-- Algorithmic Approach --</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">The general implementation can be explained in stages:</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">* Candidate Selection:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Each instruction in the module is mapped to a congruence class based upon relaxed equivalency
 constraints. For most instructions this is simply: The type, opcode, and operand types. The constraints are tightened for instructions with special state that require more exact equivalence (e.g. ShuffleVector requires a constant mask vector). Candidates are
 then found by constructing a suffix array and lcp(longest common prefix) array. Walking the lcp array gives us congruent chains of instructions within the module.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">* Candidate Analysis:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">A verification step splits candidates that have different internal input sequences or incompatible
 parent function attributes between occurrences. An example of incompatible internal inputs sequences is:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">X = W + 6;   vs    X = W + 6;</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Y = X + 4;            Y = W + 4;</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">The above two occurrences would need special control flow to exist within the same outlined function.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">During analysis candidates have their inputs and outputs computed along with an estimated benefit
 from extraction. During input calculation a constant folding step removes all inputs that are the same amongst all occurrences.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">* Candidate Pruning:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Overlapping candidates are pruned with a generic greedy algorithm that picks occurrences starting
 from the most beneficial candidates.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">* Candidate Outlining:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Non pruned candidates are then outlined. Outputs from a candidate are returned via an output parameter,
 except for one output that is promoted to a return value. During outlining the inputs into the candidate are condensed by computing the equivalencies between the arguments at each occurrence. An example of this is:</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">outlinedFn(1,6,1);  ->  outlinedFn(1,6);</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">outlinedFn(2,4,2);  ->  outlinedFn(2,4);</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">In the above, parameters 1 and 3 were found to be equivalent for all occurrences, thus the amount
 of inputs was reduced to 2.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">* Debug Info:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Debug information is preserved for the calls to functions which have been outlined but all debug
 info from the original outlined portions is removed, making them harder to debug.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">* Profile Info:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">If the pass is running at Os the outliner will only consider cold blocks, whereas Oz considers
 all blocks that are not marked as hot.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">* Location in Pipeline:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">The pass is currently configured to run very late in the optimization pipeline. It is intended
 to run at Oz but will also run at Os if there is profile data available. The pass can optionally be run twice, once before function simplification and then again at the default location. This run is optional because you are gambling the potential benefits
 of redundancy elimination vs the potential benefits from function simplification. This can lead to large benefits or regressions depending on the benchmark (though the benefits tend to outnumber the regressions). The performance section contains data for both
 on a variety of benchmarks.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">-- Why at the IR level --</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">The decision to place the outliner at the IR level comes from a couple of major factors:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">- Desire to be target independent</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">- Most opportunities for congruency</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">The downside to having this type of transformation be at the IR level is it means there will be
 less accuracy in the cost model -  we can somewhat accurately model the cost per instruction but we can’t get information on how a window of instructions may lower. This can cause regressions depending on the platform/codebase, therefore to help alleviate
 this there are several tunable parameters for the cost model.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">-- Performance --</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">More results including clang, llvm-tblgen, and more specific numbers
 about benefits/regressions can be found in the notes section below.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">* Size Reduction:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- Test Suite(X86_64):</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Early+Late outlining provides a geomean of 10.5% reduction over
 clang Oz, with a largest improvement of ~67% and largest regression of ~7.5%.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Late outlining provides a geomean of 4.65% reduction, with a
 largest improvement of ~51% and largest regression of ~6.4%.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- Spec 2006(X86_64)</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Early+Late outlining provides a geomean reduction of 2.08%.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Late outlining provides 2.09%.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- CSiBE(AArch64)</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Early+Late outlining shows geomean reduction of around 3.5%.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Late outlining shows 3.1%.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* Compile Time:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Compile time was tested under test-suite with a multisample of 5.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- Early+Late outlining</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Many improvements with > 40% compile time reduction.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Few regressions.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- Late outlining</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Greatest improvement is ~7.8%</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Greatest regression is ~4% with a difference of <0.02s</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Our explanation for the compile time reduction during early outlining
 is that due to the amount of redundancy reduction early in the optimization pipeline there is a reduction in the amount of instruction processing during the rest of the compilation.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* Execution Time:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">Ran with a multisample of 5.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- Test Suite:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Early+Late outlining has many regressions up to 97%. The greatest
 improvement was around 7.5%.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Late outlining also has several regressions up to 44% and a
 greatest improvement of around 5.3%.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">- Spec:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Early+Late has a geomean regression of 3.5%.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - Late outlining has a geomean regression of 1.25%.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">The execution time results are to be expected given that the outliner,
 without profile data, will extract from whatever region it deems profitable. Extracting from the hot path can lead to a noticeable performance regression on any platform, which can be somewhat avoided by providing profile data during outlining.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">-- Tested Improvements --</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* MergeFunctions:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - No noticeable benefit.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* LTO:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - LTO doesn’t have a code size pipeline, but %reductions over
 LTO are comparable to non LTO.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* Input/Output Partitioning:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   -This identifies inputs/outputs that may be folded by splitting
 a candidate. The benefit is minimal for the computations required.</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* Similar Candidate Merging:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">   - The benefit to be gained is currently not worth the large complexity
 required to catch the desired cases.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">-- Potential Improvements --</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* Suffix&LCP array construction: The pass currently uses a very basic
 implementation that could be improved. There are definitely faster algorithms and some can construct both simultaneously. We will investigate this as a potential benefit for compile time in the future.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* Os performance tuning: Under -Os the pass currently only runs on
 cold blocks. Ideally we could expand this to be a little more lax on less frequently executed blocks that aren’t cold.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* Candidate Selection: The algorithm currently focuses on the longest
 common sequences. More checks could be added to see if shortening the sequence length produces a larger benefit(e.g less inputs/outputs). This would likely lead to an increase in compile time but should remain less than the baseline.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* Non Exact Functions: The outliner currently ignores functions that
 do not have an exact definition.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">-- --</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* CSiBE(Code Size Benchmark):</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><a href="http://www.csibe.org/" style="text-decoration-line:none" target="_blank"><span style="font-size:11pt;font-family:Arial;background-color:transparent;text-decoration-line:underline;vertical-align:baseline;white-space:pre-wrap">www.csibe.org</span></a></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;background-color:transparent;vertical-align:baseline;white-space:pre-wrap">* More detailed performance data:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><a href="http://goo.gl/5k6wsP" style="text-decoration-line:none" target="_blank"><span style="font-size:11pt;font-family:Arial;background-color:transparent;text-decoration-line:underline;vertical-align:baseline;white-space:pre-wrap">goo.gl/5k6wsP</span></a></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">* Implementation:</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="text-decoration-line:underline;font-size:11pt;font-family:Arial;vertical-align:baseline;white-space:pre-wrap"><a href="https://github.com/River707/llvm/blob/outliner/lib/Transforms/IPO/CodeSizeOutliner.cpp" style="text-decoration-line:none" target="_blank">https://github.com/River707/<wbr>llvm/blob/outliner/lib/<wbr>Transforms/IPO/<wbr>CodeSizeOutliner.cpp</a></span></p>
<div><br>
</div>
</span></div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</body>
</html>