<div dir="ltr"><div>Thanks for the comments.</div><div><br></div>At the moment, I'm refactoring the code and also preparing a document describing the optimization in detail, which I'll make available to everyone ASAP.<br>This will make easier for our discussion.<div><br></div><div>As I see it, function merging and function outlining (e.g., MachineOutliner) are trying to solve the same fundamental problem of redundant/repeated code. But, like many other optimizations, I don't think they are exclusive. Instead, they can probably work together in collaboration.</div><div><br></div><div>Some quick facts about my optimization, and how it may differ from the MachineOutliner:</div><div>- it works on the IR level, so it's not affected by register allocation, etc.</div><div>- if two functions are identical it produces a result very much like the existing MergeFunctions.<br></div><div>- if the two functions are not identical, it tries to maximize the number of similar code merged,</div><div>but it does not need to create one new function for each "block" of code merged.</div><div>- it's not limited to work within basic blocks, in fact, it is able to merge similar code even if they span across a different amount of basic blocks in each one of the functions being merged.</div><div>- however, as it merges different functions, it is unable to find code duplication within the same function. Although I can see how it could be adapted to handle some cases of code duplication within a function. But in general, MachineOutliner would be valuable in these cases.</div><div><br></div><div>As I described the fingerprint of functions before, it is basically just a map of opcode to number of occurrences. It wouldn't really be affected by changes in the IR.</div><div><br></div><div>The actual merge operation, on the other hand, needs to check the equivalence between instructions, which might be affected by changes in the IR. However, the same is true for the IR Verifier and the existing function merging of identical functions. It would be interesting to see if this can be simplified/unified to reduce the number of changes to the code when the IR changes.</div><div><br></div><div>The way I see it working on LTO would be something like: remove unnecessary functions, perform the quick identical function merging to remove straightforward replication, and then apply the more powerful function merging optimization.</div><div>As it was suggested by Jason's work, the two first optimizations would potentially reduce the number of functions to be analysed, solving quickly the easy cases.</div><div><br></div><div>It is true that depending on how many functions are merged with respect to the total size of the code, we can observe reduction on the compilation-time, as now the remaining optimizations and the back-end has a smaller code to work with. </div><div><br></div><div>I think it would be interesting to test on large programs like the ones suggested.</div><div><br></div><div><br></div><div>Cheers,</div><div><br></div><div>Rodrigo Rocha</div><div><br></div><div><div class="gmail_quote"><div dir="ltr">On Thu, 2 Aug 2018 at 21:35 Matthias Braun <<a href="mailto:mbraun@apple.com">mbraun@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space">How does it compare to the new machine outliner pass in llvm?<div><br></div><div><a href="https://www.youtube.com/watch?v=yorld-WSOeU" target="_blank">https://www.youtube.com/watch?v=yorld-WSOeU</a></div><div><a href="http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html</a></div><div><br></div><div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div>- Matthias</div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><br><div><br><blockquote type="cite"><div>On Aug 2, 2018, at 10:30 AM, JF Bastien via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_-7416094728650266778Apple-interchange-newline"><div><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br class="m_-7416094728650266778Apple-interchange-newline"><br><blockquote type="cite"><div>On Aug 2, 2018, at 8:58 AM, Rodrigo Caetano Rocha via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_-7416094728650266778Apple-interchange-newline"><div><div dir="ltr">Hi Hal,<div><br></div><div>Because my function merging strategy is able to merge any two function, allowing for different CFGs, different parameters, etc.</div><div>I am unable to use just a simple hash value to compare whether or not two functions are similar.</div></div></div></blockquote><div><br></div><div>Can you give us more detail on what criteria you use for “fuzzy” merging? Jason Koenig had uploaded a prototype of something similar in 2015, based on mergefuncs.</div><div><br></div><br><blockquote type="cite"><div><div dir="ltr"><div>Therefore, the idea is to have an infrastructure which allows me to compare whether or not two functions are similar without having traverse the two function (basically performing a merge for all pairs).</div><div>I'm precomputing a fingerprint of all functions, which is then cached for later use (this might also be useful to enable this function merging with ThinLTO).</div><div>At the moment, this fingerprint is just a map of opcode -> number of occurrences in the function, which is just a ~64-int-array.</div></div></div></blockquote><div><br></div><div>The difficulty with mergefuncs is keeping its comparator / hasher in sync with IR. All it takes is one new IR property to break it, and I think the only want to fix this issue is to have comparison / hashing be part of the IR definition. How would you solve this issue?</div><div><br></div><br><blockquote type="cite"><div><div dir="ltr"><div>Then, for each functions being considered for a merge, I'm able to rank the candidates with a PriorityQueue.</div><div><br></div><div>Hopefully, we are able to do that in a very lightweight manner.</div><div><br></div><div>After that, the more expensive bit will be actually performing the merge and then checking for profitability, using the TTI for code-size.</div><div><br></div><div>I haven't given much thought about adapting this infrastructure for the SLP Vectorizer, but perhaps something similar could also work there.</div><div><br></div><div>Cheers,</div><div><br></div><div>Rodrigo Rocha<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, 2 Aug 2018 at 16:43 Hal Finkel <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><br><div class="m_-7416094728650266778m_-4787981321377026353moz-cite-prefix">On 08/02/2018 10:25 AM, Rodrigo Caetano Rocha via llvm-dev wrote:<br></div><blockquote type="cite"><div dir="ltr">Hi everyone,<div><br></div><div>I'm currently working on a new function merging optimization that is more general than the current LLVM function merging optimization that works only on identical functions.</div><div><br></div><div>I would like to know if the community has any interest in having a more powerful function merging optimization.</div></div></blockquote><br></div><div text="#000000" bgcolor="#FFFFFF">Yes, I think there is certainly interest in this space.</div></blockquote></div></div></blockquote><div><br></div><div>Yes.</div><div><br></div><div>I'd also be interested in hearing about how this combines with MachineOutliner. I expect they find some redundant things, but mostly help each other.</div><div><br></div><br><blockquote type="cite"><div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><blockquote type="cite"><div dir="ltr"><div>---- More Details ----</div><div><br></div><div>Up until now, I have been focusing on the quality of the code reduction.</div><div><br></div><div>Some preliminary result on SPEC'06 in a full LTO fashion:</div><div>The baseline has no function merge but the optimization pipeline is the same, and I am comparing my function merge with LLVM's identical function merge, where everything else in the optimization pipeline is the same as the baseline.</div><div>Average reduction in the final exectuable file over the baseline: 5.55% compared to 0.49% of the identical merge.</div><div><div><span class="m_-7416094728650266778m_-4787981321377026353inbox-inbox-gr_ m_-7416094728650266778m_-4787981321377026353inbox-inbox-gr_run_anim m_-7416094728650266778m_-4787981321377026353inbox-inbox-gr_366 m_-7416094728650266778m_-4787981321377026353inbox-inbox-gr-alert m_-7416094728650266778m_-4787981321377026353inbox-inbox-doubleReplace m_-7416094728650266778m_-4787981321377026353inbox-inbox-gr_inline_cards m_-7416094728650266778m_-4787981321377026353inbox-inbox-replaceWithoutSep m_-7416094728650266778m_-4787981321377026353inbox-inbox-only-ins m_-7416094728650266778m_-4787981321377026353inbox-inbox-gr_gramm m_-7416094728650266778m_-4787981321377026353inbox-inbox-Grammar" id="m_-7416094728650266778m_-4787981321377026353inbox-inbox-366" style="display:inline;border-bottom-width:2px;border-bottom-style:solid;border-bottom-color:transparent;background-repeat:no-repeat no-repeat">Average</span> reduction in the total number of instructions over the baseline: 7.04% compared to 0.47% of the identical merge.</div></div></div></blockquote></div></blockquote></div></div></blockquote><div><br></div><div>IIRC this roughly matches Jason’s results on Chrome / Firefox: a few percentage point reduction. Can you try large applications like Chrome / Firefox / WebKit to get more real-world numbers? It’s interesting to compare what you get from regular builds as well as LTO builds (which will take forever, but expose much more duplication).</div><div><br></div><br><blockquote type="cite"><div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><blockquote type="cite"><div dir="ltr"><div>The highest reduction on the executable file is of about 20% (both 429.mcf and 447.dealII) and the highest reduction on the total number of instructions is of about 37% (447.dealII).</div><div><br></div><div>It has an average slowdown of about 1%, but having no statistical difference from the baseline in most of the benchmarks in the SPEC'06 suite.</div></div></blockquote></div></blockquote></div></div></blockquote><div><br></div><div>Jason’s (uncommitted) work found *speedups* when compiling all of Chrome. The way he did this was with an early and fast mergefuncs which didn’t try to be fuzzy: it just tried to remove code duplication, which means the optimizer spent less time because there were fewer functions. He then had a later mergefuncs which did fuzzy matching and tried to pick up more things. Keeping it somewhat later is important because merging similar functions might make it less attractive to inlining (because the merged functions are now slightly more complex).</div><div><br></div><div><br></div><blockquote type="cite"><div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><blockquote type="cite"><div dir="ltr"><div>Because this new function merging technique is able to merge any pair of functions, except for a few restrictions, the exploration strategy is critical for having an acceptable compilation time.</div><div><br></div><div>At the moment I'm starting to focus more on simplifying the optimization and reducing the overhead in compilation time.</div><div>My optimization has an exploration threshold which can be tuned to trade-off compilation time overhead for a more aggressive merge.</div><div>It does *not* perform n^2 merge operations. Instead, I have a ranking strategy based on a similarity metric computed from the function's "fingerprint".</div></div></blockquote><br></div><div text="#000000" bgcolor="#FFFFFF">Can you explain in more detail how this works?<br><br>(I'm also, as a side note, keeping my eye out for things like this that might also help us efficiently do more-compile-time-efficient SLP vetorization).<br><br> -Hal<br><br><blockquote type="cite"></blockquote></div><div text="#000000" bgcolor="#FFFFFF"><blockquote type="cite"><div dir="ltr"><div>The threshold limits the exploration to focus on the top functions of the rank.</div><div>The idea is to make the ranking mechanism as lightweight as possible.</div><div><br></div><div>Cheers,</div><div><br></div><div>Rodrigo Rocha</div></div><br><fieldset class="m_-7416094728650266778m_-4787981321377026353mimeAttachmentHeader"></fieldset><br></blockquote></div><div text="#000000" bgcolor="#FFFFFF"><blockquote type="cite"><pre>_______________________________________________

LLVM Developers mailing list

<a class="m_-7416094728650266778m_-4787981321377026353moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>

<a class="m_-7416094728650266778m_-4787981321377026353moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>

</pre></blockquote></div><div text="#000000" bgcolor="#FFFFFF"><br><pre class="m_-7416094728650266778m_-4787981321377026353moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre></div></blockquote></div>_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br></div></blockquote></div><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline!important">_______________________________________________</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline!important">LLVM Developers mailing list</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><a href="mailto:llvm-dev@lists.llvm.org" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank">llvm-dev@lists.llvm.org</a><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></div></blockquote></div><br></div></div></blockquote></div></div></div>