<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Zia,<div class=""><br class=""></div><div class="">Do you have Haswell (HSW) performance numbers for this change? An internal bot is reporting a &gt;10% regression for <a href="https://smooshbase.apple.com/perf/db_default/v4/nts/graph?highlight_run=114068&plot.746=313.746.3" style="color: rgb(0, 136, 204); text-decoration: none; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 14px;" class="">MultiSource/Benchmarks/Ptrdist/ks/ks</a> (-O3 -flto) and attributes it to this change. If you can reproduce it, there may be a tuning opportunity. </div><div class=""><br class=""></div><div class="">Thanks!</div><div class="">Gerolf</div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 9, 2015, at 1:29 PM, Zia Ansari via llvm-commits &lt;<a href="mailto:llvm-commits@lists.llvm.org" class="">llvm-commits@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">zansari created this revision.<br class="">zansari added reviewers: rnk, qcolombet, mkuper.<br class="">zansari added a subscriber: llvm-commits.<br class="">Herald added subscribers: qcolombet, MatzeB.<br class=""><br class="">These changes introduce a local stack symbol table ordering phase that gives every target a chance to order its stack symbols as it sees fit.<br class=""><br class="">X86 heuristics are added to order the symbols for better code size and locality. For all other targets, the default behavior is to leave the order untouched; any other target can easily apply its own ordering by providing target-specific heuristics.<br class=""><br class="">As an example, here are some cpu2000 code size improvements that I measured (mileage may vary depending on the options used). 
The numbers are the percentage reduction in the total text size of all objects:<br class=""><br class=""><table class="">
<tr><th align="left">Benchmark</th><th align="right">Text size reduction</th></tr>
<tr><td>177.mesa</td><td align="right">1.81%</td></tr>
<tr><td>179.art</td><td align="right">0.50%</td></tr>
<tr><td>183.equake</td><td align="right">1.78%</td></tr>
<tr><td>188.ammp</td><td align="right">3.75%</td></tr>
<tr><td>164.gzip</td><td align="right">0.79%</td></tr>
<tr><td>175.vpr</td><td align="right">0.85%</td></tr>
<tr><td>176.gcc</td><td align="right">0.70%</td></tr>
<tr><td>181.mcf</td><td align="right">0.26%</td></tr>
<tr><td>186.crafty</td><td align="right">0.47%</td></tr>
<tr><td>197.parser</td><td align="right">0.29%</td></tr>
<tr><td>252.eon</td><td align="right">2.47%</td></tr>
<tr><td>253.perl</td><td align="right">0.47%</td></tr>
<tr><td>254.gap</td><td align="right">0.55%</td></tr>
<tr><td>255.vortex</td><td align="right">1.14%</td></tr>
<tr><td>256.bzip2</td><td align="right">0.64%</td></tr>
<tr><td>300.twolf</td><td align="right">0.88%</td></tr>
<tr><td>total</td><td align="right">1.24%</td></tr>
</table><br class="">I measured performance on cpu2k, cpu2006, EEMBC, and a few other benchmarks, and it was essentially flat, although on average these changes should also improve data locality.<br class=""><br class="">Many lit tests broke because new offsets are now assigned to local symbols. I fixed them by disabling the optimization in those tests, rather than updating the expected offsets, to avoid additional flakiness as the heuristics are tuned.<br class=""><br class=""><br class=""><a href="http://reviews.llvm.org/D15393" class="">http://reviews.llvm.org/D15393</a><br class=""><br class="">Files:<br class="">  include/llvm/CodeGen/CommandFlags.h<br class="">  include/llvm/Target/TargetFrameLowering.h<br class="">  include/llvm/Target/TargetOptions.h<br class="">  lib/CodeGen/PrologEpilogInserter.cpp<br class="">  lib/Target/X86/X86FrameLowering.cpp<br class="">  lib/Target/X86/X86FrameLowering.h<br class="">  test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll<br class="">  test/CodeGen/X86/aligned-variadic.ll<br class="">  test/CodeGen/X86/cleanuppad-realign.ll<br class="">  test/CodeGen/X86/dynamic-allocas-VLAs.ll<br class="">  test/CodeGen/X86/hipe-cc.ll<br class="">  test/CodeGen/X86/hipe-cc64.ll<br class="">  test/CodeGen/X86/local-stack-symbol-ordering.ll<br class="">  test/CodeGen/X86/phys-reg-local-regalloc.ll<br class="">  test/CodeGen/X86/seh-catch-all-win32.ll<br class="">  test/CodeGen/X86/seh-stack-realign.ll<br class="">  test/CodeGen/X86/ssp-data-layout.ll<br class="">  test/CodeGen/X86/statepoint-stack-usage.ll<br class="">  test/CodeGen/X86/statepoint-stackmap-format.ll<br class="">  test/CodeGen/X86/stdarg.ll<br class="">  test/CodeGen/X86/widen_load-1.ll<br class="">  test/CodeGen/X86/win-catchpad-varargs.ll<br class="">  test/CodeGen/X86/win-catchpad.ll<br class="">  test/CodeGen/X86/win32-seh-catchpad-realign.ll<br class=""><br class=""><span id="cid:A64932DE-0AB3-449C-AC51-ACF61276EC83@apple.com">&lt;D15393.42317.patch&gt;</span>_______________________________________________<br class="">llvm-commits mailing list<br class="">llvm-commits@lists.llvm.org<br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits<br class=""></div></div></blockquote></div><br class=""></div></body></html>