<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Sep 1, 2020, at 23:03, Alina Sbirlea &lt;<a href="mailto:alina.sbirlea@gmail.com" class="">alina.sbirlea@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi Florian,<div class=""><br class=""></div><div class="">Following up on D86967, I missed that all the timings were using the legacy pass manager.</div><div class="">Did you do any testing on the compile and run time impact for the new pass manager?</div></div></div></blockquote></div><div class=""><br class=""></div><div class="">All numbers shared earlier were indeed with the default configuration/LPM.</div><div class=""><br class=""></div>I did some testing with the new PM today (using ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER). In terms of removed stores, the improvements are very similar to the legacy pass manager.<div class=""><br class=""></div><div class="">In terms of compile-time, the impact is unfortunately a bit bigger, because we need extra computations of MemorySSA. For example, in the regular pipeline, DSE is scheduled just before LICM, both of which use MemorySSA. With the LPM, if I understand correctly, LICM's loop pass manager constructs MemorySSA even if there is no loop in a function. So in the LPM, the number of times MemorySSA is computed stays the same.</div><div class=""><br class=""></div><div class="">With the NPM, it seems like LICM’s loop pass manager only constructs MemorySSA if there are actual loops in a function. This is great, but unfortunately means that using DSE + MemorySSA introduces an extra MemorySSA construction for each function without loops. 
For tramp3d-v4, this introduces 3x more MemorySSA constructions when running DSE + MemorySSA followed by LICM on the LTO module, for example.</div><div class=""><br class=""></div><div class="">I do not think there are any short-term solutions for this in the NPM, but once more passes, e.g. NewGVN or MemCpyOptimizer, are using MemorySSA, the cost should be somewhat amortized.</div><div class=""><br class=""></div><div class="">So here are the latest CTMark numbers with DSE + MemorySSA enabled. Note that these measurements do not include the compile-time improvements made to MemDepAnalysis a week ago (<a href="https://github.com/llvm/llvm-project/commit/3a54b6a4b71c21cf3bab4f132cbc2904fb9d997e" class="">https://github.com/llvm/llvm-project/commit/3a54b6a4b71c21cf3bab4f132cbc2904fb9d997e</a>). I think this highlights the benefits of switching as soon as possible, so the DSE + MemorySSA implementation can benefit from additional fresh eyes focused on compile-time. </div><span class=""><br class="">                                     New Pass Manager<br class="">                                    exec instrs.   size-text <br class="">O3                                 + 0.95%        - 0.25%</span><div class=""><span class="">ReleaseThinLTO           + 1.28%        - 0.41%</span></div><div class=""><span class="">ReleaseLTO-g.              + 1.64%        - 0.35%</span></div><div class="">RelThinLTO (link only)   + 0.93%        - 0.41%</div><div class=""><span class="">RelLTO-g (link only).        
+ 2.09%       - 0.35%<br class=""><br class="">Details:<br class=""><a href="http://195.201.131.214:8000/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions" class="">http://195.201.131.214:8000/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions</a></span></div><div class=""><span class=""><br class=""></span>                                     Legacy Pass Manager<br class="">                                    exec instrs.   size-text <br class="">O3                                 + 0.67%        - 0.27%<div class="">ReleaseThinLTO           + 1.07%        - 0.42%</div><div class="">ReleaseLTO-g.              + 0.81%        - 0.33%</div><div class="">RelThinLTO (link only)   + 0.94%        - 0.42%</div><div class="">RelLTO-g (link only).        + 0.78%       - 0.33%<br class=""><br class=""></div><div class="">Details:</div><div class=""><a href="http://llvm-compile-time-tracker.com/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions" class="">http://llvm-compile-time-tracker.com/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions</a></div><span class=""><div class=""><span class=""><br class=""></span></div><div class=""><span class="">There are currently 2 patches pending before we are ready to flip the switch:</span></div><div class=""><span class=""><br class=""></span></div><div class=""><a href="https://reviews.llvm.org/D86651" class="">https://reviews.llvm.org/D86651</a> <span style="color: rgba(0, 0, 0, 0.85); font-family: 'Helvetica Neue';" class="">[MemCpyOpt] Preserve MemorySSA.</span></div><div class=""><a href="https://reviews.llvm.org/D86815" class="">https://reviews.llvm.org/D86815</a> <span style="color: rgba(0, 0, 0, 0.85); font-family: 'Helvetica Neue';" class="">[LangRef] 
Adjust guarantee for llvm.memcpy to also allow equal arguments.</span></div></span><span class=""><div class=""><br class=""></div>Cheers,<br class="">Florian</span></div></body></html>