<div dir="ltr">Hi Florian,<div><br></div><div>Following up on D86967, I missed that all the timings were using the legacy pass manager.</div><div>Did you do any testing on the compile and run time impact for the new pass manager?</div><div><br></div><div>Thank you,</div><div>Alina</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 25, 2020 at 12:51 PM Florian Hahn <<a href="mailto:florian_hahn@apple.com">florian_hahn@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
Thanks for all the responses!<br>
<br>
My understanding is that there were no objections so far against trading a bit of additional compile-time for more eliminated stores. Unless there are any new concerns raised, I’ll continue working on submitting the remaining patches to keep compile-time in check and flip the switch once they are in.<br>
<br>
<br>
> On Aug 19, 2020, at 18:46, Alina Sbirlea <<a href="mailto:alina.sbirlea@gmail.com" target="_blank">alina.sbirlea@gmail.com</a>> wrote:<br>
> <br>
> Hi Florian,<br>
> <br>
> First, thank you for working on this. I'm really glad to see this work so close to being enabled.<br>
> <br>
> I think the numbers look good for run time, and the benefits of switching for all configurations are clear.<br>
> <br>
> For compile time, the current regressions are noticeable, but not a deal breaker in my opinion. I'm very much in favor of switching in all configurations.<br>
> <br>
> To address some of the concerns, it may make sense to lower the threshold somewhat to minimize impact at this time (we won't have benefits as large at the time of the switch). I'm talking about getting the geomean closer to 1% in all configurations if possible.<br>
> I believe that the regressions introduced by this flag flip can be undone by further using MemorySSA in the other passes currently using MemDepAnalysis, and offsetting the cost of computing MemorySSA in the first place. The threshold could be raised again to enable more stores eliminated once the MemCpyOpt+MSSA and NewGVN become the default.<br>
> <br>
> If reducing the thresholds is not possible or removes most of the run time benefits, I would vote for enabling as is.<br>
<br>
I’ve been working on some additional changes to bring compile-time down a bit further, while keeping the number of eliminated stores at a similar level. Geomeans:<br>
-O3 +0.62%<br>
ReleaseThinLTO +1.06%, <br>
ReleaseLTO-g +0.79% Re<br>
ReleaseThinLTO (link-only) +0.96%<br>
ReleaseLTO-G (link only). +0.80%<br>
<br>
<a href="http://llvm-compile-time-tracker.com/compare.php?from=e19ef1aab524ef10a2d118adcd9f6fd6ca2d7ca9&to=c6812412983b4ebb20f1e32b83893284c8117e7f&stat=instructions" rel="noreferrer" target="_blank">http://llvm-compile-time-tracker.com/compare.php?from=e19ef1aab524ef10a2d118adcd9f6fd6ca2d7ca9&to=c6812412983b4ebb20f1e32b83893284c8117e7f&stat=instructions</a><br>
<br>
This also includes changes so DSE can re-use MemorySSA constructed by LICM earlier during LTO. The limits could certainly be a bit tighter still, but I hope they might encourage a few additional eyes on DSE + MemorySSA compile-time wise, which would be great. Also, triggering as often as possible is probably beneficial in terms of flushing out any remaining issues early.<br>
<br>
Cheers,<br>
Florian</blockquote></div>