<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Aug 18, 2020, at 15:02, Renato Golin &lt;<a href="mailto:rengolin@gmail.com" class="">rengolin@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class="">On Tue, 18 Aug 2020 at 14:36, Florian Hahn via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:<br class=""></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Note that both the current and MemorySSA-backed DSE implementations rely on various thresholds to avoid excessive compile-times. There are a few knobs to turn to further reduce compile-time of the MemorySSA-backed DSE implementation, at the cost of failing to eliminate some stores. The current settings are chosen so the compile-time difference is limited without limiting optimizations too much.<br class=""></blockquote><div class=""><br class=""></div><div class=""><div class="">Did you compare the new algorithm with the knobs down enough to take roughly the same compile time as the old one? </div><div class=""></div></div></div></div></div></blockquote><br class=""></div><div>Without spending too much time on it, the best I could do without regressing on the total number of stores removed was +0.30% executed instructions for -O3 on CTMark, with ~3% more stores removed across the large benchmark set. 
(Executed instructions: +0.28% for ReleaseLTO, +0.58% for ReleaseThinLTO.)</div><div><br class=""></div><div>So MemorySSA DSE can probably get quite close to legacy DSE at legacy DSE’s own game, but intuitively I think the MemorySSA approach is slightly more expensive in general, because it handles more than just the cases that can be cached easily and allows for more flexibility.</div><div><br class=""></div><div><div class=""></div><blockquote type="cite" class=""><div class="">Use slightly more aggressive limits at O2, those limits at O3 and perhaps add an extra argument to push even further for people who really need it and can pay the extra compile time.</div></blockquote><div class=""><br class=""></div><div class="">That is certainly an option, but ideally I think the settings should not get too fragmented.</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">Florian</div></div></body></html>