<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Oct 7, 2013, at 3:48 PM, Chandler Carruth <<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 7, 2013 at 11:28 AM, Evan Cheng <span dir="ltr"><<a href="mailto:evan.cheng@apple.com" target="_blank" class="cremed">evan.cheng@apple.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Any change that has the potential to impact performance widely requires this investigation *before* the patch is committed.</blockquote>


</div><br></div><div class="gmail_extra">Evan, while I completely agree with you on this specific issue, I don't fully agree with this general claim... It's close to what I follow and encourage others to follow, but with some important differences.</div>


<div class="gmail_extra"><br></div><div class="gmail_extra">For changes which have the potential to impact performance significantly for a few benchmarks, but are reasonably small and isolated changes to make (such as, adding a new collection of simplifications to instcombine, or switching the canonical form from A to B in the IR), I think it's often sufficient to measure performance of the test suite on your architecture (provided it is reasonably representative, x86 or arm tend to be sufficient) and commit. We have on-commit benchmarking from many companies that will catch regressions and let us back out the change as needed.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">For changes which have really far reaching impacts on performance and we expect essentially a mixture of improvements and regressions across a wide range of benchmarks, I think we *have* to commit the change so that people can evaluate it, but we *have* to keep it behind a flag so that people can run A/B tests that isolate the impact of the change. Examples of these types of changes that have thus far all been carried out in this way:</div>

<div class="gmail_extra">1) SROA -- it changed the canonical form for many inputs by scalarizing them further. feedback from others helped isolate several bugs</div><div class="gmail_extra">2) Vectorization -- we had a flag to enable it for a while before actually making it on-by-default. it helped isolate a couple of bugs</div>

<div class="gmail_extra">3) Switch to MI sched -- its under a flag right now, and I've been able to run lots of benchmarks with it</div><div class="gmail_extra"><br></div><div class="gmail_extra">Note that both of these strategies still have people commit the change prior to widespread testing on a wide variety of hardware and benchmarks. I think that's actually a good thing. It ensures we continue to make incremental progress on the codebase independently of the (often slow) time cycle of running benchmarks.</div>

<div class="gmail_extra"><br></div></div></blockquote><div><br></div><div>Good thoughts, Chandler. Do I understand you correctly that you agree with Evan’s statement with the modification of s/committed/enabled by default/?</div><div><br></div><div>-Jim</div></div><br></body></html>