<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 16, 2014 at 12:45 PM, Yi Kong <span dir="ltr">&lt;<a href="mailto:kongy.dev@gmail.com" target="_blank">kongy.dev@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class=""><p dir="ltr">On 16 May 2014 18:40, "Chandler Carruth" &lt;<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>&gt; wrote:<br>


><br>

> Why not use the cycle count which perf exposes from hardware? That would seem even better to me, but data would be better. =]</p>

</div><p dir="ltr">That's an interesting idea. However, I'm concerned that it would miss some aspects of compiler optimization. For example, frequent cache misses would have a much smaller impact on the result if the processor drops to a lower frequency during the stall period. Nonetheless, it's definitely worth trying out.</p>

</blockquote><div>Sure, but we should disable frequency throttling on any machine from which we want numbers that look *remotely* stable.<br></div><div><br></div><div>The other thing you might try doing while you're wrapping these tools is to use schedtool to pin the process to a single core. On most modern x86 machines you can see a 2-3% swing in lots of small details when the process migrates between cores, and this makes the numbers very hard to analyze.</div>
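<div><br></div><div>If the wrapper ends up being a script, the pinning can probably be done without schedtool. A rough sketch, assuming Linux and Python 3 and using os.sched_setaffinity in place of schedtool; the helper name pin_to_single_core is made up for illustration:</div>

```python
import os


def pin_to_single_core(pid=0, core=None):
    """Restrict `pid` (0 means the calling process) to a single CPU core.

    Linux-only: relies on os.sched_getaffinity / os.sched_setaffinity.
    Returns the core the process was pinned to.
    """
    # Cores the process is currently allowed to run on.
    allowed = sorted(os.sched_getaffinity(pid))
    if core is None:
        # No preference given: take the lowest-numbered allowed core.
        core = allowed[0]
    elif core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {allowed}")
    # Shrink the affinity mask to exactly one core.
    os.sched_setaffinity(pid, {core})
    return core
```

<div>Calling pin_to_single_core() before launching the benchmark with subprocess should be enough, since child processes inherit the affinity mask on Linux. This only stops migration between cores; frequency throttling still has to be disabled separately.</div>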

</div></div></div>