<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Apr 17, 2014, at 11:41 AM, Chandler Carruth <<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 17, 2014 at 11:22 AM, Bob Wilson <span dir="ltr"><<a href="mailto:bob.wilson@apple.com" target="_blank">bob.wilson@apple.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="adM"><div class=""><div>On Apr 17, 2014, at 11:09 AM, Xinliang David Li <<a href="mailto:xinliangli@gmail.com" target="_blank">xinliangli@gmail.com</a>> wrote:</div>

<blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 17, 2014 at 10:58 AM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><br>

On 2014-Apr-17, at 10:38, Xinliang David Li <<a href="mailto:xinliangli@gmail.com" target="_blank">xinliangli@gmail.com</a>> wrote:<br>

<br>

><br>

> Another idea is to use stack-local counters per function, synced with the global counters on function entry and exit. The problem is that for deeply recursive calls, the stack pressure can become too high.<br>

<br>

</div>I think they'd need to be synced with the global counters before function calls as well, since any function call can call "exit()".<br></blockquote><div><br></div><div>Right, but it might be better to handle this in other ways. For instance, we could maintain a stack of counters, one per frame, and flush them in a batch at exit. Or we could simply ignore this case on program exit.</div>


</div></div></div></blockquote><br></div></div><div>It seems to me like we’re going to have a hard time getting good multithreaded performance without significant impact on the single-threaded behavior. We might need to add an option to choose between those. There’s a lot of room for improvement in the performance with the current instrumentation, so maybe we can find a way to make things incrementally better in a way that helps both, but avoiding the multithreaded cache conflicts seems like it’s going to be expensive in other ways.</div>

</blockquote></div><br>I don't really agree.</div><div class="gmail_extra"><br></div><div class="gmail_extra">First, multithreaded applications are going to be the majority soon, if they aren't already. We should design for them and support them well by default. If, once we have that, we find that single-threaded performance suffers dramatically, then maybe we should add a flag. But it doesn't make sense to do this before we even have data.</div>

</div>

</blockquote></div><br><div>If someone wants to revise the instrumentation in a way that works better for multithreaded code, that’s great. Before the change is committed, we should have performance data comparing it to the current code. If there is no regression, then fine. If it significantly hurts single-threaded performance, then we will need a flag.</div></body></html>