<div dir="ltr">Yes -- option. It would be unwise to penalize single threaded app unconditionally.<div><br></div><div>David</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Apr 17, 2014 at 11:22 AM, Bob Wilson <span dir="ltr"><<a href="mailto:bob.wilson@apple.com" target="_blank">bob.wilson@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div class="h5"><br><div><div>On Apr 17, 2014, at 11:09 AM, Xinliang David Li <<a href="mailto:xinliangli@gmail.com" target="_blank">xinliangli@gmail.com</a>> wrote:</div>
<blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 17, 2014 at 10:58 AM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><br>
On 2014-Apr-17, at 10:38, Xinliang David Li <<a href="mailto:xinliangli@gmail.com" target="_blank">xinliangli@gmail.com</a>> wrote:<br>
<br>
><br>
> Another idea is to use stack local counters per function -- synced up with global counters on entry and exit. the problem with it is for deeply recursive calls, stack pressure can be too high.<br>
<br>
</div>I think they'd need to be synced with global counters before function<br>
calls as well, since any function call can call "exit()".<br></blockquote><div><br></div><div>right -- but it might be better to handle this in other ways. For instance a stack of counters for each frames is maintained. At exit, they are flushed in a batch. Or simply ignore it in case of program exit .</div>
</div></div></div></blockquote><br></div></div></div><div>It seems to me like we’re going to have a hard time getting good multithreaded performance without significant impact on the single-threaded behavior. We might need to add an option to choose between those. There’s a lot of room for improvement in the performance with the current instrumentation, so maybe we can find a way to make things incrementally better in a way that helps both, but avoiding the multithreaded cache conflicts seems like it’s going to be expensive in other ways.</div>
</div></blockquote></div><br></div>