<html><head><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Hi Meador. <div><br></div><div>Thanks for working on this. The new approach looks good to me, and I will benchmark it later today. </div><div><br></div><div>I think that we can do a few things to make it a little faster. First, we can detect prefixes that are used by mangled C++ function names, such as "_Z", and exit early. We can also exit early if the incoming name is longer than the longest library function name that we optimize. </div><div><br></div><div>Thanks,</div><div>Nadav</div><div><br><div><div>On Mar 11, 2013, at 6:29 AM, Meador Inge &lt;<a href="mailto:meadori@codesourcery.com">meadori@codesourcery.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;">On 03/07/2013 07:37 PM, Nadav Rotem wrote:<br><br><blockquote type="cite">Hi Meador,<span class="Apple-converted-space"> </span><br><br>I spoke with Bill about this and I now think that your proposal to initialize<br>the SimplifyLibCalls on the first invocation of runOnFunction is correct. The<br>only exception is LTO build. On LTO builds we can change the TargetData and TLI<br>per-function. So, on each runOnFunction we need to check if the current TLI and<br>DataLayout are the same as in the previous run. If they are not, then we need to<br>re-initialize the SimpLibCalls, rehash the table, etc.<span class="Apple-converted-space"> </span><br></blockquote><br>Hi Nadav,<br><br>After thinking about this a little more, caching the LibCallSimplifier object on<br>the first call makes me a little nervous and I agree with the original<br>objections to it.  
This motivated me to try something different.  The attached<br>patch drops the hash table and does an explicit lookup for each of the lib call<br>simplifier optimizers.<br><br>This should alleviate Michael's concerns about caching the object in<br>runOnFunction, short-circuits the intrinsic case Michael is interested in,<br>avoids repeatedly building and destroying the hash table, and benefits all<br>clients of LibCallSimplifier.<br><br>There is a slight overhead for the new lookup function, but it is still much<br>better than the current approach.  On a benchmark containing 100,000 calls<br>where *none* of them are simplified I noticed a 30% speedup.  On a benchmark<br>containing 100,000 calls where *all* of them are simplified I noticed an 8%<br>speedup.  The original LibCallSimplifier caching patch also obtained a 30%<br>speedup in the case where nothing was simplified and a 13% speedup when all<br>calls were simplified.  I am comfortable that the new patch is slightly slower<br>for some cases (this is on average.  I saw a few runs where the two approaches<br>gave equivalent speedups).<br><br>Comments?  Would you all mind trying this patch for the benchmarks that you all<br>are interested in?  I personally like the new patch better.<br><br>--<span class="Apple-converted-space"> </span><br>Meador Inge<br>CodeSourcery / Mentor Embedded<br><span>&lt;0001-LibCallSimplifier-optimize-speed-for-short-lived-ins.patch&gt;</span></div></blockquote></div><br></div></body></html>