<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">The instrumentation that I have proposed (on cfe-dev) for PGO is also intended to provide the necessary information for code coverage. I have not yet measured the performance of the code that writes out the data, but it ought to be quite a bit faster than what we have now.<div><br><div><div>On Oct 4, 2013, at 1:40 AM, Kostya Serebryany <<a href="mailto:kcc@google.com">kcc@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">Another question is about the performance of coverage's at-exit actions (dumping coverage data to disk).<div>I've built chromium's base_unittests with -fprofile-arcs -ftest-coverage and the coverage at-exit hook takes 22 seconds, </div>
<div>which is 44x more than I am willing to pay. </div><div>Most of the time is spent here: <br></div><div><div>#0 0x00007ffff3b034cd in msync () at ../sysdeps/unix/syscall-template.S:82</div><div>#1 0x0000000003a8c818 in llvm_gcda_end_file ()</div>
<div>#2 0x0000000003a8c914 in llvm_writeout_files ()</div><div>#3 0x00007ffff2f5e901 in __run_exit_handlers</div></div><div>The test depends on ~700 source files and so the profiling library calls msync ~700 times.</div>
<div>Full Chromium depends on ~12000 source files, so dumping the coverage data this way would take about 5 minutes.<br></div><div><div>I understand that we have to support the lcov/gcov format (broken in many ways) and that this may be the reason for the slowness.<br>
</div><div>But I really need something much faster (and maybe simpler).</div></div><div><br></div><div>Is anyone planning any work on coverage in the coming months?<br></div><div>If not, we'll probably cook up something simple and gcov-independent. </div>
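<div>The estimate above can be reproduced with a small helper. This is only a sketch: the function name <code>ProjectedDumpSeconds</code> is hypothetical, and it assumes the dump time scales linearly with the number of source files (one msync per file).</div>

```cpp
// Linear extrapolation of the at-exit dump time.
// Assumption: the cost is dominated by a fixed per-file cost
// (one msync per source file), so total time scales with the file count.
constexpr double ProjectedDumpSeconds(double measured_seconds,
                                      int measured_files,
                                      int target_files) {
  return measured_seconds / measured_files * target_files;
}
```

<div>With the figures quoted above (22 seconds for ~700 files, ~12000 files in full Chromium), <code>ProjectedDumpSeconds(22.0, 700, 12000)</code> gives about 377 seconds, a little over 6 minutes, which is in the same range as the rough 5-minute estimate.</div>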
<div>Thoughts? </div><div><br></div><div>--kcc </div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Oct 3, 2013 at 6:47 PM, Kostya Serebryany <span dir="ltr"><<a href="mailto:kcc@google.com" target="_blank">kcc@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello, <div><br></div><div>I have a few questions about coverage.</div><div><br></div><div>Is there any user-facing documentation for clang's "-coverage" flag?</div>
<div>The coverage instrumentation seems to happen before asan, so if asan is also enabled, <br>
</div><div>asan will instrument accesses to @__llvm_gcov_ctr.</div><div>This is undesirable, so we'd like to skip these accesses. </div><div>It looks like the GEPs around @__llvm_gcov_ctr have special metadata attached: </div>
<div><div> %2 = getelementptr inbounds [4 x i64]* @__llvm_gcov_ctr, i64 0, i64 %1<br></div><div> %3 = load i64* %2, align 8</div><div> %4 = add i64 %3, 1</div><div> store i64 %4, i64* %2, align 8</div></div><div> ...</div>
<div>!1 = metadata !{...; [ DW_TAG_compile_unit ] ... /home/kcc/tmp/cond.cc] [DW_LANG_C_plus_plus]<br></div><div><br></div><div>Can we rely on having this metadata attached to @__llvm_gcov_ctr? <br></div><div>
Should we attach some metadata to the actual accesses as well, or simply find the corresponding GEP?</div>
<div><br></div><div>Finally, does anyone have performance numbers for coverage?</div><div>As of today it seems completely thread-hostile since __llvm_gcov_ctr is not thread-local.</div><div>A simple stress test shows that coverage slows the program down by roughly 50x! </div>
<div><div>% cat ~/tmp/coverage_mt.cc </div><div>#include <pthread.h></div><div>__thread int x;</div><div>__attribute__((noinline))</div><div>void foo() {</div><div> x++;</div><div>}</div><div><br></div><div>void *Thread(void *) {</div>
<div> for (int i = 0; i < 100000000; i++)</div><div> foo();</div><div> return 0;</div><div>}</div><div><br></div><div>int main() {</div><div> static const int kNumThreads = 16;</div><div> pthread_t t[kNumThreads];</div>
<div> for (int i = 0; i < kNumThreads; i++)</div><div> pthread_create(&t[i], 0, Thread, 0);</div><div> for (int i = 0; i < kNumThreads; i++)</div><div> pthread_join(t[i], 0);</div><div> return 0;</div>
<div>
}</div></div><div><br></div><div><div>% clang -O2 ~/tmp/coverage_mt.cc -lpthread ; time ./a.out </div><div>TIME: real: 0.284; user: 3.560; system: 0.000</div><div>% clang -O2 ~/tmp/coverage_mt.cc -lpthread -coverage ; time ./a.out </div>
<div>TIME: real: 13.327; user: 174.510; system: 0.000</div></div><div><br></div><div>Any objections in principle to making __llvm_gcov_ctr thread-local, perhaps under a flag?</div><div><br></div><div>If anyone is curious, my intent is to enable running coverage and asan in one process.</div>
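<div>To illustrate the thread-local idea above, here is a minimal sketch in plain C++. It is not the actual instrumentation: the names <code>kNumCounters</code>, <code>g_totals</code>, <code>LocalCounters</code>, <code>Hit</code> and <code>RunThreads</code> are all hypothetical, and merging at thread exit is just one possible way to combine the per-thread counts.</div>

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical counter count; real instrumentation would size this per module.
constexpr std::size_t kNumCounters = 4;

// Process-wide totals, updated only when a thread exits.
std::array<std::atomic<std::uint64_t>, kNumCounters> g_totals{};

// Per-thread counters; the destructor folds them into g_totals at thread exit.
struct LocalCounters {
  std::array<std::uint64_t, kNumCounters> slots{};
  ~LocalCounters() {
    for (std::size_t i = 0; i < kNumCounters; ++i)
      g_totals[i].fetch_add(slots[i], std::memory_order_relaxed);
  }
};

thread_local LocalCounters t_counters;

// Stand-in for the instrumented increment: it touches only thread-local
// memory, so threads do not contend on a shared cache line.
inline void Hit(std::size_t edge) {
  assert(edge < kNumCounters);
  ++t_counters.slots[edge];
}

// Starts num_threads threads that each hit edge 0 `iterations` times, then
// returns the merged total for edge 0 once all threads have been joined.
std::uint64_t RunThreads(int num_threads, int iterations) {
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t)
    threads.emplace_back([iterations] {
      for (int i = 0; i < iterations; ++i) Hit(0);
    });
  for (auto& th : threads) th.join();
  return g_totals[0].load(std::memory_order_relaxed);
}
```

<div>In this sketch a thread's counts become visible in <code>g_totals</code> only after that thread exits, so a real runtime would also have to flush the counters of threads that are still alive (including the main thread) before dumping.</div>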
<div><br></div><div>Thanks, <br></div><div>--kcc</div></div>
</blockquote></div><br></div>
_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br></blockquote></div><br></div></body></html>