<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">The instrumentation that I have proposed (on cfe-dev) for PGO is also intended to provide the necessary info for code coverage.  I have not yet measured the performance of the code to write out the data, but it ought to be quite a bit faster than what we have now.<div><br><div><div>On Oct 4, 2013, at 1:40 AM, Kostya Serebryany <<a href="mailto:kcc@google.com">kcc@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">Another question is about the performance of coverage's at-exit actions (dumping coverage data to disk).<div>I've built chromium's base_unittests with -fprofile-arcs -ftest-coverage and the coverage at-exit hook takes 22 seconds, </div>

<div>which is 44x more than I am willing to pay. </div><div>Most of the time is spent here: <br></div><div><div>#0  0x00007ffff3b034cd in msync () at ../sysdeps/unix/syscall-template.S:82</div><div>#1  0x0000000003a8c818 in llvm_gcda_end_file ()</div>

<div>#2  0x0000000003a8c914 in llvm_writeout_files ()</div><div>#3  0x00007ffff2f5e901 in __run_exit_handlers</div></div><div>The test depends on ~700 source files and so the profiling library calls msync ~700 times.</div>

<div>Full chromium depends on ~12000 source files, so we'll be dumping the coverage data for 5 minutes this way.<br></div><div><div>I understand that we have to support the lcov/gcov format (broken in many ways) and this may be the reason it is slow.<br>

</div><div>But I really need something much faster (and maybe simpler).</div></div><div><br></div><div>Is anyone planning any work on coverage in the coming months?<br></div><div>If not, we'll probably cook up something simple and gcov-independent. </div>
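<div><br></div><div>As a rough check on the extrapolation above, here is a minimal sketch of the linear estimate. It assumes the at-exit cost is proportional to the number of source files (one msync per file), which is an assumption rather than a measurement; the function name EstimateExitSeconds is hypothetical.</div>

```cpp
#include <cassert>

// Linear estimate of the at-exit dump time.
// Assumption: cost scales with the number of source files, since the
// profiling library calls msync once per file. Not a measured model.
double EstimateExitSeconds(double measured_seconds, int measured_files,
                           int target_files) {
  assert(measured_files > 0);
  double seconds_per_file = measured_seconds / measured_files;
  return seconds_per_file * target_files;
}
```

<div>With the numbers from this message, EstimateExitSeconds(22.0, 700, 12000) comes to roughly 377 seconds, i.e. about six minutes, which is in line with the ~5 minutes quoted above.</div>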

<div>Thoughts? </div><div><br></div><div>--kcc </div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Oct 3, 2013 at 6:47 PM, Kostya Serebryany <span dir="ltr"><<a href="mailto:kcc@google.com" target="_blank">kcc@google.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello, <div><br></div><div>I have a few questions about coverage.</div><div><br></div><div>Is there any user-facing documentation for clang's "-coverage" flag?</div>

<div>The coverage instrumentation seems to happen before asan, and so if asan is also enabled <br>

</div><div>asan will instrument accesses to @__llvm_gcov_ctr.</div><div>This is undesirable and so we'd like to skip these accesses. </div><div>It looks like the GEPs around @__llvm_gcov_ctr have special metadata attached: </div>


<div><div>  %2 = getelementptr inbounds [4 x i64]* @__llvm_gcov_ctr, i64 0, i64 %1<br></div><div>  %3 = load i64* %2, align 8</div><div>  %4 = add i64 %3, 1</div><div>  store i64 %4, i64* %2, align 8</div></div><div>  ...</div>


<div>!1 = metadata !{...; [ DW_TAG_compile_unit ] ... /home/kcc/tmp/cond.cc] [DW_LANG_C_plus_plus]<br></div><div><br></div><div>Can we rely on having this metadata attached to @__llvm_gcov_ctr? <br></div><div>

Should we attach some metadata to the actual accesses as well, or simply find the corresponding GEP?</div>

<div><br></div><div>Finally, does anyone have performance numbers for coverage?</div><div>As of today it seems completely thread-hostile since __llvm_gcov_ctr is not thread-local.</div><div>A simple stress test shows that coverage slows the program down by about 50x! </div>


<div><div>% cat ~/tmp/coverage_mt.cc </div><div>#include &lt;pthread.h&gt;</div><div>__thread int x;</div><div>__attribute__((noinline))</div><div>void foo() {</div><div>  x++;</div><div>}</div><div><br></div><div>void *Thread(void *) {</div>


<div>  for (int i = 0; i < 100000000; i++)</div><div>    foo();</div><div>  return 0;</div><div>}</div><div><br></div><div>int main() {</div><div>  static const int kNumThreads = 16;</div><div>  pthread_t t[kNumThreads];</div>


<div>  for (int i = 0; i < kNumThreads; i++)</div><div>    pthread_create(&t[i], 0, Thread, 0);</div><div>  for (int i = 0; i < kNumThreads; i++)</div><div>    pthread_join(t[i], 0);</div><div>  return 0;</div>

<div>

}</div></div><div><br></div><div><div>% clang -O2 ~/tmp/coverage_mt.cc -lpthread  ; time ./a.out </div><div>TIME: real: 0.284; user: 3.560; system: 0.000</div><div>% clang -O2 ~/tmp/coverage_mt.cc -lpthread -coverage  ; time ./a.out </div>


<div>TIME: real: 13.327; user: 174.510; system: 0.000</div></div><div><br></div><div>Any objections in principle to making __llvm_gcov_ctr thread-local, perhaps under a flag?</div><div><br></div><div>If anyone is curious, my intent is to enable running coverage and asan in one process.</div>
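<div><br></div><div>To illustrate the thread-local idea, here is a minimal sketch, not the actual instrumentation. It assumes a single counter slot and merges each thread's count into a shared total once, when the thread finishes. All names (g_total, t_local, Hit, FlushThreadCounter, RunThreads) are hypothetical stand-ins, and how real counters would be merged at thread exit is left open.</div>

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one slot of a shared coverage counter array.
static uint64_t g_total = 0;
static std::mutex g_total_mu;

// Per-thread copy of the counter: the hot path touches no shared cache line.
static thread_local uint64_t t_local = 0;

// What an instrumented edge would do in this sketch.
void Hit() { t_local++; }

// Fold this thread's count into the shared total; called once per thread.
void FlushThreadCounter() {
  std::lock_guard<std::mutex> lock(g_total_mu);
  g_total += t_local;
  t_local = 0;
}

// Runs num_threads threads, each hitting the counter hits_per_thread times,
// and returns the merged total for this run.
uint64_t RunThreads(int num_threads, uint64_t hits_per_thread) {
  {
    std::lock_guard<std::mutex> lock(g_total_mu);
    g_total = 0;  // start each run from a clean total
  }
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; i++) {
    threads.emplace_back([hits_per_thread] {
      for (uint64_t j = 0; j < hits_per_thread; j++) Hit();
      FlushThreadCounter();
    });
  }
  for (auto &t : threads) t.join();
  std::lock_guard<std::mutex> lock(g_total_mu);
  return g_total;
}
```

<div>Calling RunThreads(16, 100000000) mirrors the shape of the stress test above. The trade-off in this sketch is that counts from a thread that never reaches FlushThreadCounter are lost, so a real implementation would need some hook at thread exit.</div>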


<div><br></div><div>Thanks, <br></div><div>--kcc</div></div>

</blockquote></div><br></div>

_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br></blockquote></div><br></div></body></html>