Tue May 2 08:09:13 PDT 2017

I've been trying to improve the parallelism of lldb but have run into an
odd roadblock.  I have the code at the point where it creates 40 worker
threads, and the count stays at 40 because there is enough work to keep
them all busy.  However, running 'top -d 1' shows that during that period
the CPU load never gets above 4-8 CPUs (even though I have 40).

1. I tried mutrace, which measures mutex contention (I had to call
unsetenv("LD_PRELOAD") in main() so it wouldn't propagate to the process
being tested).  It indicated some minor contention, but not enough to be
the problem.  Regardless, I converted everything I could to lock-free
structures (TaskPool and ConstString) and it didn't help.

2. I tried strace, but I don't think strace is tracing lldb correctly.  All
it reports is a wait on a single futex for 8 seconds, and then it is done.
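
For reference, the LD_PRELOAD workaround from item 1 can be sketched as
below.  This is a minimal sketch, not the actual lldb change: the helper
name drop_ld_preload() is made up for illustration, and it assumes a POSIX
system where unsetenv() is available.  The preloaded library is already
mapped into the current process by the time main() runs, so only child
processes started afterwards are affected.

```cpp
#include <stdlib.h>  // getenv, unsetenv (POSIX)

// Hypothetical helper (name is an assumption, not an lldb function).
// Removes LD_PRELOAD from this process's environment so that processes
// spawned later do not inherit it.  Returns true if the variable was set
// and has been removed, false if it was not set or removal failed.
bool drop_ld_preload() {
    if (getenv("LD_PRELOAD") == nullptr) {
        return false;  // nothing to remove
    }
    return unsetenv("LD_PRELOAD") == 0;
}
```

The intended use is to call drop_ld_preload() as the first statement of
main(), before any thread or child process is created.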

I'm about to try LTTng to trace all syscalls, but I was wondering if anyone
else has any ideas.  At one point I wondered if it was contention on the
kernel's mmap semaphore, but that shouldn't affect faulting individual
pages, and I assume lldb doesn't call mmap all the time.
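
One cheap check that might help separate these theories, sketched under
assumptions: sample the state of every thread from
/proc/<pid>/task/<tid>/stat on Linux while the slow phase is running.  As
far as I know, 'R' means running or runnable, 'S' is an interruptible sleep
(where futex waits normally show up), and 'D' is an uninterruptible sleep
(where a thread blocked inside the kernel, for example on a kernel
semaphore, would usually show up).  The function names are made up for
illustration and the code needs C++17 for <filesystem>.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <string>
#include <system_error>

// Extract the one-letter state from a /proc/<pid>/task/<tid>/stat line.
// The second field (the thread name) is parenthesised and may itself
// contain spaces or ')', so search for the last ')' in the line.
char thread_state(const std::string& stat_line) {
    const auto close = stat_line.rfind(')');
    if (close == std::string::npos || close + 2 >= stat_line.size()) {
        return '?';  // not a recognisable stat line
    }
    return stat_line[close + 2];  // skip ") " to reach the state field
}

// Count the threads of process `pid` by state.  Returns an empty map if
// the task directory cannot be read (e.g. the process has exited).
std::map<char, int> sample_thread_states(int pid) {
    std::map<char, int> counts;
    const std::filesystem::path dir =
        "/proc/" + std::to_string(pid) + "/task";
    std::error_code ec;
    for (std::filesystem::directory_iterator it(dir, ec), end;
         !ec && it != end; it.increment(ec)) {
        std::ifstream in(it->path() / "stat");
        std::string line;
        if (std::getline(in, line)) {
            ++counts[thread_state(line)];
        }
    }
    return counts;
}
```

Calling sample_thread_states() a few times per second against the lldb pid
and printing the counts would show whether the 40 workers are mostly in
'S', mostly in 'D', or actually runnable.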

I'm getting a bit frustrated because lldb should be taking 1-2 seconds to
start up (it has ~45s of user+system work to do, spread over 40 CPUs), but
instead it is taking 8-10 seconds, and I've been stuck there for a while.
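
The numbers here are consistent with what top reports: ~45s of CPU time
spread over 8-10 seconds of wall time averages out to roughly 4.5-5.6 busy
CPUs, and the same work on 40 fully busy CPUs would take a little over a
second.  A small sketch of that arithmetic, with made-up function names:

```cpp
// Average number of CPUs kept busy: total CPU time (user+system)
// divided by elapsed wall-clock time.
double effective_parallelism(double cpu_seconds, double wall_seconds) {
    return cpu_seconds / wall_seconds;
}

// Best-case wall-clock time if the work were spread perfectly over
// `cpus` processors with no serial portion.
double ideal_wall_seconds(double cpu_seconds, int cpus) {
    return cpu_seconds / cpus;
}

// effective_parallelism(45.0, 8.0)  == 5.625
// effective_parallelism(45.0, 10.0) == 4.5
// ideal_wall_seconds(45.0, 40)      == 1.125
```

The ideal figure ignores any serial portion of startup, so it is a lower
bound rather than a prediction.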
