[lldb-dev] Lack of parallelism
Greg Clayton via lldb-dev
lldb-dev at lists.llvm.org
Tue May 2 12:43:17 PDT 2017
The other thing would be to try to move the demangler to use a custom allocator everywhere. I'm not sure which demangler you are using in these tests, but we can use either the native system one from <cxxabi.h> or the fast demangler in FastDemangle.cpp. If it is the latter, then we can probably optimize this.
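For reference, here is a minimal sketch of what going through the system demangler looks like. The helper name DemangleWithSystem is made up for illustration and is not lldb code; only abi::__cxa_demangle is the real API. The point is that each successful call returns a malloc()'ed string, so heavy parallel demangling leans on the allocator.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper (not part of lldb): demangle one Itanium-ABI symbol
// with the system demangler. Returns the input unchanged on failure.
std::string DemangleWithSystem(const char *mangled) {
  int status = 0;
  // Passing nullptr for the output buffer makes __cxa_demangle malloc() the
  // result, so every successful call costs at least one heap allocation.
  char *demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr)
    return mangled;
  std::string result(demangled);
  std::free(demangled); // The caller owns the returned buffer.
  return result;
}
```

With many threads calling this at once, every call hits malloc/free, which is where a contended allocator would hurt.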
The other thing to note is that local files will be mmap'ed in, and paging doesn't really show up well in perf tests: it will look like system time when the system is paging in pages from the symbol files as lldb reads them from memory. You could try disabling the mmap stuff in DataBufferLLVM.cpp and see if you see any difference. The call to llvm::MemoryBuffer::getFileSlice() takes a volatile flag (IsVolatile) as its last argument. If you set this to true, we will read the file into memory instead of mmap'ing it. This will at least help you see whether any component of the time is due to mmap'ing. Currently we check whether the file is local (not on a network mount), and if it is local we mmap it.
> On May 2, 2017, at 12:31 PM, Scott Smith <scott.smith at purestorage.com> wrote:
> As it turns out, it was lock contention in the memory allocator. Using tcmalloc brought it from 8+ seconds down to 4.2.
> I think this didn't show up in mutrace because glibc's malloc doesn't use pthread mutexes.
> Greg, that joke about adding tcmalloc wholesale is looking less funny and more serious.... Or maybe it's enough to make it a cmake link option (use if present or use if requested).
> On Tue, May 2, 2017 at 8:42 AM, Jim Ingham <jingham at apple.com> wrote:
> I'm not sure about Linux, but on OS X lldb will mmap the debug information rather than using straight reads. But that should just be once per loaded module.
> > On May 2, 2017, at 8:09 AM, Scott Smith via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> > I've been trying to improve the parallelism of lldb but have run into an odd roadblock. I have the code at the point where it creates 40 worker threads, and it stays that way because it has enough work to do. However, running 'top -d 1' shows that for the time in question, cpu load never gets above 4-8 cpus (even though I have 40).
> > 1. I tried mutrace, which measures mutex contention (I had to call unsetenv("LD_PRELOAD") in main() so it wouldn't propagate to the process being tested). It indicated some minor contention, but not enough to be the problem. Regardless, I converted everything I could to lockfree structures (TaskPool and ConstString) and it didn't help.
> > 2. I tried strace, but I don't think strace can figure out how to trace lldb. It says it waits on a single futex for 8 seconds, and then is done.
> > I'm about to try lttng to trace all syscalls, but I was wondering if anyone else had any ideas? At one point I wondered if it was mmap kernel semaphore contention, but that shouldn't affect faulting individual pages, and I assume lldb doesn't call mmap all the time.
> > I'm getting a bit frustrated because lldb should be taking 1-2 seconds to start up (it has ~45s of user+system work to do), but instead is taking 8-10, and I've been stuck there for a while.