[lldb-dev] Deadlock loading DWARF symbols
Greg Clayton via lldb-dev
lldb-dev at lists.llvm.org
Fri Oct 2 14:13:53 PDT 2020
Yes this is bad, and GetDescription() is used as a convenience to print out the module path (which might be a .o file within a .a file) and optionally architecture of the module. It probably shouldn't be taking the module lock as the only member variables that that GetDescription accesses are:
Module::m_arch
Module::m_file
Module::m_object_name
I would almost vote to take out the mutex lock in GetDescription() as the arch, file and name don't change after the module has been created. I am going to CC a few extra folks for discussion.
Anyone else have any objections to removing the mutex in GetDescription? Seems like this deadlock is easy to trigger if you have DWARF with errors or warnings inside of it.
Greg
> On Oct 2, 2020, at 6:50 AM, Dmitry Antipov via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>
> I'm observing the following deadlock:
>
> One thread calls Module::PreloadSymbols() which takes m_mutex of this Module. Module::PreloadSymbols()
> calls ManualDWARFIndex::Index(), which, in turn, creates thread pool and waits for all threads completion:
>
> (gdb)
> #0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7f67f176914c) at ../sysdeps/nptl/futex-internal.h:183
> #1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f67f17690c8, cond=0x7f67f1769120) at pthread_cond_wait.c:508
> #2 __pthread_cond_wait (cond=0x7f67f1769120, mutex=0x7f67f17690c8) at pthread_cond_wait.c:638
> #3 0x00007f67f3974890 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
> #4 0x00007f67f4440c4b in std::condition_variable::wait<llvm::ThreadPool::wait()::<lambda()> > (__p=..., __lock=..., this=0x7f67f1769120)
> at /usr/include/c++/10/condition_variable:108
> #5 llvm::ThreadPool::wait (this=this at entry=0x7f67f1769060) at source/llvm/lib/Support/ThreadPool.cpp:72
> #6 0x00007f67fc6fa3a6 in lldb_private::ManualDWARFIndex::Index (this=0x7f66fe87e950)
> at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:94
> #7 0x00007f67fc6b3825 in SymbolFileDWARF::PreloadSymbols (this=0x7f67de7af6f0) at /usr/include/c++/10/bits/unique_ptr.h:421
> #8 0x00007f67fc1ee488 in lldb_private::Module::PreloadSymbols (this=0x7f67de79b620) at source/lldb/source/Core/Module.cpp:1397
> #9 0x00007f67fc397a37 in lldb_private::Target::GetOrCreateModule (this=this at entry=0x96c7a0, module_spec=..., notify=notify at entry=true, error_ptr=error_ptr at entry=0x0)
> at /usr/include/c++/10/bits/shared_ptr_base.h:1324
> ...
>
> OTOH one of pool threads makes an attempt to lock Module's mutex:
>
> (gdb) bt
> #0 __lll_lock_wait (futex=futex at entry=0x7f67de79b638, private=0) at lowlevellock.c:52
> #1 0x00007f67fcd907f1 in __GI___pthread_mutex_lock (mutex=0x7f67de79b638) at ../nptl/pthread_mutex_lock.c:115
> #2 0x00007f67fc1ed922 in __gthread_mutex_lock (__mutex=0x7f67de79b638) at /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:749
> #3 __gthread_recursive_mutex_lock (__mutex=0x7f67de79b638) at /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:811
> #4 std::recursive_mutex::lock (this=0x7f67de79b638) at /usr/include/c++/10/mutex:106
> #5 std::lock_guard<std::recursive_mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/10/bits/std_mutex.h:159
> #6 lldb_private::Module::GetDescription (this=this at entry=0x7f67de79b620, s=..., level=level at entry=lldb::eDescriptionLevelBrief)
> at source/lldb/source/Core/Module.cpp:1083
> #7 0x00007f67fc1f2070 in lldb_private::Module::ReportError (this=0x7f67de79b620, format=0x7f67fca03660 "DW_FORM_ref* DIE reference 0x%lx is outside of its CU")
> at source/lldb/include/lldb/Utility/Stream.h:358
> #8 0x00007f67fc6adfb4 in DWARFFormValue::Reference (this=this at entry=0x7f66f5ff29c0) at /usr/include/c++/10/bits/shared_ptr_base.h:1324
> #9 0x00007f67fc6aaa77 in DWARFDebugInfoEntry::GetAttributes (this=this at entry=0x7f662e3580e0, cu=cu at entry=0x7f66ff6ebad0, attributes=...,
> recurse=recurse at entry=DWARFBaseDIE::Recurse::yes, curr_depth=curr_depth at entry=0)
> at source/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:439
> #10 0x00007f67fc6f8f8f in DWARFDebugInfoEntry::GetAttributes (recurse=DWARFBaseDIE::Recurse::yes, attrs=..., cu=0x7f66ff6ebad0, this=0x7f662e3580e0)
> at source/lldb/source/./Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h:54
> #11 lldb_private::ManualDWARFIndex::IndexUnitImpl (unit=..., cu_language=cu_language at entry=lldb::eLanguageTypeRust, set=...)
> at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:180
> #12 0x00007f67fc6f96b7 in lldb_private::ManualDWARFIndex::IndexUnit (this=<optimized out>, unit=..., dwp=0x0, set=...)
> at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:126
> ...
>
> So this is a deadlock because thread pool is created with module lock held, and one (or more,
> I'm observing two) pool thread(s) might want to grab the same lock to issue an error message.
>
> Commenting out the whole body of Module::GetDescription() makes this deadlock disappear.
>
> I'm not an expert in this area, but the whole thing looks like the Module object should have more
> fine-granted locking rather than the only std::recursive_mutex for all synchronization purposes.
>
> Dmitry
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
More information about the lldb-dev
mailing list