[lldb-dev] Deadlock loading DWARF symbols
Frédéric Riss via lldb-dev
lldb-dev at lists.llvm.org
Wed Oct 14 20:12:57 PDT 2020
[I thought I had already sent this out weeks ago…]
> On Oct 2, 2020, at 2:13 PM, Greg Clayton via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>
> Yes this is bad, and GetDescription() is used as a convenience to print out the module path (which might be a .o file within a .a file) and optionally architecture of the module. It probably shouldn't be taking the module lock as the only member variables that that GetDescription accesses are:
>
> Module::m_arch
> Module::m_file
> Module::m_object_name
>
> I would almost vote to take out the mutex lock in GetDescription() as the arch, file and name don't change after the module has been created. I am going to CC a few extra folks for discussion.
>
> Anyone else have any objections to removing the mutex in GetDescription? Seems like this deadlock is easy to trigger if you have DWARF with errors or warnings inside of it.
I remember having a discussion with Jim about a very similar issue and IIRC, he told me that the arch could change after a module is created. I don’t remember the reason off the top of my head.
I think we are hitting a very similar deadlock in the Swift REPL for slightly different reasons (same lock though).
Fred
> Greg
>
>
>> On Oct 2, 2020, at 6:50 AM, Dmitry Antipov via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>>
>> I'm observing the following deadlock:
>>
>> One thread calls Module::PreloadSymbols() which takes m_mutex of this Module. Module::PreloadSymbols()
>> calls ManualDWARFIndex::Index(), which, in turn, creates thread pool and waits for all threads completion:
>>
>> (gdb)
>> #0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7f67f176914c) at ../sysdeps/nptl/futex-internal.h:183
>> #1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f67f17690c8, cond=0x7f67f1769120) at pthread_cond_wait.c:508
>> #2 __pthread_cond_wait (cond=0x7f67f1769120, mutex=0x7f67f17690c8) at pthread_cond_wait.c:638
>> #3 0x00007f67f3974890 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
>> #4 0x00007f67f4440c4b in std::condition_variable::wait<llvm::ThreadPool::wait()::<lambda()> > (__p=..., __lock=..., this=0x7f67f1769120)
>> at /usr/include/c++/10/condition_variable:108
>> #5 llvm::ThreadPool::wait (this=this at entry=0x7f67f1769060) at source/llvm/lib/Support/ThreadPool.cpp:72
>> #6 0x00007f67fc6fa3a6 in lldb_private::ManualDWARFIndex::Index (this=0x7f66fe87e950)
>> at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:94
>> #7 0x00007f67fc6b3825 in SymbolFileDWARF::PreloadSymbols (this=0x7f67de7af6f0) at /usr/include/c++/10/bits/unique_ptr.h:421
>> #8 0x00007f67fc1ee488 in lldb_private::Module::PreloadSymbols (this=0x7f67de79b620) at source/lldb/source/Core/Module.cpp:1397
>> #9 0x00007f67fc397a37 in lldb_private::Target::GetOrCreateModule (this=this at entry=0x96c7a0, module_spec=..., notify=notify at entry=true, error_ptr=error_ptr at entry=0x0)
>> at /usr/include/c++/10/bits/shared_ptr_base.h:1324
>> ...
>>
>> OTOH one of pool threads makes an attempt to lock Module's mutex:
>>
>> (gdb) bt
>> #0 __lll_lock_wait (futex=futex at entry=0x7f67de79b638, private=0) at lowlevellock.c:52
>> #1 0x00007f67fcd907f1 in __GI___pthread_mutex_lock (mutex=0x7f67de79b638) at ../nptl/pthread_mutex_lock.c:115
>> #2 0x00007f67fc1ed922 in __gthread_mutex_lock (__mutex=0x7f67de79b638) at /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:749
>> #3 __gthread_recursive_mutex_lock (__mutex=0x7f67de79b638) at /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:811
>> #4 std::recursive_mutex::lock (this=0x7f67de79b638) at /usr/include/c++/10/mutex:106
>> #5 std::lock_guard<std::recursive_mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/10/bits/std_mutex.h:159
>> #6 lldb_private::Module::GetDescription (this=this at entry=0x7f67de79b620, s=..., level=level at entry=lldb::eDescriptionLevelBrief)
>> at source/lldb/source/Core/Module.cpp:1083
>> #7 0x00007f67fc1f2070 in lldb_private::Module::ReportError (this=0x7f67de79b620, format=0x7f67fca03660 "DW_FORM_ref* DIE reference 0x%lx is outside of its CU")
>> at source/lldb/include/lldb/Utility/Stream.h:358
>> #8 0x00007f67fc6adfb4 in DWARFFormValue::Reference (this=this at entry=0x7f66f5ff29c0) at /usr/include/c++/10/bits/shared_ptr_base.h:1324
>> #9 0x00007f67fc6aaa77 in DWARFDebugInfoEntry::GetAttributes (this=this at entry=0x7f662e3580e0, cu=cu at entry=0x7f66ff6ebad0, attributes=...,
>> recurse=recurse at entry=DWARFBaseDIE::Recurse::yes, curr_depth=curr_depth at entry=0)
>> at source/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:439
>> #10 0x00007f67fc6f8f8f in DWARFDebugInfoEntry::GetAttributes (recurse=DWARFBaseDIE::Recurse::yes, attrs=..., cu=0x7f66ff6ebad0, this=0x7f662e3580e0)
>> at source/lldb/source/./Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h:54
>> #11 lldb_private::ManualDWARFIndex::IndexUnitImpl (unit=..., cu_language=cu_language at entry=lldb::eLanguageTypeRust, set=...)
>> at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:180
>> #12 0x00007f67fc6f96b7 in lldb_private::ManualDWARFIndex::IndexUnit (this=<optimized out>, unit=..., dwp=0x0, set=...)
>> at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:126
>> ...
>>
>> So this is a deadlock because thread pool is created with module lock held, and one (or more,
>> I'm observing two) pool thread(s) might want to grab the same lock to issue an error message.
>>
>> Commenting out the whole body of Module::GetDescription() makes this deadlock disappear.
>>
>> I'm not an expert in this area, but the whole thing looks like the Module object should have more
>> fine-granted locking rather than the only std::recursive_mutex for all synchronization purposes.
>>
>> Dmitry
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
More information about the lldb-dev
mailing list