[lldb-dev] Help fixing deadlock in DWARF symbol preloading
Greg Clayton via lldb-dev
lldb-dev at lists.llvm.org
Mon Feb 8 11:12:13 PST 2021
A simpler solution is to remove the lock in Module::GetDescription(...) all of the data that is used to dump the module description is set correctly (the file, the arch and the object name for BSD archives)) very early on when a module is created before anyone should be making symbol or debug info queries or parsing.
> On Feb 5, 2021, at 1:24 PM, Jorge Gorbe Moya via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>
> I just started looking at how to do this (having a separate mutex for the description) and I think I found another bug. Or maybe I'm missing some assumption.
>
> On one hand, Module::SetArchitecture won't assign the new value if m_arch is already valid, just return m_arch.IsCompatibleWith(new_arch). On the other hand, in Module::MergeArchitecture we have both code that assumes that SetArchitecture replaces the previous value:
>
> if (!m_arch <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Core/Module.h;rcl=355613052;l=939>.IsCompatibleMatch <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Utility/ArchSpec.cpp;rcl=355613052;l=934>(arch_spec <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1620?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>)) {
> // The new architecture is different, we just need to replace it.
> return SetArchitecture <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1544?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>(arch_spec <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1620?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>);
> }
>
> and right after it we have code that works around the fact that SetArchitecture doesn't replace the previous value sometimes
>
> // Merge bits from arch_spec into "merged_arch" and set our architecture.
> ArchSpec <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Utility/ArchSpec.h;rcl=355613052;l=33> merged_arch <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;bpv=1;bpt=1;l=1633?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental&gsn=merged_arch&gs=kythe%3A%2F%2Fgoogle3%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Fllvm%2Fllvm-project%2Flldb%2Fsource%2FCore%2FModule.cpp%23f1bDLAwnC-gD__Xt6Bs-4h5T-ePUhWBq8L6dQxa8QEA>(m_arch <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Core/Module.h;rcl=355613052;l=939>);
> merged_arch <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1633?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>.MergeFrom <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Utility/ArchSpec.cpp;rcl=355613052;l=801>(arch_spec <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1620?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>);
> // SetArchitecture() is a no-op if m_arch is already valid.
> m_arch <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Core/Module.h;rcl=355613052;l=939> = <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Utility/ArchSpec.h;rcl=355613052;l=33> ArchSpec <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Utility/ArchSpec.cpp;rcl=355613052;l=510>();
> return SetArchitecture <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1544?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>(merged_arch <https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1633?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>);
>
> My guess is that the first of the snippets above has a bug (it intends to replace the architecture but doesn't) and that replacing a module's architecture with an incompatible one is something that doesn't happen often and the bug went unnoticed. Does this make sense?
>
> On Thu, Feb 4, 2021 at 5:38 PM Raphael Isemann <teemperor at gmail.com <mailto:teemperor at gmail.com>> wrote:
> We could also just give the Module a std::string with the description
> and update it in the few places where we actually update it. The
> m_arch already has a setter in place that just needs to be used in a
> few more places, so the infrastructure is kind of already there (at
> least for m_arch). The description would just have its own mutex.
>
>
> Am Fr., 5. Feb. 2021 um 00:39 Uhr schrieb Jorge Gorbe Moya via
> lldb-dev <lldb-dev at lists.llvm.org <mailto:lldb-dev at lists.llvm.org>>:
> >
> > Wouldn't it be preferable to try_lock in GetDescription (which is the one currently acquiring the mutex) instead? ReportError doesn't touch any mutex itself and will happily report the rest of the error if GetDescription bails out. For the test case I sent it would look like this:
> >
> > error: {0x0000000b}: invalid abbreviation code 123, please file a bug and attach the file at the start of this error message
> > error: {0x0000000b}: invalid abbreviation code 123, please file a bug and attach the file at the start of this error message
> > error: {0x0000000b}: invalid abbreviation code 123, please file a bug and attach the file at the start of this error message
> >
> > which is way better than a deadlock IMO.
> >
> >
> >
> >
> > On Thu, Feb 4, 2021 at 12:16 PM Pavel Labath <pavel at labath.sk <mailto:pavel at labath.sk>> wrote:
> >>
> >> Please have a look at
> >> <https://lists.llvm.org/pipermail/lldb-dev/2020-October/016479.html <https://lists.llvm.org/pipermail/lldb-dev/2020-October/016479.html>>,
> >> which is the last time this came up.
> >>
> >>
> >> One quick'n'dirty solution would be to have `Module::ReportError` _try_
> >> to get the module lock, and if it fails, just bail out. That obviously
> >> means you won't get to see the error message which triggerred the
> >> deadlock (though we could also play around with that and try printing
> >> the error message without the module description or something), but it
> >> will at least get you past that point...
> >>
> >> pl
> >>
> >> On 04/02/2021 21:04, Jorge Gorbe Moya via lldb-dev wrote:
> >> > Hi,
> >> >
> >> > I've found a deadlock in lldb (see attached test case, you can build it
> >> > with just `clang -o test test.s`), but I'm a total newbie and I have no
> >> > idea what's the right way to fix it.
> >> >
> >> > The problem happens when an error is found during DIE extraction when
> >> > preloading symbols. As far as I can tell, it goes like this:
> >> >
> >> > 1. Module::PreloadSymbols locks Module::m_mutex
> >> > 2. A few layers below it, we end up in ManualDWARFIndex::Index, which
> >> > dispatches DIE extractions to a thread pool:
> >> >
> >> > |for (size_t i = 0; i < units_to_index.size(); ++i)
> >> > pool.async(extract_fn, i); pool.wait(); |
> >> >
> >> > 3. extract_fn in the snippet above ends up executing
> >> > DWARFDebugInfoEntry::Extract and when there's an error during
> >> > extraction, Module::GetDescription is called while generating the error
> >> > message.
> >> > 4. Module::GetDescription tries to acquire Module::m_mutex from a
> >> > different thread, while the main thread has the mutex already locked and
> >> > it's waiting for DIE extraction to end, causing a deadlock.
> >> >
> >> > If we make Module::GetDescription not lock the problem disappears, so
> >> > the diagnosis looks correct, but I don't know what would be the right
> >> > way to fix it. Module::GetDescription looks more or less safe to call
> >> > without locking: it just prints m_arch, m_file, and m_object_name to a
> >> > string, and those look like fields that wouldn't change after the Module
> >> > is initialized, so maybe it's okay? But I feel like there must be a
> >> > better solution anyway. Any advice?
> >> >
> >> > Best,
> >> > Jorge
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > lldb-dev mailing list
> >> > lldb-dev at lists.llvm.org <mailto:lldb-dev at lists.llvm.org>
> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev>
> >> >
> >>
> > _______________________________________________
> > lldb-dev mailing list
> > lldb-dev at lists.llvm.org <mailto:lldb-dev at lists.llvm.org>
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20210208/a9b9e595/attachment-0001.html>
More information about the lldb-dev
mailing list