[lldb-dev] Help fixing deadlock in DWARF symbol preloading

Jorge Gorbe Moya via lldb-dev lldb-dev at lists.llvm.org
Fri Feb 5 13:24:47 PST 2021


I just started looking at how to do this (having a separate mutex for the
description) and I think I found another bug. Or maybe I'm missing some
assumption.

On one hand, Module::SetArchitecture won't assign the new value if m_arch
is already valid, just return m_arch.IsCompatibleWith(new_arch). On the
other hand, in Module::MergeArchitecture we have both code that assumes
that SetArchitecture replaces the previous value:

if (!m_arch
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Core/Module.h;rcl=355613052;l=939>
.IsCompatibleMatch
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Utility/ArchSpec.cpp;rcl=355613052;l=934>
(arch_spec
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1620?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>))
{
// The new architecture is different, we just need to replace it.
return SetArchitecture
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1544?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>
(arch_spec
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1620?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>);

}

and right after it we have code that works around the fact that
SetArchitecture doesn't replace the previous value sometimes

// Merge bits from arch_spec into "merged_arch" and set our architecture.
ArchSpec
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Utility/ArchSpec.h;rcl=355613052;l=33>
merged_arch
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;bpv=1;bpt=1;l=1633?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental&gsn=merged_arch&gs=kythe%3A%2F%2Fgoogle3%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Fllvm%2Fllvm-project%2Flldb%2Fsource%2FCore%2FModule.cpp%23f1bDLAwnC-gD__Xt6Bs-4h5T-ePUhWBq8L6dQxa8QEA>
(m_arch
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Core/Module.h;rcl=355613052;l=939>);

merged_arch
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1633?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>
.MergeFrom
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Utility/ArchSpec.cpp;rcl=355613052;l=801>
(arch_spec
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1620?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>);

// SetArchitecture() is a no-op if m_arch is already valid.
m_arch
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Core/Module.h;rcl=355613052;l=939>
=
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/include/lldb/Utility/ArchSpec.h;rcl=355613052;l=33>
ArchSpec
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Utility/ArchSpec.cpp;rcl=355613052;l=510>();

return SetArchitecture
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1544?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>
(merged_arch
<https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/Module.cpp;l=1633?q=IsCompatibleMatch%20lldb&ct=os&sq=package:piper%20file:%2F%2Fdepot%2Fgoogle3%20-file:google3%2Fexperimental>
);

My guess is that the first of the snippets above has a bug (it intends to
replace the architecture but doesn't) and that replacing a module's
architecture with an incompatible one is something that doesn't happen
often and the bug went unnoticed. Does this make sense?

On Thu, Feb 4, 2021 at 5:38 PM Raphael Isemann <teemperor at gmail.com> wrote:

> We could also just give the Module a std::string with the description
> and update it in the few places where we actually update it. The
> m_arch already has a setter in place that just needs to be used in a
> few more places, so the infrastructure is kind of already there (at
> least for m_arch). The description would just have its own mutex.
>
>
> Am Fr., 5. Feb. 2021 um 00:39 Uhr schrieb Jorge Gorbe Moya via
> lldb-dev <lldb-dev at lists.llvm.org>:
> >
> > Wouldn't it be preferable to try_lock in GetDescription (which is the
> one currently acquiring the mutex) instead? ReportError doesn't touch any
> mutex itself and will happily report the rest of the error if
> GetDescription bails out. For the test case I sent it would look like this:
> >
> > error:  {0x0000000b}: invalid abbreviation code 123, please file a bug
> and attach the file at the start of this error message
> > error:  {0x0000000b}: invalid abbreviation code 123, please file a bug
> and attach the file at the start of this error message
> > error:  {0x0000000b}: invalid abbreviation code 123, please file a bug
> and attach the file at the start of this error message
> >
> > which is way better than a deadlock IMO.
> >
> >
> >
> >
> > On Thu, Feb 4, 2021 at 12:16 PM Pavel Labath <pavel at labath.sk> wrote:
> >>
> >> Please have a look at
> >> <https://lists.llvm.org/pipermail/lldb-dev/2020-October/016479.html>,
> >> which is the last time this came up.
> >>
> >>
> >> One quick'n'dirty solution would be to have `Module::ReportError` _try_
> >> to get the module lock, and if it fails, just bail out. That obviously
> >> means you won't get to see the error message which triggerred the
> >> deadlock (though we could also play around with that and try printing
> >> the error message without the module description or something), but it
> >> will at least get you past that point...
> >>
> >> pl
> >>
> >> On 04/02/2021 21:04, Jorge Gorbe Moya via lldb-dev wrote:
> >> > Hi,
> >> >
> >> > I've found a deadlock in lldb (see attached test case, you can build
> it
> >> > with just `clang -o test test.s`), but I'm a total newbie and I have
> no
> >> > idea what's the right way to fix it.
> >> >
> >> > The problem happens when an error is found during DIE extraction when
> >> > preloading symbols. As far as I can tell, it goes like this:
> >> >
> >> > 1. Module::PreloadSymbols locks Module::m_mutex
> >> > 2. A few layers below it, we end up in ManualDWARFIndex::Index, which
> >> > dispatches DIE extractions to a thread pool:
> >> >
> >> > |for (size_t i = 0; i < units_to_index.size(); ++i)
> >> > pool.async(extract_fn, i); pool.wait(); |
> >> >
> >> > 3. extract_fn in the snippet above ends up executing
> >> > DWARFDebugInfoEntry::Extract and when there's an error during
> >> > extraction, Module::GetDescription is called while generating the
> error
> >> > message.
> >> > 4. Module::GetDescription tries to acquire Module::m_mutex from a
> >> > different thread, while the main thread has the mutex already locked
> and
> >> > it's waiting for DIE extraction to end, causing a deadlock.
> >> >
> >> > If we make Module::GetDescription not lock the problem disappears, so
> >> > the diagnosis looks correct, but I don't know what would be the right
> >> > way to fix it. Module::GetDescription looks more or less safe to call
> >> > without locking: it just prints m_arch, m_file, and m_object_name to a
> >> > string, and those look like fields that wouldn't change after the
> Module
> >> > is initialized, so maybe it's okay? But I feel like there must be a
> >> > better solution anyway. Any advice?
> >> >
> >> > Best,
> >> > Jorge
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > lldb-dev mailing list
> >> > lldb-dev at lists.llvm.org
> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> >> >
> >>
> > _______________________________________________
> > lldb-dev mailing list
> > lldb-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20210205/c06f9ce0/attachment.html>


More information about the lldb-dev mailing list