[lldb-dev] Help fixing deadlock in DWARF symbol preloading
Jorge Gorbe Moya via lldb-dev
lldb-dev at lists.llvm.org
Thu Feb 4 15:38:47 PST 2021
Wouldn't it be preferable to try_lock in GetDescription (which is the one
currently acquiring the mutex) instead? ReportError doesn't touch any mutex
itself and will happily report the rest of the error if GetDescription
bails out. For the test case I sent it would look like this:
error: {0x0000000b}: invalid abbreviation code 123, please file a bug and
attach the file at the start of this error message
error: {0x0000000b}: invalid abbreviation code 123, please file a bug and
attach the file at the start of this error message
error: {0x0000000b}: invalid abbreviation code 123, please file a bug and
attach the file at the start of this error message
which is way better than a deadlock IMO.
On Thu, Feb 4, 2021 at 12:16 PM Pavel Labath <pavel at labath.sk> wrote:
> Please have a look at
> <https://lists.llvm.org/pipermail/lldb-dev/2020-October/016479.html>,
> which is the last time this came up.
>
>
> One quick'n'dirty solution would be to have `Module::ReportError` _try_
> to get the module lock, and if it fails, just bail out. That obviously
> means you won't get to see the error message which triggerred the
> deadlock (though we could also play around with that and try printing
> the error message without the module description or something), but it
> will at least get you past that point...
>
> pl
>
> On 04/02/2021 21:04, Jorge Gorbe Moya via lldb-dev wrote:
> > Hi,
> >
> > I've found a deadlock in lldb (see attached test case, you can build it
> > with just `clang -o test test.s`), but I'm a total newbie and I have no
> > idea what's the right way to fix it.
> >
> > The problem happens when an error is found during DIE extraction when
> > preloading symbols. As far as I can tell, it goes like this:
> >
> > 1. Module::PreloadSymbols locks Module::m_mutex
> > 2. A few layers below it, we end up in ManualDWARFIndex::Index, which
> > dispatches DIE extractions to a thread pool:
> >
> > |for (size_t i = 0; i < units_to_index.size(); ++i)
> > pool.async(extract_fn, i); pool.wait(); |
> >
> > 3. extract_fn in the snippet above ends up executing
> > DWARFDebugInfoEntry::Extract and when there's an error during
> > extraction, Module::GetDescription is called while generating the error
> > message.
> > 4. Module::GetDescription tries to acquire Module::m_mutex from a
> > different thread, while the main thread has the mutex already locked and
> > it's waiting for DIE extraction to end, causing a deadlock.
> >
> > If we make Module::GetDescription not lock the problem disappears, so
> > the diagnosis looks correct, but I don't know what would be the right
> > way to fix it. Module::GetDescription looks more or less safe to call
> > without locking: it just prints m_arch, m_file, and m_object_name to a
> > string, and those look like fields that wouldn't change after the Module
> > is initialized, so maybe it's okay? But I feel like there must be a
> > better solution anyway. Any advice?
> >
> > Best,
> > Jorge
> >
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > lldb-dev mailing list
> > lldb-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20210204/dda6724d/attachment-0001.html>
More information about the lldb-dev
mailing list