[lldb-dev] LLDB hang loading Linux core files from live processes (Bug 26322)
Howard Hellyer via lldb-dev
lldb-dev at lists.llvm.org
Fri Nov 11 05:36:17 PST 2016
I was afraid someone would say that, but I've done some digging and found a
difference between the core files generated by gcore and those generated
by a crash or abort.
Most of the core files contain a single SIGINFO structure, which I think
belongs to the preceding thread (the one that caught the signal).
In the core files generated by gcore all of the threads have a SIGINFO
structure following their PRSTATUS structure. In the non-gcore files the
value of info.si_signo in the PRSTATUS structure is a signal number. In
the gcore file this is actually 0 but the SIGINFO structure following
PRSTATUS has an si_signo value of 19.
Looking at it with eu-readelf shows:
CORE 336 PRSTATUS
info.si_signo: 0, info.si_code: 0, info.si_errno: 0, cursig: 0
... lots of registers...
CORE 128 SIGINFO
si_signo: 19, si_errno: 0, si_code: 0
sender PID: 0, sender UID: 0
I think gcore is being clever. It's including the "real" signal number the
running thread had received at the time the core was taken (info.si_signo
is 0) but also the signal it had used to interrupt the thread and gather
its state. The value in PRSTATUS info.si_signo is the signal number that
ends up in m_signo in ThreadElfCore and ultimately is looked for in the
set of signals lldb should stop on in UnixSignals::GetShouldStop. 0 is not
found in that set since there isn't a signal 0. I think gcore is doing all
this so that it preserves the real signal state the process had before
gcore attached to it, I guess in case you are trying to debug something to
do with signals and need to see that state. (That's a bit of a guess, mind.)
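To make the failure mode concrete, here is a toy model of the lookup described above. It is only a sketch: ToySignalTable and SignalPolicy are hypothetical names, this is not LLDB's UnixSignals class, and the assumption that an unknown signal number simply answers "don't stop" is mine.

```cpp
#include <map>

// Hypothetical per-signal policy; the real table holds more than this.
struct SignalPolicy {
  bool should_stop;
};

// Toy stand-in for a signal table. Not LLDB's UnixSignals implementation.
class ToySignalTable {
 public:
  void Add(int signo, bool should_stop) { table_[signo] = {should_stop}; }

  // Assumption: a signal number that is not in the table is treated as
  // "do not stop". Signal 0 is never added, so it always lands here.
  bool GetShouldStop(int signo) const {
    auto it = table_.find(signo);
    return it != table_.end() && it->second.should_stop;
  }

 private:
  std::map<int, SignalPolicy> table_;
};
```

With signal 19 registered as a stopping signal, a thread whose PRSTATUS reports 0 gets "don't stop" while the SIGINFO value of 19 gets "stop", which matches the difference between the gcore files and the crash files.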
I can think of three solutions:
- Read the signal information from the SIGINFO block for a thread if it's
present. Core files generated by abort or a crash only seem to have a
SIGINFO for one thread, which looks like it's the one that received/triggered
the signal in the first place. This means adding something to parse that
block out of the elf core as well as PRSTATUS, and overriding the state from
PRSTATUS if we see it. SIGINFO always seems to come after PRSTATUS, and
probably has to, as PRSTATUS contains the pid and identifies that there is
a new thread in the core, so if SIGINFO is found that signal number will
just replace the first one.
- Never allow a thread's signal number to be 0 when it comes from an elf
core dump. (This is probably as much of a band aid as the first solution.)
- Stick with the first solution of saying that we can never resume a core
file. The only thing in this solution's favour is that it means the "real"
thread state that gcore tried to preserve is known to lldb. Once the core
file is loaded, typing continue does result in an error message telling you
that you can't resume from a core file.
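A rough sketch of the first option, to show the ordering rule I have in mind. CoreNote, ThreadSignal and CollectThreadSignals are hypothetical names for illustration and assume the notes have already been decoded from the core; this is not the existing ThreadElfCore code. The two note type values are the ones from <elf.h>.

```cpp
#include <cstdint>
#include <vector>

// ELF note types as defined in <elf.h> on Linux.
constexpr std::uint32_t kNtPrStatus = 1;          // NT_PRSTATUS
constexpr std::uint32_t kNtSigInfo = 0x53494749;  // NT_SIGINFO

// Hypothetical, already-decoded view of one note from the core file.
struct CoreNote {
  std::uint32_t type;
  int signo;  // info.si_signo for PRSTATUS, si_signo for SIGINFO
};

// Hypothetical per-thread result.
struct ThreadSignal {
  int signo;
};

// Walk the notes in file order. Each PRSTATUS starts a new thread; a
// SIGINFO that follows it replaces that thread's signal number.
std::vector<ThreadSignal> CollectThreadSignals(
    const std::vector<CoreNote>& notes) {
  std::vector<ThreadSignal> threads;
  for (const CoreNote& note : notes) {
    if (note.type == kNtPrStatus) {
      threads.push_back({note.signo});
    } else if (note.type == kNtSigInfo && !threads.empty()) {
      threads.back().signo = note.signo;
    }
  }
  return threads;
}
```

For the gcore layout shown above (PRSTATUS with 0 followed by SIGINFO with 19) each thread would end up with 19, and a thread with no SIGINFO keeps whatever PRSTATUS reported.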
I'll have a go at prototyping the solution to read the SIGINFO structure
but I'd appreciate any thoughts on which is the "correct" fix.
IBM Runtime Technologies, IBM Systems
From: Jim Ingham <jingham at apple.com>
To: Howard Hellyer/UK/IBM at IBMGB
Cc: lldb-dev at lists.llvm.org
Date: 10/11/2016 18:48
Subject: Re: [lldb-dev] LLDB hang loading Linux core files from
live processes (Bug 26322)
Sent by: jingham at apple.com
I think that approach is kind of a bandaid.
Core files can't resume, so it would be better to figure out why telling a
core file which can't resume to resume caused us to go into a tail spin.
That should just fall out of WillResume returning false or some other
better general signal. Special-casing core files seems a bit of a hack.
That being said, if nobody has time to make a better solution, a bandaid
is better than bleeding...