[lldb-dev] LLDB hang loading Linux core files from live processes (Bug 26322)

Fri Nov 11 05:36:17 PST 2016

Hi Jim

I was afraid someone would say that, but I've done some digging and found
a difference between the core files generated by gcore and those
generated by a crash or abort.

Most of the core files have a single SIGINFO structure, which I think
belongs to the thread whose PRSTATUS precedes it (the one that caught the
signal). In the core files generated by gcore, every thread has a SIGINFO
structure following its PRSTATUS structure. In the non-gcore files the
value of info.si_signo in the PRSTATUS structure is a signal number. In
the gcore file it is 0, but the SIGINFO structure following PRSTATUS has
an si_signo value of 19.

Looking at it with eu-readelf shows:

  CORE                 336  PRSTATUS
    info.si_signo: 0, info.si_code: 0, info.si_errno: 0, cursig: 0
    sigpend: <>
    sighold: <>
... lots of registers ...
  CORE                 128  SIGINFO
    si_signo: 19, si_errno: 0, si_code: 0
    sender PID: 0, sender UID: 0
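
As a rough illustration of what that dump corresponds to on disk, here is a
minimal Python sketch that walks the note records of a PT_NOTE segment and
pulls out si_signo from each PRSTATUS and SIGINFO note. It assumes a
little-endian core and that the note segment bytes have already been
extracted (finding PT_NOTE via the program headers is not shown). The
helper names (iter_notes, signal_numbers, make_note) are made up for this
example, and the demo input is fabricated to mirror the sizes and values
above; it is not read from a real core file.

```python
import struct

NT_PRSTATUS = 1           # note type value from <elf.h>
NT_SIGINFO = 0x53494749   # note type value from <elf.h> (ASCII "SIGI")


def align4(n):
    """Round n up to the next multiple of 4 (ELF note padding)."""
    return (n + 3) & ~3


def iter_notes(buf):
    """Yield (name, type, desc) for each note record in a PT_NOTE buffer."""
    off = 0
    while off + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from("<III", buf, off)
        off += 12
        name = buf[off:off + namesz].rstrip(b"\0").decode("ascii")
        off += align4(namesz)
        desc = buf[off:off + descsz]
        off += align4(descsz)
        yield name, ntype, desc


def signal_numbers(buf):
    """Return [(note_kind, si_signo)] for CORE notes, in file order."""
    out = []
    for name, ntype, desc in iter_notes(buf):
        if name != "CORE":
            continue
        if ntype == NT_PRSTATUS:
            # prstatus starts with pr_info = {si_signo, si_code, si_errno}.
            out.append(("PRSTATUS", struct.unpack_from("<i", desc, 0)[0]))
        elif ntype == NT_SIGINFO:
            # siginfo_t also starts with si_signo.
            out.append(("SIGINFO", struct.unpack_from("<i", desc, 0)[0]))
    return out


def make_note(ntype, desc, name=b"CORE\0"):
    """Build one note record. Only used to fabricate demo input."""
    header = struct.pack("<III", len(name), len(desc), ntype)
    return (header
            + name.ljust(align4(len(name)), b"\0")
            + desc.ljust(align4(len(desc)), b"\0"))


# Fabricated gcore-style thread: PRSTATUS with signo 0, SIGINFO with signo 19.
demo = (make_note(NT_PRSTATUS, struct.pack("<iii", 0, 0, 0).ljust(336, b"\0"))
        + make_note(NT_SIGINFO, struct.pack("<iii", 19, 0, 0).ljust(128, b"\0")))
print(signal_numbers(demo))
```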

I think gcore is being clever. It records the "real" signal number the
running thread had received at the time the core was taken (info.si_signo
is 0) but also the signal it used to interrupt the thread and gather its
state. The value in PRSTATUS info.si_signo is the signal number that ends
up in m_signo in ThreadElfCore and is ultimately looked up in the set of
signals lldb should stop on in UnixSignals::GetShouldStop. 0 is not found
in that set since there isn't a signal 0. I think gcore does all this to
preserve the real signal state the process had before gcore attached to
it, I guess in case you are trying to debug something to do with signals
and need to see that state. (That's a bit of a guess, mind you.)
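
To make the lookup problem concrete, here is a toy Python model of a
signal table. It is only a sketch of the idea and not LLDB's actual
UnixSignals implementation: the table contents, the should-stop flags and
the get_should_stop helper are all assumptions for illustration. The point
is simply that 19 (SIGSTOP on x86 Linux) has an entry while 0 has none.

```python
# Toy stand-in for a signal table; not LLDB's real data structure.
# Maps signal number -> (name, should_stop). Flags are illustrative only.
SIGNALS = {
    6: ("SIGABRT", True),
    11: ("SIGSEGV", True),
    19: ("SIGSTOP", True),
}


def get_should_stop(signo):
    """Return the should-stop flag, or None if signo has no table entry."""
    entry = SIGNALS.get(signo)
    return None if entry is None else entry[1]


print(get_should_stop(19))  # signal taken from the SIGINFO note
print(get_should_stop(0))   # signal taken from PRSTATUS in a gcore file
```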

I can think of three solutions:

- Read the signal information from the SIGINFO block for a thread if it's
present. Core files generated by abort or a crash only seem to have a
SIGINFO for one thread, which looks like the one that received/triggered
the signal in the first place. This means adding something to parse that
block out of the elf core as well as PRSTATUS, and overriding the state
from PRSTATUS if we see it. SIGINFO always seems to come after PRSTATUS,
and probably has to, as PRSTATUS contains the pid and identifies that
there is a new thread in the core. So if SIGINFO is found, its signal
number will just replace the first one.

- Never allow a thread's signal number to be 0 when it comes from an elf
core dump. (This is probably as much of a band-aid as the first solution.)

- Stick with the first solution of saying that we can never resume a core
file. The only thing in this solution's favour is that it means the "real"
thread state that gcore tried to preserve is known to lldb. Once the core
file is loaded, typing continue does result in an error message telling
you that you can't resume from a core file.

I'll have a go at prototyping the solution to read the SIGINFO structure 
but I'd appreciate any thoughts on which is the "correct" fix.

Thanks,

Howard Hellyer 
IBM Runtime Technologies, IBM Systems 

From:   Jim Ingham <jingham at apple.com>
To:     Howard Hellyer/UK/IBM at IBMGB
Cc:     lldb-dev at lists.llvm.org
Date:   10/11/2016 18:48
Subject:        Re: [lldb-dev] LLDB hang loading Linux core files from 
live processes (Bug 26322)
Sent by:        jingham at apple.com

I think that approach is kind of a bandaid. 

Core files can't resume, so it would be better to figure out why telling a 
core file which can't resume to resume caused us to go into a tail spin. 
That should just fall out of WillResume returning false or some other 
better general signal.  Special-casing core files seems a bit of a hack.

That being said, if nobody has time to make a better solution, a bandaid 
is better than bleeding...

Jim
