<font size=2 face="sans-serif">Hi Jim</font>

<br>

<br><font size=2 face="sans-serif">I was afraid someone would say that,

but I've done some digging and found a difference between the core files

generated by gcore and those generated by a crash or abort.</font>

<br>

<br><font size=2 face="sans-serif">Most of the core files have one SIGINFO

structure in the core; I think it belongs to the preceding thread (the

one that caught the signal).</font>

<br><font size=2 face="sans-serif">In the core files generated by gcore,

all of the threads have a SIGINFO structure following their PRSTATUS structure.

In the non-gcore files, the value of info.si_signo in the PRSTATUS structure

is a signal number. In the gcore file it is actually 0, but the SIGINFO

structure following PRSTATUS has an si_signo value of 19.</font>

<br>

<br><font size=2 face="sans-serif">Looking at it with eu-readelf shows:</font>

<br>

<br><font size=2 face="Menlo-Regular">  CORE      

          336  PRSTATUS</font>

<br><font size=2 face="Menlo-Regular">    info.si_signo: 0, info.si_code:

0, info.si_errno: 0, cursig: 0</font>

<br><font size=2 face="Menlo-Regular">    sigpend: &lt;&gt;</font>

<br><font size=2 face="Menlo-Regular">    sighold: &lt;&gt;</font>

<br><font size=2 face="sans-serif">... lots of registers...</font>

<br><font size=2 face="Menlo-Regular">  CORE      

          128  SIGINFO</font>

<br><font size=2 face="Menlo-Regular">    si_signo: 19, si_errno:

0, si_code: 0</font>

<br><font size=2 face="Menlo-Regular">    sender PID: 0, sender

UID: 0</font>
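<br>

<br><font size=2 face="sans-serif">To make the layout above concrete, here is a rough Python sketch (not lldb code) that walks the note records of a core file's PT_NOTE data and pairs each PRSTATUS with a SIGINFO that follows it. It assumes little-endian data, 4-byte note alignment, and that si_signo is the first 32-bit field of both descriptors. The names u32, align4, pad4, iter_notes, thread_signals and make_note are made up for this example, and the demo notes are synthetic rather than read from a real core.</font>

```python
NT_PRSTATUS = 1
NT_SIGINFO = 0x53494749  # ASCII "SIGI"


def u32(data, off):
    """Read a little-endian 32-bit value (assumption: little-endian core)."""
    return int.from_bytes(data[off:off + 4], "little")


def align4(n):
    """Round n up to a multiple of 4 (assumption: 4-byte note alignment)."""
    return (n + 3) // 4 * 4


def iter_notes(data):
    """Yield (name, type, desc) for each note record in PT_NOTE data."""
    off = 0
    while len(data) - off >= 12:
        namesz, descsz, ntype = u32(data, off), u32(data, off + 4), u32(data, off + 8)
        off += 12
        name = data[off:off + namesz].rstrip(b"\0").decode("ascii")
        off += align4(namesz)
        desc = data[off:off + descsz]
        off += align4(descsz)
        yield name, ntype, desc


def thread_signals(data):
    """One entry per PRSTATUS note, with the si_signo of a following SIGINFO, if any."""
    threads = []
    for name, ntype, desc in iter_notes(data):
        if name != "CORE":
            continue
        if ntype == NT_PRSTATUS:
            # A PRSTATUS note starts a new thread.
            threads.append({"prstatus_signo": u32(desc, 0), "siginfo_signo": None})
        elif ntype == NT_SIGINFO and threads:
            # A SIGINFO note is attributed to the thread that precedes it.
            threads[-1]["siginfo_signo"] = u32(desc, 0)
    return threads


def pad4(raw):
    """Pad raw bytes with NULs up to a multiple of 4."""
    return raw + b"\0" * (align4(len(raw)) - len(raw))


def make_note(name, ntype, desc):
    """Build a synthetic note record, used only for the demo below."""
    raw_name = name + b"\0"
    header = (len(raw_name).to_bytes(4, "little")
              + len(desc).to_bytes(4, "little")
              + ntype.to_bytes(4, "little"))
    return header + pad4(raw_name) + pad4(desc)


# Two gcore-style threads: PRSTATUS (336 bytes, si_signo 0), then SIGINFO (128 bytes, si_signo 19).
one_thread = (make_note(b"CORE", NT_PRSTATUS, (0).to_bytes(4, "little") + bytes(332))
              + make_note(b"CORE", NT_SIGINFO, (19).to_bytes(4, "little") + bytes(124)))
for thread in thread_signals(one_thread * 2):
    print(thread)
```

<br><font size=2 face="sans-serif">A thread whose PRSTATUS has no SIGINFO after it keeps siginfo_signo as None in this sketch.</font>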

<br>

<br><font size=2 face="sans-serif">I think gcore is being clever. It

records the "real" signal number the running thread had received

at the time the core was taken (info.si_signo is 0) as well as the signal

it used to interrupt the thread and gather its state (si_signo is 19 in

SIGINFO). The value in PRSTATUS info.si_signo is the signal number that ends

up in m_signo in ThreadElfCore and is ultimately looked up in the set of

signals lldb should stop on in UnixSignals::GetShouldStop. 0 is not found in

that set since there isn't a signal 0. I think gcore does all this to preserve

the real signal state the process had before gcore attached to it, I guess

in case you are trying to debug something to do with signals and need to

see that state. (That's a bit of a guess, mind you.)</font>
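
<br>

<br><font size=2 face="sans-serif">As a toy model of that lookup (this is not lldb's UnixSignals code), the Python sketch below uses a made-up table and a made-up get_should_stop function. It assumes that a signal number missing from the table is treated as "do not stop"; the table entries are illustrative Linux x86 signal numbers only.</font>

```python
# Hypothetical stop table: signal number -> should the debugger stop?
# Illustrative entries only (Linux x86 numbering), not lldb's real table.
SHOULD_STOP = {
    6: True,   # SIGABRT
    11: True,  # SIGSEGV
    19: True,  # SIGSTOP
}


def get_should_stop(signo):
    """Assumption: a signal number that is not in the table means "do not stop"."""
    return SHOULD_STOP.get(signo, False)


# Signal number taken from the SIGINFO note in the gcore file.
print(get_should_stop(19))
# Signal number taken from PRSTATUS info.si_signo in the gcore file.
print(get_should_stop(0))
```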

<br>

<br><font size=2 face="sans-serif">I can think of three solutions:</font>

<br>

<br><font size=2 face="sans-serif">- Read the signal information from the

SIGINFO block for a thread if it's present. Core files generated by abort

or a crash only seem to have a SIGINFO for one thread, which looks like

the one that received/triggered the signal in the first place. This

means adding something to parse that block out of the elf core as well

as PRSTATUS, and overriding the state from PRSTATUS if we see it. SIGINFO

always seems to come after PRSTATUS, and probably has to, as PRSTATUS

contains the pid and identifies that there is a new thread in the core,

so if SIGINFO is found that signal number will just replace the first one.<br>

</font>

<br><font size=2 face="sans-serif">- Never allow a thread's signal number

to be 0 when it comes from an elf core dump. (This is probably as much

of a band aid as the first solution.)<br>

</font>

<br><font size=2 face="sans-serif">- Stick with the first solution of saying

that we can never resume a core file. The only thing in this solution's

favour is that it means the "real" thread state that gcore tried

to preserve is known to lldb. Once the core file is loaded, typing continue

does result in an error message telling you that you can't resume from

a core file.</font>
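
<br>

<br><font size=2 face="sans-serif">The first option (read SIGINFO and let it override PRSTATUS) boils down to something like the Python sketch below. It is only an illustration of the selection rule, not a patch for ThreadElfCore; effective_signo is a made-up name, and the signal numbers in the calls are example values.</font>

```python
def effective_signo(prstatus_signo, siginfo_signo=None):
    """Prefer the SIGINFO signal number when the thread has a SIGINFO note."""
    if siginfo_signo is not None:
        return siginfo_signo
    # No SIGINFO for this thread: keep whatever PRSTATUS recorded.
    return prstatus_signo


# gcore-style thread: PRSTATUS says 0, the following SIGINFO says 19.
print(effective_signo(0, 19))
# Thread without a SIGINFO note: the PRSTATUS value is kept (11 is an example value).
print(effective_signo(11))
```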

<br>

<br><font size=2 face="sans-serif">I'll have a go at prototyping the solution

to read the SIGINFO structure but I'd appreciate any thoughts on which

is the "correct" fix.</font>

<br>

<br><font size=2 face="sans-serif">Thanks,</font>

<br>

<br>

<br><font size=2 face="sans-serif">Howard Hellyer </font>

<br><font size=2 face="sans-serif">IBM Runtime Technologies, IBM Systems

        </font>

<br><font size=2 face="sans-serif"><br>

</font>

<br>

<br>

<br>

<br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">Jim Ingham &lt;jingham@apple.com&gt;</font>

<br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">Howard Hellyer/UK/IBM@IBMGB</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Cc:      

 </font><font size=1 face="sans-serif">lldb-dev@lists.llvm.org</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">10/11/2016 18:48</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">Re: [lldb-dev]

LLDB hang loading Linux core files from live processes (Bug 26322)</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Sent by:    

   </font><font size=1 face="sans-serif">jingham@apple.com</font>

<br>

<hr noshade>

<br>

<br>

<br><tt><font size=2>I think that approach is kind of a bandaid.  <br>

<br>

Core files can't resume, so it would be better to figure out why telling

a core file which can't resume to resume caused us to go into a tail spin.

 That should just fall out of WillResume returning false or some other

better general signal.  Special-casing core files seems a bit of a

hack.<br>

<br>

That being said, if nobody has time to make a better solution, a bandaid

is better than bleeding...<br>

<br>

Jim<br>

</font></tt>

<br>
