[llvm-bugs] [Bug 52165] New: lldb incorrectly unwinds from signal handlers on AArch64 Linux

Wed Oct 13 06:08:03 PDT 2021

https://bugs.llvm.org/show_bug.cgi?id=52165

            Bug ID: 52165
           Summary: lldb incorrectly unwinds from signal handlers on
                    AArch64 Linux
           Product: lldb
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: All Bugs
          Assignee: lldb-dev at lists.llvm.org
          Reporter: david.spickett at linaro.org
                CC: jdevlieghere at apple.com, llvm-bugs at lists.llvm.org

TestHandleAbort.py fails on AArch64 Ubuntu Focal because we're using frame
information from the stack when trying to unwind from __kernel_rt_sigreturn. We
should be using register values from the signal context instead.

The explanation:

This is the backtrace when the signal is first raised:
(lldb) bt
* thread #1, name = 'a.out', stop reason = signal SIGABRT
  * frame #0: 0x0000fffff7bda138 libc.so.6`raise + 224
    frame #1: 0x0000fffff7bc6d68 libc.so.6`abort + 272
    frame #2: 0x0000000000400704 a.out`abort_caller at main.c:12:5
    frame #3: 0x000000000040074c a.out`main at main.c:23:5
    frame #4: 0x0000fffff7bc7090 libc.so.6`__libc_start_main + 232
    frame #5: 0x0000000000400614 a.out`_start + 52

When we hit a breakpoint in the signal handler, we expect to get something like
this with the signal handler frames on top. What we actually get is:

(lldb) bt
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
  * frame #0: 0x00000000004006ec a.out`handler(sig=6) at main.c:7:5
    frame #1: 0x0000fffff7ffc5b8 [vdso]`__kernel_rt_sigreturn
    frame #2: 0x0000ffffffffeef8

If we examine the frame info in __kernel_rt_sigreturn:
(lldb) memory read -s 8 -f x -c 2 $fp
0xffffffffeec0: 0x0000ffffffffeed0 0x0000ffffffffeef8

We have a vaguely valid-looking FP and a clearly invalid LR: the LR points into
the stack, not into code.

(lldb) memory region 0x0000ffffffffeef8
[0x0000fffffffdf000-0x0001000000000000) rw- [stack]

The FP does actually point to something that *looks* ok (whether by design or
accident).

(lldb) memory read -s 8 -f x -c 2 0x0000ffffffffeed0
0xffffffffeed0: 0x0000fffffffff000 0x0000fffff7bc6d68
(lldb) image lookup -a 0x0000fffff7bc6d68
      Address: libc.so.6[0x0000000000023d68] (libc.so.6.PT_LOAD[0]..text + 424)
      Summary: libc.so.6`abort + 272

We expect to reach abort, but we're missing the "raise" entry because of that
invalid LR.
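
The frame-record walk above can be replayed offline as a small sketch. The
memory contents and the stack range are copied from the lldb output in this
report; the helper names (walk_frame_records, points_into_stack) are
hypothetical and are not lldb API. It assumes the usual AArch64 frame record
layout of a saved FP followed by a saved LR.

```python
# Memory contents copied from the `memory read` output in this report.
# Each AArch64 frame record is a pair: (saved FP, saved LR).
FRAME_RECORDS = {
    0xFFFFFFFFEEC0: (0x0000FFFFFFFFEED0, 0x0000FFFFFFFFEEF8),
    0xFFFFFFFFEED0: (0x0000FFFFFFFFF000, 0x0000FFFFF7BC6D68),
}

# Stack mapping reported by `memory region`: [start, end).
STACK_START = 0x0000FFFFFFFDF000
STACK_END = 0x0001000000000000


def walk_frame_records(fp, records):
    """Follow saved-FP links, yielding (record_address, saved_lr).

    Stops when the next FP is not in the captured memory or repeats.
    """
    seen = set()
    while fp in records and fp not in seen:
        seen.add(fp)
        next_fp, lr = records[fp]
        yield fp, lr
        fp = next_fp


def points_into_stack(addr):
    """True if addr lies inside the [stack] mapping, so it cannot be a return address."""
    return STACK_START <= addr < STACK_END


chain = list(walk_frame_records(0xFFFFFFFFEEC0, FRAME_RECORDS))
for record, lr in chain:
    kind = "stack address, not code" if points_into_stack(lr) else "plausible return address"
    print(f"record at {record:#x}: lr = {lr:#018x} ({kind})")
```

The first record yields an LR inside the stack mapping, and the second yields
the address that lldb resolved to abort + 272, so a pure frame-record walk has
no way to recover the raise frame.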

Looking at the kernel source, no CFI info has been added to sigreturn:
https://github.com/torvalds/linux/blob/659caaf65dc9c7150aa3e80225ec6e66b25ab3ce/arch/arm64/kernel/vdso/sigreturn.S

And gdb/libunwind would be ignoring it anyway, as what they do is assume that
the stack pointer in the sigreturn frame points to a signal context and read
the registers from there. See: https://reviews.llvm.org/D90898

We can see the pc we'd want by manually reading that memory:
(lldb) memory read -s 8 -f x $sp+304 $sp+304+512
0xffffffffdda0: 0x0000000000000000 0x0000000000000000
<...>
0xffffffffdea0: 0x0000ffffffffeed0 0x0000fffff7bda138
<...>

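For reference, here is a sketch of where the +304 comes from and where the
saved pc lands. The structure sizes are my reading of the arm64 Linux uapi
headers (siginfo, ucontext, sigcontext) and should be treated as assumptions to
check against the kernel version in use. The constant and function names are
made up for illustration.

```python
# Assumed arm64 Linux signal frame layout, starting at the sp seen in
# __kernel_rt_sigreturn: a siginfo followed by a ucontext.
SIGINFO_SIZE = 128

# ucontext fields before uc_mcontext (sizes in bytes, assumed):
#   uc_flags (8), uc_link (8), uc_stack (24), uc_sigmask (8), padding (120)
UCONTEXT_PREFIX = 8 + 8 + 24 + 8 + 120


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


# uc_mcontext (struct sigcontext) is 16-byte aligned inside the ucontext.
MCONTEXT_OFFSET = SIGINFO_SIZE + align_up(UCONTEXT_PREFIX, 16)

# struct sigcontext (assumed): fault_address, regs[31], sp, pc, pstate,
# each 8 bytes.
REGS_OFFSET = MCONTEXT_OFFSET + 8
SP_OFFSET = REGS_OFFSET + 31 * 8
PC_OFFSET = SP_OFFSET + 8


def gpr_offset(n):
    """Offset from the sigreturn sp of saved general purpose register xN."""
    if not 0 <= n <= 30:
        raise ValueError("AArch64 has x0..x30")
    return REGS_OFFSET + 8 * n


# The dump in this report starts at $sp+304 == 0xffffffffdda0.
sigreturn_sp = 0xFFFFFFFFDDA0 - MCONTEXT_OFFSET
saved_sp_addr = sigreturn_sp + SP_OFFSET
saved_pc_addr = sigreturn_sp + PC_OFFSET

print(f"uc_mcontext at sp+{MCONTEXT_OFFSET}")
print(f"saved sp at {saved_sp_addr:#x}, saved pc at {saved_pc_addr:#x}")
```

Under those assumptions the saved sp and pc fall at 0xffffffffdea0 and
0xffffffffdea8, which is the line of the dump holding 0x0000fffff7bda138, the
raise + 224 address from the first backtrace.
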
LLDB needs to have a specific unwind plan for this situation. The closest thing
at the moment is a comment in lldb/source/Target/RegisterContextUnwind.cpp.

// section, so prefer that if available. On other platforms we may need to
// provide a platform-specific UnwindPlan which encodes the details of how to
// unwind out of sigtramp.

So we do already have a place in the unwinder where we could do that.

Why did this work on bionic? I'm not sure it ever did. I'm not sure that the
frame pointer of __kernel_rt_sigreturn is ever supposed to be valid. What I
think happened on Bionic is that it happened to point to a set of info that
points back into libc's raise.

But it's just memory re-use rather than a deliberate action to put that data
there. I can't prove this 100% but it seems odd that it would be broken between
libc releases, when the kernel stays the same. If anyone was going to set up
this information it would be the kernel.

Further to that, the frame addresses that we get on bionic look a bit off.

* thread #1, name = 'a.out', stop reason = breakpoint 1.1
  * frame #0: 0x00000000004006ac a.out`handler(sig=6) at main.c:7:5
    frame #1: 0x0000fffff7ffc5b8 [vdso]`__kernel_rt_sigreturn
    frame #2: 0x0000fffff7c38484 libc.so.6`__GI_raise at nptl-signals.h:68
    frame #3: 0x0000fffff7c38480 libc.so.6`__GI_raise(sig=6) at raise.c:40
    frame #4: 0x0000fffff7c398d4 libc.so.6`__GI_abort at abort.c:79
    frame #5: 0x00000000004006c8 a.out`abort_caller at main.c:12:5
<...>

(lldb) d -n __GI_raise
libc.so.6`:
<...>
    0xfffff7c38480 <+56>:  bl     0xfffff7c88f90            ; __GI_memcpy
    0xfffff7c38484 <+60>:  mov    x3, x0

I would expect to see some instruction after an svc. Not after a call to a
normal function.

GDB's backtrace shows a different source location for raise.

(gdb) bt
#0  handler (sig=6) at
/home/david.spickett/llvm-project/lldb/test/API/functionalities/signal/handle-abrt/main.c:7
#1  <signal handler called>
#2  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#3  0x0000fffff7c398d4 in __GI_abort () at abort.c:79

lldb is pointing to:
 39   sigset_t set;
 40   __libc_signal_block_app (&set);

gdb is pointing to:
 50   return ret;
 51 }

Seems like gdb is using this signal context strategy everywhere and is
therefore getting it correct on focal and bionic. lldb is reading from the
frame info and kinda getting something that looks right on bionic, failing
completely on focal.

Probably it was just chance that the sequence of calls on bionic pushed that
sort of valid frame info that we later read. When libc was updated in focal,
that sequence got shuffled around and we see the issue.
