[Lldb-commits] [PATCH] D86417: [lldb] do not propagate eTrapHandlerFrame repeatedly

Tue Aug 25 15:45:49 PDT 2020

jasonmolenda added a comment.

Hi, I'll review this problem / suggested patch in a bit but I think it might be helpful to outline what the intention of all this is.  Let's say we have a stack of

  frame #0 - handler_func()
  frame #1 - magic_function_called_from_kernel()
  frame #2 - doing_work()

`doing_work()` was running when an asynchronous signal arrived, a SIGINT for example.  The kernel stops it, saves its register context (all the registers) into a buffer on the stack, and invokes `magic_function_called_from_kernel`.  `magic_function_called_from_kernel` then calls `handler_func`, which the program registered earlier as the handler for SIGINT.

In this situation, there are two *special* things about `doing_work` that the unwinder must account for.

First, when you are on any frame other than the currently-executing one (frame #0), normally only callee-saved (aka non-volatile) registers are retrievable.  For instance, in the x86-64 SysV ABI, rdi is an argument register.  In frame #0, you can print rdi.  In frame #1, rdi may have been modified since frame #1 called frame #0, so lldb won't let you retrieve it.  However, when a stack frame has been interrupted asynchronously and we have a full saved register context, we can retrieve any register.  Printing frame #2's $rdi is possible and valid.  Frame #2 behaves like a zeroth stack frame because the next frame, frame #1 (`magic_function_called_from_kernel`), is a trap handler and has the full register context.
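To make the register rule concrete, here is a minimal sketch in Python.  It is a toy model and not LLDB code: the function name, its parameters, and the register set are assumptions made for illustration, and pc/sp recovery is left out for brevity.

```python
# Toy model of the register-availability rule described above (not LLDB code).
# Callee-saved (non-volatile) general-purpose registers in the x86-64 SysV ABI.
# The pc and sp, which an unwinder recovers separately, are omitted for brevity.
CALLEE_SAVED = {"rbx", "rbp", "r12", "r13", "r14", "r15"}


def can_read_register(reg, frame_index, next_frame_has_full_context):
    """Return True if `reg` can be retrieved for the frame at `frame_index`.

    `next_frame_has_full_context` (hypothetical name) says whether the frame
    this one "called" is a trap handler that saved every register.
    """
    if frame_index == 0 or next_frame_has_full_context:
        # Live frame, or a full saved register context: everything is readable.
        return True
    # Ordinary caller frame: only registers the callee must preserve.
    return reg in CALLEE_SAVED
```

With the example stack, `can_read_register("rdi", 1, False)` is false for frame #1, while `can_read_register("rdi", 2, True)` is true for frame #2.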

The second thing that is different about frame #2 is how we look up the source line and symbol context.  When we look up the source line, block scope, and function scope for a frame higher up the stack, we subtract 1 from the return-pc value.  Why subtract 1?  Two important reasons.  First, a function may end by calling a no-return function like abort(), and the call instruction pushes the address of the following instruction as the return address.  If the `callq abort` is the last instruction in the function, that return address points into the next function!  Second, there are cases (often with optimized code) where a variable is in a register up until the function call (and is still retrievable) but after the function call, it may be considered dead because it's a different code path through the function.  So doing source line / block scope / function scope / location list lookups with $return-pc - 1 is very important.  However, when a function is interrupted asynchronously, like `doing_work` here, we may legitimately be on the very first instruction of `doing_work`.  Backing up the pc by 1 would have us claiming that we're in the *previous* function.
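The pc adjustment can be sketched the same way.  This is a toy model in Python, not LLDB code; the function table, its addresses, and the helper names are made up for illustration.

```python
# Toy model of the "subtract 1 from the return pc" rule (not LLDB code).
# Hypothetical layout: `caller` ends with a 5-byte `callq abort` at 0x100b,
# so the pushed return address is 0x1010, the first byte of `next_function`.
FUNCTIONS = [
    ("caller", 0x1000, 0x1010),         # half-open range [start, end)
    ("next_function", 0x1010, 0x1040),
]


def function_containing(addr):
    """Return the name of the function whose address range contains `addr`."""
    for name, start, end in FUNCTIONS:
        if start <= addr < end:
            return name
    return None


def lookup_address(pc, behaves_like_zeroth_frame):
    """Address to use for source line / block / function lookups."""
    # A return address points just past the call, so back up by one byte,
    # unless the frame was live or interrupted asynchronously at exactly `pc`.
    return pc if behaves_like_zeroth_frame else pc - 1
```

For a normal caller frame with return pc 0x1010, `function_containing(lookup_address(0x1010, False))` gives `caller`, which is correct.  For a frame interrupted on the first instruction of `next_function`, the lookup gives `next_function` only when `behaves_like_zeroth_frame` is true; with the decrement it would claim `caller`.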

So this is why we have all this code for computing behaves_like_zeroth_frame in these areas.  In my original design, I conflated two separate points:  that we have a full saved register context, and that the function may have been interrupted asynchronously, so we need to treat the frame as if it were the currently-executing stack frame even though it's in the middle of the stack.  If either of these is not true, that is, if the NextFrame (frame #1 in this case) does NOT have a full register context, or this frame (frame #2) was NOT interrupted asynchronously, then treating frame #2 as behaves_like_zeroth_frame is incorrect and will lead to bugs.
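As a rough illustration of requiring both conditions, here is the example stack modelled in Python.  This is a sketch under assumptions, not how LLDB computes the flag: the `Frame` class, its field names, and the extra `main` frame are hypothetical.

```python
# Toy model: a frame behaves like frame #0 only if BOTH conditions hold.
# Not LLDB code; the class and field names are hypothetical.
from dataclasses import dataclass


@dataclass
class Frame:
    name: str
    is_trap_handler: bool = False             # saved a full register context
    interrupted_asynchronously: bool = False  # stopped at an arbitrary pc


def behaves_like_zeroth_frame(frames, i):
    """Index 0 is the currently-executing frame; larger indexes are older."""
    if i == 0:
        return True
    next_frame = frames[i - 1]  # the frame that frames[i] "called"
    return next_frame.is_trap_handler and frames[i].interrupted_asynchronously


stack = [
    Frame("handler_func"),
    Frame("magic_function_called_from_kernel", is_trap_handler=True),
    Frame("doing_work", interrupted_asynchronously=True),
    Frame("main"),  # hypothetical ordinary caller of doing_work
]
```

Here only `handler_func` (frame #0) and `doing_work` (frame #2) qualify; the trap handler frame itself and the ordinary caller above `doing_work` do not.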

Repository:
  rLLDB LLDB

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86417/new/

https://reviews.llvm.org/D86417