[lldb-dev] Inquiry regarding AddOneMoreFrame function in UnWindLLDB

Tue May 31 15:27:26 PDT 2016

> On May 31, 2016, at 11:31 AM, jingham at apple.com wrote:
> 
> 
>> On May 31, 2016, at 12:52 AM, Ravitheja Addepally via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>> 
>> Hello,
>>      I posted this query a while ago and still have no answers. I am currently working on Bug 27687 (PrintStackTraces). The reason for the failure is the erroneous unwinding of the frames from the zeroth frame. The error is not detected in AddOneMoreFrame, since it only checks 2 more frames; if it checked more frames, it would have detected the error. Now my questions are ->
>> 
>> ->  Is there any specific reason for only checking 2 frames instead of more?
> 
> The stepping machinery uses the unwinder on each stop to figure out whether it has stepped in or out, which is fairly performance sensitive, so we don't want AddOneMoreFrame to do more work than it has to.  

The most common case of a bad unwind, where the unwinder is stuck in a loop, is a single stack frame repeating.  I've seen loops of as many as six frames repeating (which are not actually a series of recursive calls), but that's less common.
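
To make the loop case concrete, here is a rough sketch (not lldb code) of how a repeating run of frames could be spotted.  It assumes each frame is reduced to a hypothetical (pc, cfa) pair; the function name and the max_period limit are made up for illustration.  Comparing the CFA as well as the pc is what separates a stuck unwinder from real recursion, where the pc repeats but the CFA keeps moving.

```python
from typing import List, Optional, Tuple

# Hypothetical frame representation: (pc, cfa). Not an lldb type.
Frame = Tuple[int, int]


def repeating_cycle_length(frames: List[Frame], max_period: int = 6) -> Optional[int]:
    """Return the shortest period p (1..max_period) for which the last p
    frames exactly repeat the p frames before them, or None if there is none."""
    n = len(frames)
    for period in range(1, max_period + 1):
        if n < 2 * period:
            break  # not enough frames to hold two copies of this period
        if frames[n - period:] == frames[n - 2 * period:n - period]:
            return period
    return None


# Real recursion: same pc, different CFA each time, so no loop is reported.
recursion = [(0x1000, 0x7FF0), (0x1000, 0x7FE0), (0x1000, 0x7FD0)]
# Stuck unwinder: the same (pc, cfa) pair comes back unchanged.
stuck = [(0x1000, 0x7FF0), (0x2000, 0x7FE0), (0x2000, 0x7FE0)]

print(repeating_cycle_length(recursion))
print(repeating_cycle_length(stuck))
```

Checking only one or two frames back corresponds to a small max_period; a six-frame loop would need twelve frames on the list before this check could see it, which is the cost trade-off mentioned above.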

> 
>> ->  Why not make the EH CFI based unwinder the default one and make the assembly the fallback ?

Sources of unwind information fall into two categories.  They can describe the unwind state at every instruction of a function (asynchronous) or they can describe the unwind state only at function call boundaries (synchronous).

"Asynchronous" here refers to the fact that the debugger can interrupt the program at any instruction.

Most unwind information is designed for exception handling -- it is synchronous.  A function can only throw an exception from its body, or have an exception passed up through it while it is calling another function.

For exception handling, there is no need/requirement to describe the prologue or epilogue instructions, for instance.

eh_frame (and DWARF's debug_frame from which it derives) splits the difference and makes things quite unclear.  It is guaranteed to be correct for exception handling -- it is synchronous, and is valid in the middle of the function and when it is calling other functions -- but it is a general format that CAN be asynchronous if the emitter includes information about the prologue or epilogue or mid-function stack changes.  But eh_frame is not guaranteed to be that way, and in fact there's no way for it to indicate what it describes, beyond the required unwind info for exception handling.

On x86, gcc and clang have always described the prologue unwind info in their eh_frame.  gcc has recently started describing the epilogue too (clang does not).  There's code in lldb (e.g. UnwindAssembly_x86::AugmentUnwindPlanFromCallSite) written by Tong Shen while interning at Google which will try to detect if the eh_frame describes the prologue and epilogue.  If it does, it will use eh_frame for frame 0.  If it only describes the prologue, it will use the instruction emulation code to add unwind info for the epilogue instructions and use that at frame 0.
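
As a rough sketch of that frame 0 decision (this is not the lldb implementation; the enum, the function name, and the two boolean inputs are hypothetical, and the final fallback to the assembly-based plan is my assumption about the "neither" case):

```python
from enum import Enum


class Frame0Plan(Enum):
    EH_FRAME = "use eh_frame as-is"
    EH_FRAME_AUGMENTED = "eh_frame plus emulated epilogue info"
    ASSEMBLY = "assembly / instruction-emulation plan"


def choose_frame0_plan(describes_prologue: bool, describes_epilogue: bool) -> Frame0Plan:
    """Pick an unwind plan for frame 0 from what eh_frame appears to cover."""
    if describes_prologue and describes_epilogue:
        # eh_frame looks asynchronous enough to trust at any instruction.
        return Frame0Plan.EH_FRAME
    if describes_prologue:
        # Fill in the missing epilogue info via instruction emulation.
        return Frame0Plan.EH_FRAME_AUGMENTED
    # Assumption: with no prologue info, fall back to the assembly-based plan.
    return Frame0Plan.ASSEMBLY


print(choose_frame0_plan(True, True).value)
print(choose_frame0_plan(True, False).value)
```

The hard part in practice is computing the two booleans, since eh_frame itself does not say what it covers.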

There are other sources of unwind information similar to eh_frame that are only for exception handling.  Tamas added ArmUnwindInfo last year, which reads the .ARM.exidx unwind tables.  I added compact unwind importing - an Apple-specific format that uses a single 4-byte word to describe the unwind state for each function, which can't describe anything in the prologue/epilogue.  These formats definitely can't be used to unwind at frame 0 because we could be stopped anywhere in the prologue/epilogue where they are not accurate.

It's unfortunate that eh_frame doesn't include a way for the producer to declare how async the unwind info is; that makes the debugger's job a lot more difficult.
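
A simplified model of why synchronous-only info is unsafe at frame 0, assuming a function with one prologue and a known list of epilogue ranges (the function name and the offsets are made up for illustration; real functions can be messier):

```python
from typing import List, Tuple


def sync_info_valid_at(offset: int, prologue_end: int,
                       epilogues: List[Tuple[int, int]]) -> bool:
    """True if call-site-only (synchronous) unwind info describes the
    instruction at `offset` bytes into the function."""
    if offset < prologue_end:
        return False  # still setting up the frame
    # Inside any [start, end) epilogue range the frame is being torn down.
    return not any(start <= offset < end for start, end in epilogues)


# Hypothetical function: prologue is bytes 0..3, one epilogue at bytes 40..43.
print(sync_info_valid_at(0, 4, [(40, 44)]))
print(sync_info_valid_at(20, 4, [(40, 44)]))
print(sync_info_valid_at(41, 4, [(40, 44)]))
```

Frames above frame 0 are stopped at a call instruction in the body of the function, so they land in the valid region; frame 0 can be at any offset.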

J