[lldb-dev] regarding [Bug 15671] New: backtrace truncated after assertion failure in inferior

Jason Molenda jmolenda at apple.com
Wed Jul 17 21:31:54 PDT 2013


Hi Ashok, I apologize for taking so long to get back to you on this radar.  There are a lot of corner cases handled in RegisterContextLLDB and I wanted to look it over carefully before I said anything.

This patch is fine.  

+            if (eh_frame->GetUnwindPlan (m_current_pc, *unwind_plan_sp))
+            {
+                m_frame_type = eSkipFrame; // no symbol context, but we can use eh_frame to get back on track.
+                return unwind_plan_sp;
+            }


I wouldn't use eSkipFrame - wouldn't eNormalFrame work?  eSkipFrame was intended to indicate a frame that is known to be invalid, an artifact of following the frame-unwind chain via the architectural default unwind plans.  In your case, you have a function with fixed bounds and full unwind information -- you only lack a function name.

A more ambitious solution here would be to have the ObjectFile/SymbolFile ingest the function address ranges from eh_frame and supplement the symbol table with those additional functions, making up names.  

I understand why doing this at initial ObjectFile creation time is a performance hit on ELF systems.  On Mac OS X, Mach-O gives us a compact, fast-to-parse list of function start addresses (our LC_FUNCTION_STARTS load command), so doing this unconditionally at ObjectFile creation time makes sense there.

The best solution on an ELF system would be for lldb to get to this point where it's unwinding through a dylib, can't find a symbol for a pc value, but CAN find an eh_frame entry for it -- and then ask the ObjectFile to supplement its symbol table with eh_frame entries and use those.

But I'm not going to ask you to make a change that big - your change is fine and I don't see any problems happening because of it.  I'd recommend trying to use eNormalFrame, that's the only change I'd suggest.
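
To make the lazy approach concrete, here is a rough standalone sketch.  It is not lldb code: ToySymtab, Range, FindFunctionName and IngestEHFrameRanges are hypothetical names invented for illustration, and the vector of ranges stands in for whatever a real eh_frame parser would produce.  The made-up symbol names imitate the "__lldb_unnamed_function3491$$Dock" style mentioned later in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Half-open address range [start, end) of one function.
struct Range {
    uint64_t start;
    uint64_t end;
};

// Toy stand-in for one object file's symbol table (hypothetical, not lldb's Symtab).
class ToySymtab {
public:
    // eh_frame_ranges stands in for the function ranges described by eh_frame.
    ToySymtab(std::string module, std::vector<Range> eh_frame_ranges)
        : m_module(std::move(module)),
          m_eh_frame_ranges(std::move(eh_frame_ranges)),
          m_ingested_eh_frame(false) {}

    void AddSymbol(const std::string &name, Range range) {
        assert(range.start < range.end);
        m_symbols[range.start] = Entry{name, range};
    }

    bool HasIngestedEHFrame() const { return m_ingested_eh_frame; }

    // Find the function containing pc.  Only on a miss do we pay the cost
    // of ingesting the eh_frame ranges, and only once per object file.
    bool FindFunctionName(uint64_t pc, std::string &name) {
        if (Lookup(pc, name))
            return true;
        if (m_ingested_eh_frame)
            return false;
        IngestEHFrameRanges();
        return Lookup(pc, name);
    }

private:
    struct Entry {
        std::string name;
        Range range;
    };

    bool Lookup(uint64_t pc, std::string &name) const {
        auto it = m_symbols.upper_bound(pc);  // first symbol starting after pc
        if (it == m_symbols.begin())
            return false;
        --it;                                 // last symbol starting at or before pc
        if (pc >= it->second.range.end)
            return false;
        name = it->second.name;
        return true;
    }

    void IngestEHFrameRanges() {
        m_ingested_eh_frame = true;
        unsigned index = 0;
        for (const Range &r : m_eh_frame_ranges) {
            ++index;
            // emplace() leaves an existing (real) symbol at this address alone.
            m_symbols.emplace(r.start,
                Entry{"__lldb_unnamed_function" + std::to_string(index) + "$$" + m_module, r});
        }
    }

    std::string m_module;
    std::vector<Range> m_eh_frame_ranges;
    bool m_ingested_eh_frame;
    std::map<uint64_t, Entry> m_symbols;
};
```

The point of the design is that a backtrace through fully symbolicated code never touches eh_frame for symbol purposes; the slower parse only happens the first time a pc falls outside every known symbol.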


On Jul 12, 2013, at 1:29 PM, Thirumurthi, Ashok <ashok.thirumurthi at intel.com> wrote:

> Ping!
> 
> FYI Jason, I verified that the original patch (attached again) continues to apply cleanly and resolve the failure in functionalities/inferior-assert with SVN trunk.
> 
> - Ashok
> 
> -----Original Message-----
> From: lldb-dev-bounces at cs.uiuc.edu [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Of Thirumurthi, Ashok
> Sent: Wednesday, June 12, 2013 12:34 PM
> To: Jason Molenda
> Cc: lldb-dev at cs.uiuc.edu
> Subject: Re: [lldb-dev] regarding [Bug 15671] New: backtrace truncated after assertion failure in inferior
> 
>> Hi Ashok, thanks for working on this -- I know the unwinder code can 
>> be hard to modify, RegisterContextLLDB.cpp is a little complex in 
>> places. :/
> For sure, Jason, thanks for the sophisticated unwinder.
> 
> 
>> A recent change to ObjectFileMachO is that it also gets the function start addresses from the eh_frame information if LC_FUNCTION_STARTS doesn't exist:
> Nice, I see how that's an advantage in spite of the performance hit.  I'll certainly look at reworking ObjectFileELF to add the function symbols for stripped symbols from the eh_frame information.
> 
> 
>> Let me know what you think.
> Perhaps the best approach is to do both.  Having my suggested new code path in the unwinder isn't fundamentally wrong or a performance concern.  In contrast, it does unblock Linux core file support and a high-profile bug for a common use case.  I think it also improves the applicability of the unwinder while looking for improvements in other object-file formats (i.e. ObjectFilePECOFF).
> 
> If you like the idea, I'm happy to commit & improve,
> 
> - Ashok
> 
> 
> On Jun 7, 2013, at 11:46 AM, "Thirumurthi, Ashok" <ashok.thirumurthi at intel.com> wrote:
> 
>> Hi Jason,
>> 
>>> Frame 2 did not get a valid CFA for this frame, stopping stack walk
>> So, the attached patch allows the unwinder to get past frame 2 using eh_frame information that is dug up based on the pc rather than the start address of the function (i.e. to handle the case where the function symbol is unavailable).
>> 
>> This fix is coupled with GetFullUnwindPlanForFrame rather than lowered to UnwindTable and FuncUnwinders.  Alternatively, I could add or modify routines like GetFuncUnwindersContainingAddress to avoid the requirement for a SymbolContext.  Similarly, I could add or modify routines like GetUnwindPlanAtCallSite to allow the caller to specify a pc.
>> 
>> The attached patch also slides m_current_pc in the case where a Symbol is found at pc - 1.  Note that the log while adding frame 2 indicates a bogus fp:
>> th1/fr2 supplying caller's register 6 from the stack, saved at CFA plus offset
>>  th1/fr3 fp = 0x00000000004006db
>> 
>> The slide keeps me out of the weeds while adding frame 3 (see the attached log).  The combined result is a healthy stack:
>> 
>> (lldb) bt
>> * thread #1: tid = 0x2987, 0x00007ffba7b23425 libc.so.6`raise + 53, stop reason = signal SIGABRT
>>    frame #0: 0x00007ffba7b23425 libc.so.6`raise + 53
>>    frame #1: 0x00007ffba7b26b8b libc.so.6`abort + 379
>>    frame #2: 0x00007ffba7b1c0ee libc.so.6
>>    frame #3: 0x00007ffba7b1c192 libc.so.6`__assert_fail + 66
>>    frame #4: 0x00000000004005c0 a.out`main(argc=1, argv=0x00007fff1ccfbd68) + 112 at main.c:18
>>    frame #5: 0x00007ffba7b0e76d libc.so.6`__libc_start_main + 237
>>    frame #6: 0x0000000000400489 a.out`_start + 41
>> 
>> Perhaps it would be helpful to provide a slightly different entry for frame #2 like:
>>    frame #2: 0x00007ffba7b1c0ee libc.so.6`??? + offset
>> 
>> For now, I set eSkipFrame which is documented as a frame state that indicates that the unwinder found issues and is hoping to recover. Perhaps a new value would better document the fact that the frame goes with a function with no known symbol.  
>> 
>> I'll commit this patch by next Monday since this is an important use 
>> case for lldb 3.3 (and I assume that WWDC is all encompassing for a 
>> bit), but do fire away with any feedback.  Cheers,
>> 
>> - Ashok
>> 
>> 
>> -----Original Message-----
>> From: lldb-dev-bounces at cs.uiuc.edu
>> [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Of Thirumurthi, Ashok
>> Sent: Tuesday, May 28, 2013 10:52 AM
>> To: lldb-dev at cs.uiuc.edu
>> Subject: Re: [lldb-dev] regarding [Bug 15671] New: backtrace truncated 
>> after assertion failure in inferior
>> 
>> FYI, gdb can identify the frame addresses for/relative to mystery frame 2 while at the assert site:
>> 
>> (gdb) f 2
>> #2  0x00007ffff7a4a0ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> 
>> (gdb) info frame
>> Stack level 2, frame at 0x7fffffffdee0:
>>  rip = 0x7ffff7a4a0ee; saved rip 0x7ffff7a4a192
>>  called by frame at 0x7fffffffdf10, caller of frame at 0x7fffffffde80
>>  Arglist at 0x7fffffffde78, args:
>>  Locals at 0x7fffffffde78, Previous frame's sp is 0x7fffffffdee0
>>  Saved registers:
>>   rbx at 0x7fffffffdec0, rbp at 0x7fffffffdec8, r12 at 0x7fffffffded0,
>>   rip at 0x7fffffffded8
>> 
>> - Ashok
>> 
>> -----Original Message-----
>> From: lldb-dev-bounces at cs.uiuc.edu
>> [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Of Thirumurthi, Ashok
>> Sent: Monday, May 27, 2013 5:09 PM
>> To: lldb-dev at cs.uiuc.edu
>> Subject: Re: [lldb-dev] regarding [Bug 15671] New: backtrace truncated 
>> after assertion failure in inferior
>> 
>> Hi Jason,
>> 
>> So, this thread is still relevant and reproducible using functionalities/inferior-assert on platforms where libc.so is compiled with -fomit-frame-pointer.
>> 
>>>>> The only solution I can think of here is if abort()'s eh_frame does provide a saved location for rbp but lldb failed to read it correctly.  Else, I have no idea how gdb managed to unwind out of this one.
>> 
>> FYI, the routine RegisterContextLLDB::InitializeNonZerothFrame calls ReadGPRValue for active_row->GetCFARegister(), which allows m_cfa to be set for frame 1 'abort'.  When this routine runs for the mystery frame 2, m_sym_ctx.GetAddressRange comes up empty-handed (consistent with gdb's backtrace), so addr_range.GetBaseAddress() is not valid.  As a result, m_current_offset is -1, and this routine returns before m_cfa is read, resulting in an invalid frame.
>> 
>> 
>>> But in this particular backtrace we've got -fomit-frame-pointer frames using eh_frame, then one function that doesn't have any symbol name or eh_frame entry, and I honestly have no idea how gdb found its way out of that one.  
>> 
>> Even if the function for frame 2 doesn't have a symbol name, is it possible that it has an eh_frame entry that we can use?
>> 
>> 
>>>>> The only reasonable approach here would be to assume that this frame used a frame pointer (rbp), grab the saved rbp value and try to find the caller's pc based on that -- but that failed.
>> 
>> So, I see the code that executes to handle the case where a function ends with a call instruction, which backs up the PC by one byte. However, ResolveSymbolContextForAddress fails, and SymbolContext::GetAddressRange comes up empty-handed because the SymbolContext's function member is 0, so addr_range is not set by this code.
>> 
>> Without a function symbol, is there a way to set m_current_offset so 
>> that ReadGPRValue can read the saved rbp for frame 2?  Thanks,
>> 
>> - Ashok
>> 
>> 
>> -----Original Message-----
>> From: lldb-dev-bounces at cs.uiuc.edu
>> [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Of Langmuir, Ben
>> Sent: Monday, April 08, 2013 10:12 AM
>> To: Luddy Harrison; Jason Molenda
>> Cc: lldb-dev at cs.uiuc.edu
>> Subject: Re: [lldb-dev] regarding [Bug 15671] New: backtrace truncated 
>> after assertion failure in inferior
>> 
>> I've updated bugzilla with the output of image show-unwind -n abort.  I couldn't attach the output of readelf -wf libc.so.6 (too big) - is there a way to only show info about the abort function?  The name 'abort' doesn't appear in the output.
>> 
>> Ben
>> 
>> -----Original Message-----
>> From: Luddy Harrison [mailto:luddy.harrison at gmail.com]
>> Sent: Monday, April 08, 2013 6:18 AM
>> To: Jason Molenda
>> Cc: Langmuir, Ben; lldb-dev at cs.uiuc.edu
>> Subject: Re: [lldb-dev] regarding [Bug 15671] New: backtrace truncated 
>> after assertion failure in inferior
>> 
>> hi, just to clarify, I regularly write asm with no eh frames or function bounds, no .cfi.   gdb unwinds my leaf functions fine.  it is my impression that gdb will, in the absence of frame info, assume that the topmost item on the stack at a trap is a return pc (even though the trapped pc cannot be identified and has an invalid rbp, so disasm of the leaf itself is not possible).
>> 
>> put differently, if one can't figure out the leaf, one can grope for the return pc on the stack and try again at the caller.  if the return pc points just after a plausible-looking call insn then you're good.   hope that makes sense...
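
The "plausible-looking call insn" test can be sketched roughly as below.  This is only an illustration of the heuristic described above, not what gdb or lldb actually does.  LooksLikeReturnAddress is a hypothetical helper, it assumes x86-64 code bytes, and the check is deliberately loose: it only looks for a direct call (E8 rel32) or an indirect call (FF with /2 in the ModRM reg field) ending exactly at the candidate return address, so false positives are possible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: does some x86-64 call instruction appear to end
// exactly at code[ret_offset]?  'code' is a buffer of instruction bytes and
// 'ret_offset' is the candidate return address expressed as an index into it.
bool LooksLikeReturnAddress(const std::vector<uint8_t> &code, size_t ret_offset)
{
    if (ret_offset > code.size())
        return false;

    // Direct near call: opcode E8 followed by a 32-bit displacement (5 bytes).
    if (ret_offset >= 5 && code[ret_offset - 5] == 0xE8)
        return true;

    // Indirect call: opcode FF with ModRM.reg == 2.  Without a REX prefix the
    // common encodings are 2, 3, 6 or 7 bytes long, depending on the operand.
    const size_t lengths[] = {2, 3, 6, 7};
    for (size_t len : lengths) {
        if (ret_offset < len)
            continue;
        size_t start = ret_offset - len;
        uint8_t modrm = code[start + 1];
        if (code[start] == 0xFF && ((modrm >> 3) & 7) == 2)
            return true;
    }
    return false;
}
```

An unwinder using this would read the word at the top of the stack, map it back to code bytes, and only accept it as the caller's pc if the check passes and the caller itself can then be unwound.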
>> 
>> Sent from my iPhone
>> 
>> On 8 Apr, 2013, at 17:43, Jason Molenda <jason at molenda.com> wrote:
>> 
>>> Yeah, lldb uses similar tricks.  If you have eh_frame instructions, unwinding from -fomit-frame-pointer code is easy.  And if you have accurate function bounds for all the frames, lldb can usually manage to unwind an -fomit-frame-pointer stack without eh_frame (because it inspects the actual assembly instructions in the prologue to understand the stack setup).  But in this particular backtrace we've got -fomit-frame-pointer frames using eh_frame, then one function that doesn't have any symbol name or eh_frame entry, and I honestly have no idea how gdb found its way out of that one.  The only reasonable approach here would be to assume that this frame used a frame pointer (rbp), grab the saved rbp value and try to find the caller's pc based on that -- but that failed.
>>> 
>>> Well, maybe the additional information from Ben (the eh_frame instructions for abort() most importantly) will provide a hint.  The only thing I can think is that maybe lldb misinterpreted that function's eh_frame instructions.
>>> 
>>> J
>>> 
>>> 
>>> On Apr 8, 2013, at 1:20 AM, Luddy Harrison wrote:
>>> 
>>>> having done lots of asm debugging with gdb, I can offer a guess.  gdb seems to be able to unwind frameless leaf functions with no unwind info.   so perhaps as a final fallback it pops the top entry on the stack and treats it as the return pc.  if it can unwind the caller using that pc, then it is good.
>>>> 
>>>> just a guess...
>>>> 
>>>> -Luddy
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On 8 Apr, 2013, at 6:01, Jason Molenda <jason at molenda.com> wrote:
>>>> 
>>>>> I see what's going on here.
>>>>> 
>>>>> /lib/x86_64-linux-gnu/libc.so.6 was built -fomit-frame-pointer, and 
>>>>> it includes eh_frame instructions on how to unwind the frames.  But 
>>>>> when lldb gets to
>>>>> 
>>>>> #2  0x00007ffff7a4a0ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>>> 
>>>>> it doesn't have any eh_frame instructions.  lldb can figure out the stack pointer value (from frame 1) which tells us the "bottom" of this stack frame but it can't find the "top" without eh_frame unwind instructions or knowing what function it is in so it can do an assembly instruction scan to understand how the stack frame was set up.  lldb tries to get a saved frame pointer (rbp) which would give us the "top" of the stack frame but the saved rbp value it gets (0x40067e0) is obviously invalid.
>>>>> 
>>>>> It might be interesting to see the output of
>>>>> 
>>>>> image show-unwind -n abort
>>>>> 
>>>>> to see exactly what the eh_frame instructions read (this is lldb's 
>>>>> interpretation of the eh_frame instructions, of course; it might be 
>>>>> useful to include the output of readelf -wf libc.so.6 or readelf -wF 
>>>>> libc.so.6 for the abort() function, going by a readelf web page I 
>>>>> found).  The log output included this,
>>>>> 
>>>>> th1/fr0 supplying caller's saved reg 16's location, cached
>>>>> th1/fr1 requested caller's saved PC but this UnwindPlan uses a RA reg; getting reg 16 instead
>>>>> th1/fr1 supplying caller's saved reg 16's location using eh_frame CFI UnwindPlan
>>>>> th1/fr1 supplying caller's register 16 from the stack, saved at CFA plus offset
>>>>> th1/fr2 pc = 0x00007f216e4850ee
>>>>> 
>>>>> That bit about "this UnwindPlan uses a RA reg" is novel for x86 code; it's normally something you see in ARM code, where the caller's saved pc value is in the link register on a function call.  But as you'd guess from the name abort(), this may have the caller's register context saved in an unusual way so this may be fine.
>>>>> 
>>>>> I'm surprised gdb can unwind this successfully.
>>>>> 
>>>>> As I alluded to above, lldb can profile the assembly language instructions of a function to understand the prologue setup (where registers are saved, how the stack is set up, etc.) -- but to do this, it needs to know the start address of the function.  This "#2  0x00007ffff7a4a0ee in ?? ()" frame clearly doesn't have any symbolic information with its address range so lldb can't do its assembly scan.  And it doesn't have eh_frame instructions to help either.
>>>>> 
>>>>> On Mac OS X we're often working with binaries that have had most of their symbols stripped.  Because it is so valuable to lldb to have accurate function ranges, we supplement the symbol table with two sources:  The LC_FUNCTION_STARTS section, and barring that (this is new), the eh_frame section.  LC_FUNCTION_STARTS is an array of LEB128 encoded offsets of all the start addresses of the functions in the file.  The first function is at offset 0, etc.  It's real compact, typically a few bytes per function. The eh_frame section is another great source of function bounds information but it tends to be larger and slower to parse through. lldb adds fake symbol names for these function ranges that it adds, e.g. a fake symbol added to the program Dock might be "__lldb_unnamed_function3491$$Dock".
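
As a rough illustration of why the function-starts data is so compact, here is a small standalone decoder sketch.  It is not lldb's ObjectFileMachO code; DecodeULEB128 and DecodeFunctionStarts are hypothetical helpers.  It assumes the payload is a run of unsigned LEB128 values, each one the distance from the previous function start (the first is relative to a caller-supplied base address), and that a zero value after the first entry ends the list.  Those layout details are assumptions, not something stated above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one unsigned LEB128 value starting at data[pos], advancing pos.
// Returns false if the buffer ends in the middle of a value.
bool DecodeULEB128(const std::vector<uint8_t> &data, size_t &pos, uint64_t &value)
{
    value = 0;
    unsigned shift = 0;
    while (pos < data.size()) {
        uint8_t byte = data[pos++];
        if (shift < 64)
            value |= static_cast<uint64_t>(byte & 0x7f) << shift;  // low 7 bits carry data
        shift += 7;
        if ((byte & 0x80) == 0)  // high bit clear marks the last byte of this value
            return true;
    }
    return false;
}

// Turn the delta-encoded payload into absolute function start addresses.
// Assumption: a zero delta terminates the list, except for the first entry,
// which may legitimately be zero (first function at the base address).
std::vector<uint64_t> DecodeFunctionStarts(const std::vector<uint8_t> &data, uint64_t base)
{
    std::vector<uint64_t> starts;
    uint64_t addr = base;
    uint64_t delta = 0;
    size_t pos = 0;
    while (DecodeULEB128(data, pos, delta)) {
        if (delta == 0 && !starts.empty())
            break;
        addr += delta;
        starts.push_back(addr);
    }
    return starts;
}
```

Because most functions are much smaller than 16 KB, most deltas fit in one or two bytes, which is consistent with the "few bytes per function" figure.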
>>>>> 
>>>>> Of course, given that lldb couldn't find eh_frame instructions for "#2  0x00007ffff7a4a0ee in ?? ()", maybe even that wouldn't have helped.
>>>>> 
>>>>> 
>>>>> The only solution I can think of here is if abort()'s eh_frame does provide a saved location for rbp but lldb failed to read it correctly.  Else, I have no idea how gdb managed to unwind out of this one.
>>>>> 
>>>>> 
>>>>> On Apr 7, 2013, at 5:46 AM, Langmuir, Ben wrote:
>>>>> 
>>>>>> Done.
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Jason Molenda [mailto:jason at molenda.com]
>>>>>> Sent: Sunday, April 07, 2013 5:50 AM
>>>>>> To: Langmuir, Ben
>>>>>> Subject: regarding [Bug 15671] New: backtrace truncated after 
>>>>>> assertion failure in inferior
>>>>>> 
>>>>>> I don't know if I have a bugzilla account on llvm.org (I should 
>>>>>> but I don't know what password it might have) but I wanted to ask 
>>>>>> you to do
>>>>>> 
>>>>>> (lldb) log enable lldb unwind
>>>>>> (lldb) run
>>>>>> (lldb) bt
>>>>>> 
>>>>>> 
>>>>>> and attach that output to
>>>>>> http://llvm.org/bugs/show_bug.cgi?id=15671
>>>>>> 
>>>>>> lldb should use a DefaultUnwindPlan for frame 2 ("?? ()" in gdb's backtrace) to continue the unwind.  I don't have linux installed on any devices so I haven't looked but the output will probably be a good clue as to why the unwind stopped early.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> J
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> lldb-dev mailing list
>>>>> lldb-dev at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>> 
>> 
>> <pr15671.patch><unwind-full.txt>
> 
> 
> <pr15671.patch>



