[Lldb-commits] [PATCH] Profile Assembly Until Ret Instruction

Fri Aug 22 18:37:46 PDT 2014

Hi Jason,

Updated patch attached.

Augmentation code is in UnwindAssembly-x86.cpp.

Current augmentation process:
- Get eh_frame for the function.
- Inspect each instruction:
    * If the FDE has a CFI row for this instruction, use it.
    * Otherwise, check whether this instruction changes how we locate the CFA
      and adjust the rule accordingly.

The code assumes there's a prologue description.
It checks whether there's a CFI row for the first instruction; if not,
augmentation is aborted and we fall back to assembly profiling.
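
To make the steps above concrete, here is a minimal sketch of the same loop.
It is not the code in the attached patch: the types Reg, CfaRule, Insn and
Rows and the function AugmentRows are hypothetical stand-ins for lldb's
UnwindPlan machinery. It also assumes a standard frame, where the saved
frame pointer sits directly below the return address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified model: none of these names are lldb's real API.
enum class Reg { SP, FP };

struct CfaRule {                 // CFA = value of `reg` + `offset`
    Reg reg;
    int offset;
    bool operator==(const CfaRule &o) const { return reg == o.reg && offset == o.offset; }
};

struct Insn {
    uint32_t offset;   // byte offset from the function start
    int sp_delta;      // change to the stack pointer: push = -word, pop = +word
    bool pops_fp;      // true for "pop %ebp" / "pop %rbp"
};

// Rule in effect from each offset until the next entry.
using Rows = std::map<uint32_t, CfaRule>;

// Returns false (caller falls back to assembly profiling) when eh_frame
// has no row for the first instruction, i.e. no prologue description.
bool AugmentRows(const Rows &eh_frame, const std::vector<Insn> &insns,
                 int word_size, Rows &out) {
    assert(word_size == 4 || word_size == 8);
    out.clear();
    if (insns.empty() || eh_frame.count(insns.front().offset) == 0)
        return false;

    CfaRule cur = eh_frame.at(insns.front().offset);
    for (const Insn &insn : insns) {
        auto row = eh_frame.find(insn.offset);
        if (row != eh_frame.end())
            cur = row->second;                       // an eh_frame row always wins
        if (out.empty() || !(out.rbegin()->second == cur))
            out[insn.offset] = cur;                  // record only changes

        // Predict the rule for the next instruction from this one's effect.
        if (insn.pops_fp && cur.reg == Reg::FP)
            cur = {Reg::SP, word_size};              // sp now points at the return address
        else if (cur.reg == Reg::SP)
            cur.offset -= insn.sp_delta;             // CFA is fixed, so the offset absorbs the move
    }
    return true;
}
```

For an i386 function whose eh_frame stops at "CFA = ebp+8", feeding in a
trailing "pop %ebp; ret" yields one extra row at the ret with CFA = esp+4.
With an esp-based CFA, a "call next-insn; pop %ebx" pair produces a
temporary +4 row for the pop instruction only.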

I have only tested it with the i386 epilogue and PC-relative addressing cases.
If you think it's OK, I will write a complete test.

Thanks for your time!

On Fri, Aug 22, 2014 at 9:58 AM, Todd Fiala <tfiala at google.com> wrote:

> Ok - holding off on checking this per communication with Tong.  Will see a
> new patch later today on this.
>
>
> On Fri, Aug 22, 2014 at 9:49 AM, Todd Fiala <tfiala at google.com> wrote:
>
>> Er I'll "get it" in...  eek..
>>
>>
>> On Fri, Aug 22, 2014 at 9:49 AM, Todd Fiala <tfiala at google.com> wrote:
>>
>>> I'm going to test this now.  If it all looks good, I'll ge tit in.
>>>
>>>
>>> On Tue, Aug 19, 2014 at 5:01 PM, Tong Shen <endlessroad at google.com>
>>> wrote:
>>>
>>>> Thanks Jason!
>>>> I will finish this patch and let's see how it goes.
>>>>
>>>> P.S. I know a little about eh_frame stuff; I added CFI to the new
>>>> Android ahead-of-time Java compiler so AOT'ed code can properly unwind :-)
>>>>
>>>>
>>>>
>>>> On Tue, Aug 19, 2014 at 4:51 PM, Jason Molenda <jmolenda at apple.com>
>>>> wrote:
>>>>
>>>>> The CIE sets the initial unwind state -- the CIE may describe the
>>>>> unwind state at the first instruction (as it always does with gcc, clang)
>>>>> but in theory it could describe the unwind state once the prologue had
>>>>> executed.
>>>>>
>>>>> The idea is that there is one CIE entry which describes a typical
>>>>> at-first-instruction unwind state and then many FDEs that describe the
>>>>> unwind instructions for specific functions - they all use that one CIE.
>>>>>
>>>>> Anyway, that's just an implementation detail of eh_frame.  I honestly
>>>>> don't think we should worry about incomplete eh_frame - let's try living on
>>>>> them and see how it works in practice.
>>>>>
>>>>> It may be possible to categorize eh_frame to see how complete it is.
>>>>> Compiler-generated x86 prologues are very regular, it would be possible to
>>>>> look at the first few bytes of a function for some pushes or stack pointer
>>>>> changes and see if the eh_frame describes that.  We know what the unwind
>>>>> state is on the first instruction of a function (it's determined by the
>>>>> ABI) -- does the eh_frame have the same instructions?  Can we scan through
>>>>> the function for an epilogue, and if we find one, does the eh_frame have
>>>>> unwind instructions there?
>>>>>
>>>>> But I don't want to have the perfect be the enemy of the good.  IMO
>>>>> let's take the plunge and try to use eh_frame and see how that goes.  We
>>>>> can refine it later, or back it out again (it will be a very small change
>>>>> to RegisterContextLLDB) if necessary.
>>>>>
>>>>>
>>>>> > On Aug 19, 2014, at 4:41 PM, Tong Shen <endlessroad at google.com>
>>>>> wrote:
>>>>> >
>>>>> > And for no prologue case:
>>>>> > We can detect this easily (any CFI for start address?) and bail out,
>>>>> so we will fall back to the assembly profiler.
>>>>> >
>>>>> >
>>>>> > On Tue, Aug 19, 2014 at 4:36 PM, Tong Shen <endlessroad at google.com>
>>>>> wrote:
>>>>> > Ahh sorry I've been working on something else this week and didn't
>>>>> get back to you in time.
>>>>> > And you've been very patient and informative. Thanks!
>>>>> >
>>>>> > I'm only suggesting it for x86 / x86_64. What I am doing here relies
>>>>> on:
>>>>> > - Compiler describes prologue;
>>>>> > - We can figure out all mid-function CFA changes by inspecting
>>>>> instructions.
>>>>> >
>>>>> > For frame 0, the new process for locating the CFA will look like this:
>>>>> > - Find the nearest CFI available before the current PC.
>>>>> > - If the CFI is for the current PC, voila :-) If not, continue.
>>>>> > - Inspect all instructions in between, and make changes to CFA
>>>>> accordingly. This can solve the PC relative addressing case.
>>>>> > - For epilogue, detect if we are in middle of an epilogue.
>>>>> Considering that there are not many patterns and they are all simple, I
>>>>> think we can enumerate them and handle accordingly.
>>>>> >
>>>>> > From what I've seen so far, this actually can solve most of
>>>>> gcc/clang generated code.
>>>>> > For JIT'ed code or hand written assembly, if there's no asynchronous
>>>>> CFI we are screwed anyway, so trying this won't hurt either (except some
>>>>> extra running time).
>>>>> >
>>>>> > I hope I have explained my thoughts clearly.
>>>>> >
>>>>> > Thank you.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Aug 19, 2014 at 4:22 PM, Jason Molenda <jmolenda at apple.com>
>>>>> wrote:
>>>>> > Hi Tong, my message was a little rambling.  Let's be specific.
>>>>> >
>>>>> > We are changing lldb to trust eh_frame instructions on the
>>>>> currently-executing aka 0th frame.
>>>>> >
>>>>> > In practice, gcc and clang eh_frame both describe the prologue, so
>>>>> this is OK.
>>>>> >
>>>>> > Old gcc and clang eh_frame do not describe the epilogue.  So we need
>>>>> to add a pass for i386/x86_64 (at least) to augment the eh_frame-sourced
>>>>> unwind instructions.  I don't know if it would be best to augment eh_frame
>>>>> UnwindPlans when we create them in DWARFCallFrameInfo or if it would be
>>>>> better to do it lazily when we are actually using the unwind instructions
>>>>> in RegisterContextLLDB (probably RegisterContextLLDB like you were doing).
>>>>> We should only do it once for a given function, of course.
>>>>> >
>>>>> > I think it would be cleanest if the augmentation function lived in the
>>>>> UnwindAssembly class.  But I haven't looked how easy it is to get an
>>>>> UnwindAssembly object where we need it.
>>>>> >
>>>>> >
>>>>> > Thanks for taking this on.  It will be interesting to try living
>>>>> entirely off eh_frame and see how that works for all the
>>>>> architectures/environments lldb supports.
>>>>> >
>>>>> > I worry a little that we're depending on the generous eh_frame from
>>>>> clang/gcc and if we try to run on icc (Intel's compiler) or something like
>>>>> that, we may have no prologue instructions and stepping will work very
>>>>> poorly.  But we'll cross that bridge when we get to it.
>>>>> >
>>>>> >
>>>>> >
>>>>> > > On Aug 15, 2014, at 8:07 PM, Jason Molenda <jmolenda at apple.com>
>>>>> wrote:
>>>>> > >
>>>>> > > Hi Tong, sorry for the delay in replying.
>>>>> > >
>>>>> > > I have a couple thoughts about the patch.  First, the change in
>>>>> RegisterContextLLDB::GetFullUnwindPlanForFrame() forces the use of eh_frame
>>>>> unwind instructions ("UnwindPlanAtCallSite" - which normally means the
>>>>> eh_frame unwind instructions) for the currently-executing aka zeroth
>>>>> frame.  We've talked about this before, but it's worth noting that this
>>>>> patch includes that change.
>>>>> > >
>>>>> > > There's still the problem of detecting how *asynchronous* those
>>>>> eh_frame unwind instructions are.  For instance, what do you get for an
>>>>> i386 program that does
>>>>> > >
>>>>> > > #include <stdio.h>
>>>>> > > int main()
>>>>> > > {
>>>>> > >  puts ("HI");
>>>>> > > }
>>>>> > >
>>>>> > > Most codegen will use a sequence like
>>>>> > >
>>>>> > >  call .LNextInstruction
>>>>> > > .LNextInstruction:
>>>>> > >  pop ebx
>>>>> > >
>>>>> > > this call & pop sequence is establishing the "pic base"; the
>>>>> program will then use that address to find the "HI" constant data.  If you
>>>>> compile this -fomit-frame-pointer, so we have to use the stack pointer to
>>>>> find the CFA, do the eh_frame instructions describe this?
>>>>> > >
>>>>> > > It's a bit of an extreme example but it's one of those tricky
>>>>> cases where asynchronous ("accurate at every instruction") unwind
>>>>> instructions and synchronous ("accurate at places where we can throw an
>>>>> exception, or a callee can throw an exception") unwind instructions are
>>>>> different.
>>>>> > >
>>>>> > >
>>>>> > > I would use behaves_like_zeroth_frame instead of if
>>>>> (IsFrameZero()) because you can have a frame in the middle of the stack
>>>>> which was the zeroth frame when an asynchronous signal came in -- in which
>>>>> case, the "callee" stack frame will be sigtramp.
>>>>> > >
>>>>> > >
>>>>> > > You'd want to update the UnwindLogMsgVerbose() text, of course.
>>>>> > >
>>>>> > >
>>>>> > > What your DWARFCallFrameInfo::PatchUnwindPlanForX86() function is
>>>>> doing is assuming that the unwind plan fails to include an epilogue
>>>>> description, steps through all the instructions in the function looking for
>>>>> the epilogue.
>>>>> > >
>>>>> > > DWARFCallFrameInfo doesn't seem like the right place for this.
>>>>> There's an assumption that the instructions came from eh_frame and that
>>>>> they are incomplete.  It seems like it would more naturally live in the
>>>>> UnwindAssembly plugin and it would have a name like
>>>>> AugmentIncompleteUnwindPlanWithEpilogue or something like that.
>>>>> > >
>>>>> > > What if the CFI already does describe the epilogue?  I imagine
>>>>> we'll just end up with a doubling of UnwindPlan Rows that describe the
>>>>> epilogue instructions.
>>>>> > >
>>>>> > > What if we have a mid-function epilogue?  I've never seen
>>>>> gcc/clang generate these for x86, but it's possible.  It's a common code
>>>>> sequence on arm/arm64.  You can see a messy bit of code in
>>>>> UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly which
>>>>> handles these -- saving the UnwindPlan's unwind instructions when we see
>>>>> the beginning of an epilogue, and once the epilogue is complete, restoring
>>>>> the unwind instructions.
>>>>> > >
>>>>> > >
>>>>> > > I'm not opposed to the patch - but it does make the assumption
>>>>> that we're going to use eh_frame for the currently executing function and
>>>>> that the eh_frame instructions do not include a description of the
>>>>> epilogue.  (and that there is only one epilogue in the function).  Mostly I
>>>>> want to call all of those aspects out so we're clear what we're talking
>>>>> about here.  Let's clean it up a bit, put it in and see how it goes.
>>>>> > >
>>>>> > > J
>>>>> > >
>>>>> > >
>>>>> > >> On Aug 14, 2014, at 6:31 PM, Tong Shen <endlessroad at google.com>
>>>>> wrote:
>>>>> > >>
>>>>> > >> Hi Jason,
>>>>> > >>
>>>>> > >> Turns out we still need CFI for frame 0 in certain situations...
>>>>> > >>
>>>>> > >> A possible approach is to disassemble machine code, and manually
>>>>> adjust CFI for frame 0. For example, if we see "pop ebp; => ret", we set
>>>>> cfa to [esp]; if we see "call next-insn; => pop %ebp", we set cfa_offset+=4.
>>>>> > >>
>>>>> > >> Patch attached, now it just implements adjustment for "pop ebp;
>>>>> ret".
>>>>> > >>
>>>>> > >> If you think this approach is OK, I will go ahead and add other
>>>>> tricks(i386 pc relative addressing, more styles of epilogue, etc).
>>>>> > >>
>>>>> > >> Thank you for your time!
>>>>> > >>
>>>>> > >>
>>>>> > >> On Thu, Jul 31, 2014 at 12:50 PM, Tong Shen <
>>>>> endlessroad at google.com> wrote:
>>>>> > >> I think gdb's rationale for using CFI for leaf function is:
>>>>> > >> - gcc always generates CFI for the prologue, so at function entry, we
>>>>> know the correct CFA;
>>>>> > >> - any stack pointer altering operation after that(mid-function &
>>>>> epilogue), we can recognize and handle them.
>>>>> > >> So basically, it assumes 2, hacks its way through 3 & 4, and
>>>>> pretends we are at 5.
>>>>> > >> Number of hacks we need seems to be small in x86 world, so this
>>>>> tradition is still here.
>>>>> > >>
>>>>> > >> Here's what gdb does for epilogue: normally when you run 'n', it
>>>>> will run one instruction at a time till the next line/different stack id. But
>>>>> when it sees "pop %rbp; ret", it won't step into these instructions.
>>>>> Instead it will execute past them directly.
>>>>> > >> I didn't experiment with x86 pc-relative addressing; but I guess
>>>>> it will also recognize and execute past this pattern directly.
>>>>> > >>
>>>>> > >> So for compiler generated functions, what we do now with the assembly
>>>>> parser can be done with CFI + those gdb hacks.
>>>>> > >> And for hand-written assembly, i think CFI is almost always
>>>>> precise at instruction level. In this case, utilizing CFI instead of
>>>>> assembly parser will be a big help.
>>>>> > >>
>>>>> > >> So maybe we can apply those hacks, and trust CFI only for x86 &
>>>>> x86_64 targets?
>>>>> > >>
>>>>> > >>
>>>>> > >> On Thu, Jul 31, 2014 at 12:02 AM, Jason Molenda <
>>>>> jmolenda at apple.com> wrote:
>>>>> > >> I think we could think of five levels of eh_frame information:
>>>>> > >>
>>>>> > >>
>>>>> > >> 1 unwind instructions at exception throw locations & locations
>>>>> where a callee may throw an exception
>>>>> > >>
>>>>> > >> 2 unwind instructions that describe the prologue
>>>>> > >>
>>>>> > >> 3 unwind instructions that describe the epilogue at the end of
>>>>> the function
>>>>> > >>
>>>>> > >> 4 unwind instructions that describe mid-function epilogues (I see
>>>>> these on arm all the time, don't see them on x86 with compiler generated
>>>>> code - but we don't use eh_frame on arm at Apple, I'm just mentioning it
>>>>> for completeness)
>>>>> > >>
>>>>> > >> 5 unwind instructions that describe any changes mid-function
>>>>> needed to unwind at all instructions ("asynchronous unwind information")
>>>>> > >>
>>>>> > >>
>>>>> > >> The eh_frame section only guarantees #1.  gcc and clang always do
>>>>> #1 and #2.  Modern gcc's do #3.  I don't know if gcc would do #4 on arm but
>>>>> it's not important, I just mention it for completeness.  And no one does #5
>>>>> (as far as I know), even in the DWARF debug_frame section.
>>>>> > >>
>>>>> > >> I think it may be possible to detect if an eh_frame entry fulfills
>>>>> #3 by looking if the CFA definition on the last row is the same as the
>>>>> initial CFA definition.  But I'm not sure how a debugger could use
>>>>> heuristics to determine much else.
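
The "last row matches the initial CFA definition" check described above can
be sketched as follows. This is only an illustration: CfaRow and
EpilogueLooksDescribed are hypothetical names, not lldb's UnwindPlan::Row
API. The heuristic also inherits the caveat raised elsewhere in this thread,
that a function which never returns (tail call or noreturn) never restores
its stack. Register numbers are the x86_64 DWARF ones that appear in the
dwarfdump output quoted further down (r6 = rbp, r7 = rsp).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row type; DWARF register numbers on x86_64: 6 = rbp, 7 = rsp.
struct CfaRow {
    uint64_t pc;
    int cfa_reg;
    int cfa_offset;
};

// True when the last row restores the CFA definition of the first row,
// which suggests (but does not prove) that the final epilogue is described.
bool EpilogueLooksDescribed(const std::vector<CfaRow> &rows) {
    assert(std::is_sorted(rows.begin(), rows.end(),
                          [](const CfaRow &a, const CfaRow &b) { return a.pc < b.pc; }));
    if (rows.size() < 2)
        return false;  // a single row gives no evidence either way
    const CfaRow &first = rows.front();
    const CfaRow &last = rows.back();
    return first.cfa_reg == last.cfa_reg && first.cfa_offset == last.cfa_offset;
}
```

Applied to the two FDEs quoted later in the thread, the clang rows end at
cfa=16(r6) and so fail the check, while the gcc rows end at cfa=08(r7),
the same as their first row, and pass.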
>>>>> > >>
>>>>> > >>
>>>>> > >> In fact, detecting #3 may be the easiest thing to detect.  I'm
>>>>> not sure if the debugger could really detect #2 except maybe if the
>>>>> function had a standard prologue (push rbp, mov rsp rbp) and the eh_frame
>>>>> didn't describe the effects of these instructions, the debugger could know
>>>>> that the eh_frame does not describe the prologue.
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>> On Jul 30, 2014, at 6:58 PM, Tong Shen <endlessroad at google.com>
>>>>> wrote:
>>>>> > >>>
>>>>> > >>> Ah I understand now.
>>>>> > >>>
>>>>> > >>> Now the prologue seems to be always included in CFI for gcc & clang; and
>>>>> newer gcc includes epilogue as well.
>>>>> > >>> Maybe we can detect and use them when they are available?
>>>>> > >>>
>>>>> > >>>
>>>>> > >>> On Wed, Jul 30, 2014 at 6:44 PM, Jason Molenda <
>>>>> jmolenda at apple.com> wrote:
>>>>> > >>> Ah, it looks like gcc changed since I last looked at its
>>>>> eh_frame output.
>>>>> > >>>
>>>>> > >>> It's not a bug -- the eh_frame unwind instructions only need to
>>>>> be accurate at instructions where an exception can be thrown, or where a
>>>>> callee function can throw an exception.  There's no requirement to include
>>>>> prologue or epilogue instructions in the eh_frame.
>>>>> > >>>
>>>>> > >>> And unfortunately from lldb's perspective, when we see eh_frame
>>>>> we'll never know how descriptive it is.  If it's old-gcc or clang, it won't
>>>>> include epilogue instructions.  If it's from another compiler, it may not
>>>>> include any prologue/epilogue instructions at all.
>>>>> > >>>
>>>>> > >>> Maybe we could look over the UnwindPlan rows and see if the CFA
>>>>> definition of the last row matches the initial row's CFA definition.  That
>>>>> would show that the epilogue is described.  Unless it is a tail-call (aka
>>>>> noreturn) function - in which case the stack is never restored.
>>>>> > >>>
>>>>> > >>>
>>>>> > >>>
>>>>> > >>>
>>>>> > >>>> On Jul 30, 2014, at 6:32 PM, Tong Shen <endlessroad at google.com>
>>>>> wrote:
>>>>> > >>>>
>>>>> > >>>> GCC seems to generate a row for epilogue.
>>>>> > >>>> Do you think this is a clang bug, or at least a discrepancy
>>>>> between clang & gcc?
>>>>> > >>>>
>>>>> > >>>> Source:
>>>>> > >>>> #include <stdio.h>
>>>>> > >>>> int f() {
>>>>> > >>>>      puts("HI\n");
>>>>> > >>>>      return 5;
>>>>> > >>>> }
>>>>> > >>>>
>>>>> > >>>> Compile option: only -g
>>>>> > >>>>
>>>>> > >>>> gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
>>>>> > >>>> clang version 3.5.0 (213114)
>>>>> > >>>>
>>>>> > >>>> Env: Ubuntu 14.04, x86_64
>>>>> > >>>>
>>>>> > >>>> dwarfdump -F of clang binary:
>>>>> > >>>> <    2><0x00400530:0x00400559><f><fde offset 0x00000088 length:
>>>>> 0x0000001c><eh aug data len 0x0>
>>>>> > >>>>        0x00400530: <off cfa=08(r7) > <off r16=-8(cfa) >
>>>>> > >>>>        0x00400531: <off cfa=16(r7) > <off r6=-16(cfa) > <off
>>>>> r16=-8(cfa) >
>>>>> > >>>>        0x00400534: <off cfa=16(r6) > <off r6=-16(cfa) > <off
>>>>> r16=-8(cfa) >
>>>>> > >>>>
>>>>> > >>>> dwarfdump -F of gcc binary:
>>>>> > >>>> <    1><0x0040052d:0x00400542><f><fde offset 0x00000070 length:
>>>>> 0x0000001c><eh aug data len 0x0>
>>>>> > >>>>        0x0040052d: <off cfa=08(r7) > <off r16=-8(cfa) >
>>>>> > >>>>        0x0040052e: <off cfa=16(r7) > <off r6=-16(cfa) > <off
>>>>> r16=-8(cfa) >
>>>>> > >>>>        0x00400531: <off cfa=16(r6) > <off r6=-16(cfa) > <off
>>>>> r16=-8(cfa) >
>>>>> > >>>>        0x00400541: <off cfa=08(r7) > <off r6=-16(cfa) > <off
>>>>> r16=-8(cfa) >
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> On Wed, Jul 30, 2014 at 5:43 PM, Jason Molenda <
>>>>> jmolenda at apple.com> wrote:
>>>>> > >>>> I'm open to trying to trust eh_frame at frame 0 for x86_64.
>>>>> The lack of epilogue descriptions in eh_frame is the biggest problem here.
>>>>> > >>>>
>>>>> > >>>> When you "step" or "next" in the debugger, the debugger
>>>>> instruction steps across the source line until it gets to the next source
>>>>> line.  Every time it stops after an instruction step, it confirms that it
>>>>> is (1) between the start and end pc values for the source line, and (2)
>>>>> that the "stack id" (start address of the function + CFA address) is the
>>>>> same.  If it stops and the stack id has changed, for a "next" command, it
>>>>> will backtrace one stack frame to see if it stepped into a function.  If
>>>>> so, it sets a breakpoint on the return address and continues.
>>>>> > >>>>
>>>>> > >>>> If you switch lldb to prefer eh_frame instructions for x86_64,
>>>>> e.g.
>>>>> > >>>>
>>>>> > >>>> Index: source/Plugins/Process/Utility/RegisterContextLLDB.cpp
>>>>> > >>>>
>>>>> ===================================================================
>>>>> > >>>> --- source/Plugins/Process/Utility/RegisterContextLLDB.cpp
>>>>> (revision 214344)
>>>>> > >>>> +++ source/Plugins/Process/Utility/RegisterContextLLDB.cpp
>>>>> (working copy)
>>>>> > >>>> @@ -791,6 +791,22 @@
>>>>> > >>>>         }
>>>>> > >>>>     }
>>>>> > >>>>
>>>>> > >>>> +    // For x86_64 debugging, let's try using the eh_frame
>>>>> instructions even if this is the currently
>>>>> > >>>> +    // executing function (frame zero).
>>>>> > >>>> +    Target *target = exe_ctx.GetTargetPtr();
>>>>> > >>>> +    if (target
>>>>> > >>>> +        && (target->GetArchitecture().GetCore() ==
>>>>> ArchSpec::eCore_x86_64_x86_64h
>>>>> > >>>> +            || target->GetArchitecture().GetCore() ==
>>>>> ArchSpec::eCore_x86_64_x86_64))
>>>>> > >>>> +    {
>>>>> > >>>> +        unwind_plan_sp =
>>>>> func_unwinders_sp->GetUnwindPlanAtCallSite (m_current_offset_backed_up_one);
>>>>> > >>>> +        int valid_offset = -1;
>>>>> > >>>> +        if (IsUnwindPlanValidForCurrentPC(unwind_plan_sp,
>>>>> valid_offset))
>>>>> > >>>> +        {
>>>>> > >>>> +            UnwindLogMsgVerbose ("frame uses %s for full
>>>>> UnwindPlan, preferred over assembly profiling on x86_64",
>>>>> unwind_plan_sp->GetSourceName().GetCString());
>>>>> > >>>> +            return unwind_plan_sp;
>>>>> > >>>> +        }
>>>>> > >>>> +    }
>>>>> > >>>> +
>>>>> > >>>>     // Typically the NonCallSite UnwindPlan is the unwind
>>>>> created by inspecting the assembly language instructions
>>>>> > >>>>     if (behaves_like_zeroth_frame)
>>>>> > >>>>     {
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> you'll find that you have to "next" twice to step out of a
>>>>> function.  Why?  With a simple function like:
>>>>> > >>>>
>>>>> > >>>> * thread #1: tid = 0xaf31e, 0x0000000100000eb9 a.out`foo + 25
>>>>> at a.c:5, queue = 'com.apple.main-thread', stop reason = step over
>>>>> > >>>>    #0: 0x0000000100000eb9 a.out`foo + 25 at a.c:5
>>>>> > >>>>   2    int foo ()
>>>>> > >>>>   3    {
>>>>> > >>>>   4        puts("HI");
>>>>> > >>>> -> 5        return 5;
>>>>> > >>>>   6    }
>>>>> > >>>>   7
>>>>> > >>>>   8    int bar ()
>>>>> > >>>> (lldb) disass
>>>>> > >>>> a.out`foo at a.c:3:
>>>>> > >>>>   0x100000ea0:  pushq  %rbp
>>>>> > >>>>   0x100000ea1:  movq   %rsp, %rbp
>>>>> > >>>>   0x100000ea4:  subq   $0x10, %rsp
>>>>> > >>>>   0x100000ea8:  leaq   0x6b(%rip), %rdi          ; "HI"
>>>>> > >>>>   0x100000eaf:  callq  0x100000efa               ; symbol stub
>>>>> for: puts
>>>>> > >>>>   0x100000eb4:  movl   $0x5, %ecx
>>>>> > >>>> -> 0x100000eb9:  movl   %eax, -0x4(%rbp)
>>>>> > >>>>   0x100000ebc:  movl   %ecx, %eax
>>>>> > >>>>   0x100000ebe:  addq   $0x10, %rsp
>>>>> > >>>>   0x100000ec2:  popq   %rbp
>>>>> > >>>>   0x100000ec3:  retq
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> if you do "next" lldb will instruction step, comparing the
>>>>> stack ID at every stop, until it gets to 0x100000ec3 at which point the
>>>>> stack ID will change.  The CFA address (which the eh_frame tells us is
>>>>> rbp+16) just changed to the caller's CFA address because we're about to
>>>>> return.  The eh_frame instructions really need to tell us that the CFA is
>>>>> now rsp+8 at 0x100000ec3.
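
The arithmetic behind that stack ID change can be worked through with
made-up register values. Everything below is a sketch: the types and the
addresses other than foo's start address are assumptions for illustration,
not lldb's StackID class or values from a real run.

```cpp
#include <cstdint>

// Hypothetical types for illustration; not lldb's StackID or UnwindPlan classes.
enum class Base { RSP, RBP };
struct Regs { uint64_t rsp, rbp; };
struct CfaRule { Base base; int64_t offset; };
struct StackId { uint64_t func_start, cfa; };

constexpr uint64_t ComputeCfa(CfaRule rule, Regs regs) {
    return (rule.base == Base::RSP ? regs.rsp : regs.rbp) + rule.offset;
}
constexpr bool SameFrame(StackId a, StackId b) {
    return a.func_start == b.func_start && a.cfa == b.cfa;
}

// Made-up register values. Inside foo's body rbp = 0x7fff4ff8, so CFA = rbp+16.
constexpr uint64_t kFoo = 0x100000ea0;
constexpr Regs kInBody{0x7fff4fe8, 0x7fff4ff8};
constexpr StackId kBodyId{kFoo, ComputeCfa({Base::RBP, 16}, kInBody)};  // cfa = 0x7fff5008

// Stopped on retq: popq %rbp has reloaded the caller's rbp (0x7fff5020 here)
// and rsp points at the return address.
constexpr Regs kAtRet{0x7fff5000, 0x7fff5020};
constexpr StackId kStaleId{kFoo, ComputeCfa({Base::RBP, 16}, kAtRet)};  // cfa = 0x7fff5030
constexpr StackId kFixedId{kFoo, ComputeCfa({Base::RSP, 8}, kAtRet)};   // cfa = 0x7fff5008

static_assert(!SameFrame(kBodyId, kStaleId), "stale rbp+16 row looks like a different frame");
static_assert(SameFrame(kBodyId, kFixedId), "rsp+8 row keeps the stack ID stable");
```

With the stale rbp+16 rule the computed CFA jumps to the caller's value, so
the stepping logic sees a different stack ID. With an rsp+8 row for the ret
instruction the stack ID stays the same as in the function body.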
>>>>> > >>>>
>>>>> > >>>> The end result is that you need to "next" twice to step out of
>>>>> a function.
>>>>> > >>>>
>>>>> > >>>> AssemblyParse_x86 has a special bit where it looks for the 'ret'
>>>>> instruction sequence at the end of the function -
>>>>> > >>>>
>>>>> > >>>>   // Now look at the byte at the end of the AddressRange for a
>>>>> limited attempt at describing the
>>>>> > >>>>    // epilogue.  We're looking for the sequence
>>>>> > >>>>
>>>>> > >>>>    //  [ 0x5d ] pop %rbp
>>>>> > >>>>    //  [ 0xc3 ] ret
>>>>> > >>>>    //  [ 0xe8 xx xx xx xx ] call __stack_chk_fail  (this is
>>>>> sometimes the final insn in the function)
>>>>> > >>>>
>>>>> > >>>>    // We want to add a Row describing how to unwind when we're
>>>>> stopped on the 'ret' instruction where the
>>>>> > >>>>    // CFA is no longer defined in terms of rbp, but is now
>>>>> defined in terms of rsp like on function entry.
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> and adds an extra row of unwind details for that instruction.
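
A byte-level check of that kind can be sketched as below, assuming the
function's machine code is available as a flat buffer. The helper name
FindFinalRetAfterPopRbp is hypothetical and this is not the code in
AssemblyParse_x86. 0x5d is the one-byte encoding of "pop %rbp" and 0xc3 is
"ret". The sketch needs C++14 or later for the constexpr function body.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the byte offset of the final `ret` when the code ends in
//   5d                 pop %rbp
//   c3                 ret
// optionally followed by one 5-byte `e8 xx xx xx xx` call, or -1 when
// neither shape matches. Hypothetical helper, not the lldb implementation.
constexpr long FindFinalRetAfterPopRbp(const uint8_t *code, size_t size) {
    if (size >= 2 && code[size - 2] == 0x5d && code[size - 1] == 0xc3)
        return static_cast<long>(size - 1);
    if (size >= 7 && code[size - 7] == 0x5d && code[size - 6] == 0xc3 &&
        code[size - 5] == 0xe8)
        return static_cast<long>(size - 6);
    return -1;
}
```

The returned offset is where an extra rsp-based row would be added. This is
a pattern match on trailing bytes only, so it can be fooled, for example
when the displacement of a trailing call happens to end in 5d c3.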
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> I mention x86_64 as being a possible good test case here
>>>>> because I worry about the i386 picbase sequence (call next-instruction; pop
>>>>> %ebx) which occurs a lot.  But for x86_64, my main concern is the epilogues.
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>> On Jul 30, 2014, at 2:52 PM, Tong Shen <endlessroad at google.com>
>>>>> wrote:
>>>>> > >>>>>
>>>>> > >>>>> Thanks Jason! That's a very informative post; it clarifies things a
>>>>> lot :-)
>>>>> > >>>>>
>>>>> > >>>>> Well I have to admit that my patch is specifically for certain
>>>>> kind of functions, and now I see that's not the general case.
>>>>> > >>>>>
>>>>> > >>>>> I did some experiments with gdb. gdb uses CFI for frame 0 on
>>>>> both x86 and x86_64. It looks up the FDE for frame 0 and does the CFA calculation
>>>>> according to that.
>>>>> > >>>>>
>>>>> > >>>>> - For compiler generated functions: I think there are 2 usage
>>>>> scenarios for frame 0: breakpoint and signal.
>>>>> > >>>>>    - Breakpoints are usually at source line boundary instead
>>>>> of instruction boundary, and generally we won't be caught at stack pointer
>>>>> changing locations, so CFI is still valid.
>>>>> > >>>>>    - For signal, synchronous unwind table may not be
>>>>> sufficient here. But only stack changing instructions will cause incorrect
>>>>> CFA calculation, so it's not always the case.
>>>>> > >>>>> - For hand written assembly functions: from what I've seen,
>>>>> most of the time CFI is present and actually asynchronous.
>>>>> > >>>>> So it seems that in most cases, even with only synchronous
>>>>> unwind table, CFI is still correct.
>>>>> > >>>>>
>>>>> > >>>>> I believe we can trust eh_frame for frame 0 and use assembly
>>>>> profiling as a fallback. If both fail, maybe the code owner should use
>>>>> -fasynchronous-unwind-tables :-)
>>>>> > >>>>>
>>>>> > >>>>>
>>>>> > >>>>> On Tue, Jul 29, 2014 at 4:59 PM, Jason Molenda <
>>>>> jmolenda at apple.com> wrote:
>>>>> > >>>>> It was a tricky one and got lost in the shuffle of a busy
>>>>> week.  I was always reluctant to try profiling all the instructions in a
>>>>> function.  On x86, compiler generated code (gcc/clang anyway) is very
>>>>> simplistic about setting up the stack frame at the start and only having
>>>>> one epilogue - so anything fancier risked making mistakes and could
>>>>> possibly have a performance impact as we run functions through the
>>>>> disassembler.
>>>>> > >>>>>
>>>>> > >>>>> For hand-written assembly functions (which can be very
>>>>> creative with their prologue/epilogue and where it is placed), my position
>>>>> is that they should write eh_frame instructions in their assembly source to
>>>>> tell lldb where to find things.  There are one or two libraries on Mac OS X
>>>>> where we break the "ignore eh_frame for the currently executing function"
>>>>> because there are many hand-written assembly functions in there and the
>>>>> eh_frame is going to beat our own analysis.
>>>>> > >>>>>
>>>>> > >>>>>
>>>>> > >>>>> After I wrote the x86 unwinder, Greg and Caroline implemented
>>>>> the arm unwinder where it emulates every instruction in the function
>>>>> looking for prologue/epilogue instructions.  We haven't seen it having a
>>>>> particularly bad impact performance-wise (lldb only does this disassembly
>>>>> for functions that it finds on stacks during an execution run, and it saves
>>>>> the result so it won't re-compute it for a given function).  The clang
>>>>> armv7 codegen often has mid-function epilogues (early returns) which
>>>>> definitely complicated things and made it necessary to step through the
>>>>> entire function bodies.  There's a bunch of code I added to support these
>>>>> mid-function epilogues - I have to save the register save state when I see
>>>>> an instruction which looks like an epilogue, and when I see the final ret
>>>>> instruction (aka restoring the saved lr contents into pc), I re-install the
>>>>> register save state from before the epilogue started.
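
The save-and-restore idea for mid-function epilogues can be reduced to a
small sketch that tracks only the CFA offset from sp. The Insn fields and
CfaOffsetBeforeEachInsn are hypothetical simplifications, assuming the
instruction emulator has already classified each instruction. This is not
the code in UnwindAssemblyInstEmulation.

```cpp
#include <cassert>
#include <vector>

struct Insn {
    int sp_delta;          // bytes added to sp (negative when the stack grows)
    bool begins_epilogue;  // first instruction of an epilogue sequence
    bool is_return;        // instruction that leaves the function
};

// Returns, for each instruction, the offset such that CFA = sp + offset
// just before that instruction executes.
std::vector<int> CfaOffsetBeforeEachInsn(const std::vector<Insn> &insns, int entry_offset) {
    assert(entry_offset >= 0);
    std::vector<int> result;
    int cur = entry_offset;
    int saved = 0;
    bool have_saved = false;
    for (const Insn &insn : insns) {
        result.push_back(cur);
        if (insn.begins_epilogue && !have_saved) {
            saved = cur;           // remember the state from before the epilogue
            have_saved = true;
        }
        cur -= insn.sp_delta;      // CFA is fixed, so the offset absorbs sp moves
        if (insn.is_return && have_saved) {
            cur = saved;           // code after an early return sees the pre-epilogue state
            have_saved = false;
        }
    }
    return result;
}
```

Without the restore step, every instruction after the first early return
would be described with the already-unwound state, even though the code that
follows the return is reached by a branch from inside the function body.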
>>>>> > >>>>>
>>>>> > >>>>> These things always make me a little nervous because the
>>>>> instruction analyzer obviously is doing a static analysis so it knows
>>>>> nothing about flow control.  Tong's patch stops when it sees the first CALL
>>>>> instruction - but that's not right, that's just solving the problem for his
>>>>> particular function which doesn't have any CALL instructions before his
>>>>> prologue. :) You could imagine a function which saves a couple of
>>>>> registers, calls another function, then saves a couple more because it
>>>>> needs more scratch registers.
>>>>> > >>>>>
>>>>> > >>>>> If we're going to change to profiling deep into the function
>>>>> -- and I'm not opposed to doing that, it's been fine on arm -- we should
>>>>> just do the entire function I think.
>>>>> > >>>>>
>>>>> > >>>>>
>>>>> > >>>>> Another alternative would be to trust eh_frame on x86_64 at
>>>>> frame 0.  This is one of those things where there's not a great solution.
>>>>> The unwind instructions in eh_frame are only guaranteed to be accurate for
>>>>> synchronous unwinds -- that is, they are only guaranteed to be accurate at
>>>>> places where an exception could be thrown - at call sites.  So for
>>>>> instance, there's no reason why the compiler has to describe the function
>>>>> prologue instructions at all.  There's no requirement that the eh_frame
>>>>> instructions describe the epilogue instructions.  The information about
>>>>> spilled registers only needs to be emitted where we could throw an
>>>>> exception, or where a callee could throw an exception.
>>>>> > >>>>>
>>>>> > >>>>> clang/gcc both emit detailed instructions for the prologue
>>>>> setup.  But for i386 codegen if the compiler needs to access some
>>>>> pc-relative data, it will do a "call next-instruction; pop %eax" to get the
>>>>> current pc value.  (x86_64 has rip-relative addressing so this isn't
>>>>> needed)  If you're debugging -fomit-frame-pointer code, that means your CFA
>>>>> is expressed in terms of the stack pointer and the stack pointer just
>>>>> changed mid-function --- and eh_frame instructions don't describe this.
>>>>> > >>>>>
>>>>> > >>>>> The end result: If you want accurate unwinds 100% of the time,
>>>>> you can't rely on the unwind instructions from eh_frame.  But they'll get
>>>>> you accurate unwinds 99.9% of the time ...  also, last I checked, neither
>>>>> clang nor gcc describe the epilogue instructions.
>>>>> > >>>>>
>>>>> > >>>>>
>>>>> > >>>>> In *theory* the unwind instructions from the DWARF debug_frame
>>>>> section should be asynchronous -- they should describe how to find the CFA
>>>>> address for every instruction in the function.  Which makes sense - you
>>>>> want eh_frame to be compact because it's bundled into the executable, so it
>>>>> should only have the information necessary for exception handling and you
>>>>> can put the verbose stuff in debug_frame DWARF for debuggers.  But instead
>>>>> (again, last time I checked), the compilers put the exact same thing in
>>>>> debug_frame even if you use the -fasynchronous-unwind-tables (or whatever
>>>>> that switch was) option.
>>>>> > >>>>>
>>>>> > >>>>>
>>>>> > >>>>> So I don't know, maybe we should just start trusting eh_frame
>>>>> at frame 0 and write off those .1% cases where it isn't correct instead of
>>>>> trying to get too fancy with the assembly analysis code.
>>>>> > >>>>>
>>>>> > >>>>>
>>>>> > >>>>>
>>>>> > >>>>>> On Jul 29, 2014, at 4:17 PM, Todd Fiala <tfiala at google.com>
>>>>> wrote:
>>>>> > >>>>>>
>>>>> > >>>>>> Hey Jason,
>>>>> > >>>>>>
>>>>> > >>>>>> Do you have any feedback on this?
>>>>> > >>>>>>
>>>>> > >>>>>> Thanks!
>>>>> > >>>>>>
>>>>> > >>>>>> -Todd
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>> On Fri, Jul 25, 2014 at 1:42 PM, Tong Shen <
>>>>> endlessroad at google.com> wrote:
>>>>> > >>>>>> Sorry, wrong version of patch...
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>> On Fri, Jul 25, 2014 at 1:41 PM, Tong Shen <
>>>>> endlessroad at google.com> wrote:
>>>>> > >>>>>> Hi Molenda, lldb-commits,
>>>>> > >>>>>>
>>>>> > >>>>>> For now, x86 assembly profiler will stop after 10
>>>>> "non-prologue" instructions. In practice it may not be sufficient. For
>>>>> example, we have a hand-written assembly function, which has hundreds of
>>>>> instructions before the actual (stack-adjusting) prologue instructions.
>>>>> > >>>>>>
>>>>> > >>>>>> One way is to change the limit to 1000; but there will always
>>>>> be functions that break the limit :-) I believe the right thing to do here
>>>>> is to parse all instructions before "ret"/"call" as prologue instructions.
>>>>> > >>>>>>
>>>>> > >>>>>> Here's what I changed:
>>>>> > >>>>>> - For "push %rbx" and "mov %rbx, -8(%rbp)": only add first
>>>>> row for that register. They may appear multiple times in function body. But
>>>>> as long as one of them appears, first appearance should be in prologue(If
>>>>> it's not in prologue, this function will not use %rbx, so these 2
>>>>> instructions should not appear at all).
>>>>> > >>>>>> - Also monitor "add $0x20, %rsp".
>>>>> > >>>>>> - Remove non prologue instruction count.
>>>>> > >>>>>> - Add "call" instruction detection, and stop parsing after it.
>>>>> > >>>>>>
>>>>> > >>>>>> Thanks.
>>>>> > >>>>>>
>>>>> > >>>>>> --
>>>>> > >>>>>> Best Regards, Tong Shen
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>> --
>>>>> > >>>>>> Best Regards, Tong Shen
>>>>> > >>>>>>
>>>>> > >>>>>> _______________________________________________
>>>>> > >>>>>> lldb-commits mailing list
>>>>> > >>>>>> lldb-commits at cs.uiuc.edu
>>>>> > >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>>
>>>>> > >>>>>> --
>>>>> > >>>>>> Todd Fiala |   Software Engineer |     tfiala at google.com |
>>>>>    650-943-3180
>>>>> > >>>>>>
>>>>> > >> <adjust_cfi_for_frame_zero.patch>
>>>>> > >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Best Regards, Tong Shen
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Best Regards, Tong Shen
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Tong Shen
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>
>>
>>
>>
>> --
>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>
>
>
>
> --
> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>

-- 
Best Regards, Tong Shen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: augment_eh_frame.patch
Type: text/x-patch
Size: 20912 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20140822/8dae06a2/attachment.bin>