[lldb-dev] Stepping into function generates EXC_BAD_INSTRUCTION signal

Mario Zechner badlogicgames at gmail.com
Mon Dec 1 08:13:38 PST 2014


I think I understand the issue now.
ThreadPlanStepRange::SetNextBranchBreakpoint
incorrectly selects the blne instruction instead of the it instruction.
The it condition is not met, so the CPU skips the instruction after the it.
Since that instruction is now a 2-byte trap, execution ends up at 0x27b2ec
(the PC after the 2-byte trap) instead of 0x27b2ee (the PC after the 4-byte
blne). So the CPU ends up in the middle of the blne instruction, which is
of course not a valid instruction.
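
The mechanism can be modeled in a few lines. This is a minimal sketch under these assumptions:
- It uses the standard Thumb-2 length rule: a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction.
- A condition-failed instruction in the IT block is skipped by its encoded size, which is what the stop at 0x27b2ec shows.
- The trap value 0xDEFE and the first blne halfword 0xF1EA (derived by hand from the BL target 0x466290) are illustrative assumptions.
- The helper names are hypothetical, not LLDB code.

```python
# Sketch: why a 2-byte trap on a 4-byte conditional instruction lands mid-instruction.
# Assumptions (not LLDB code): TRAP_T16 value, memory contents, helper names.

TRAP_T16 = 0xDEFE  # assumed 16-bit Thumb trap opcode used for the breakpoint


def thumb_insn_size(first_halfword: int) -> int:
    """Thumb-2 length rule: top five bits 0b11101/0b11110/0b11111 mean 32-bit."""
    return 4 if (first_halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def skip_condition_failed(memory: dict, pc: int) -> int:
    """A condition-failed instruction is skipped by the size of whatever
    halfword currently sits at pc (original instruction or trap)."""
    return pc + thumb_insn_size(memory[pc])


# Halfwords around the IT block: it ne; blne 0x466290; add sp, #0x4
original = {
    0x27B2E8: 0xBF18,  # it ne
    0x27B2EA: 0xF1EA,  # blne, first halfword (derived from the BL target; assumption)
    0x27B2EC: 0xFFD1,  # blne, second halfword
    0x27B2EE: 0xB001,  # add sp, #0x4
}

patched = dict(original)
patched[0x27B2EA] = TRAP_T16  # 2-byte breakpoint over the first half of blne

print(hex(skip_condition_failed(original, 0x27B2EA)))  # 0x27b2ee: past the whole blne
pc = skip_condition_failed(patched, 0x27B2EA)
print(hex(pc))  # 0x27b2ec: the middle of blne

# 0xFFD1 itself looks like the start of a 32-bit instruction, so the CPU fetches
# the second half of blne plus the add as one (undefined) instruction.
word = (patched[pc] << 16) | patched[pc + 2]
print(hex(word))  # 0xffd1b001, the subcode reported with EXC_BAD_INSTRUCTION
```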

I guess the next thing I have to figure out is why the it instruction isn't
marked as a branch instruction, which is why it isn't selected by
ThreadPlanStepRange::SetNextBranchBreakpoint as the next branch breakpoint.
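
One possible direction, sketched under stated assumptions. Suppose each decoded instruction records whether it is a branch and, for an it instruction, how many instructions its block covers. The breakpoint-selection scan can then stop at the it whenever its block contains a branch, so the trap is never placed on a conditional instruction. The Insn record and next_branch_breakpoint() are hypothetical and do not reflect how ThreadPlanStepRange or LLDB's instruction list are actually implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    addr: int
    size: int
    text: str
    is_branch: bool = False
    it_count: int = 0  # for an "it" instruction: number of instructions it covers


def next_branch_breakpoint(insns: List[Insn], start: int) -> Optional[int]:
    """Address of the next breakpoint site at or after insns[start].

    A branch inside an IT block is replaced by the IT instruction itself.
    Limitation of this sketch: if start already lies inside an IT block,
    that block's branches are returned as-is."""
    for i in range(start, len(insns)):
        insn = insns[i]
        block = insns[i + 1 : i + 1 + insn.it_count]  # empty for non-IT instructions
        if any(b.is_branch for b in block):
            return insn.addr  # break on the "it", not on the conditional branch
        if insn.is_branch:
            return insn.addr
    return None


listing = [
    Insn(0x27B2E0, 2, "ldr r2, [r1]"),
    Insn(0x27B2E2, 2, "ldr r2, [r2, #0x30]"),
    Insn(0x27B2E4, 4, "tst.w r2, #0x100000"),
    Insn(0x27B2E8, 2, "it ne", it_count=1),
    Insn(0x27B2EA, 4, "blne 0x466290", is_branch=True),
    Insn(0x27B2EE, 2, "add sp, #0x4"),
    Insn(0x27B2F0, 2, "pop {r7, pc}", is_branch=True),  # writes pc
]

print(hex(next_branch_breakpoint(listing, 0)))  # 0x27b2e8: the it, not the blne
```

Stopping at the it keeps the trap on an unconditional instruction. Presumably the block itself would then have to be stepped at instruction level, which this sketch does not cover.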

On Mon, Dec 1, 2014 at 4:59 PM, Mario Zechner <badlogicgames at gmail.com>
wrote:

> I traced through ThreadPlanStepRange and ThreadPlanStepRange for this
> piece of code:
>
> 0x27b2d4 <[J]java.lang.Object.<init>()V>: push   {r7, lr}
> 0x27b2d6 <[J]java.lang.Object.<init>()V+2>: mov    r7, sp
> 0x27b2d8 <[J]java.lang.Object.<init>()V+4>: sub    sp, #0x4
> 0x27b2da <[J]java.lang.Object.<init>()V+6>: movs   r2, #0x0
> 0x27b2dc <[J]java.lang.Object.<init>()V+8>: str    r2, [sp]
> 0x27b2de <[J]java.lang.Object.<init>()V+10>: str    r1, [sp]
> 0x27b2e0 <[J]java.lang.Object.<init>()V+12>: ldr    r2, [r1]
> 0x27b2e2 <[J]java.lang.Object.<init>()V+14>: ldr    r2, [r2, #0x30]
> 0x27b2e4 <[J]java.lang.Object.<init>()V+16>: tst.w  r2, #0x100000
> 0x27b2e8 <[J]java.lang.Object.<init>()V+20>: it     ne
> 0x27b2ea <[J]java.lang.Object.<init>()V+22>: blne   0x466290 ; _bcRegisterFinalizer
> 0x27b2ee <[J]java.lang.Object.<init>()V+26>: add    sp, #0x4
> 0x27b2f0 <[J]java.lang.Object.<init>()V+28>: pop    {r7, pc}
> 0x27b2f2 <[J]java.lang.Object.<init>()V+30>: nop
>
>
> Execution is halted at 0x27b2e0 when I issue a source-level step. The
> ThreadPlanStepRange::DidPush method successfully sets up a 2-byte
> breakpoint at 0x27b2ea after ThreadPlanStepRange::SetNextBranchBreakpoint
> identifies the instruction at 0x27b2ea (blne) as the next branch
> instruction.
>
> Next, the command interpreter resumes the threads. Right after the resume,
> we receive an event from the inferior with stop reason
> eStopReasonException (EXC_BAD_INSTRUCTION), and the process stops.
>
> I guess this means I need to figure out how "it" and "blne" work together
> (my ARM assembler knowledge is minimal) to understand why the breakpoint
> instruction written to the inferior results in an EXC_BAD_INSTRUCTION.
> If someone knows what could be the culprit, let me know :)
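
For reference, `it ne` makes the single instruction that follows it conditional on NE. If the Z flag is set after the tst.w, that instruction is skipped instead of executed. The condition for each instruction in an IT block comes from the it encoding (T1: 0xBF00 | firstcond << 4 | mask), and 0xBF18 is the encoding of `it ne`. The sketch below decodes that encoding; decode_it() is a hypothetical helper, not an LLDB function.

```python
from typing import List

# Condition codes 0b0000..0b1110 in encoding order.
COND_NAMES = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al"]


def decode_it(halfword: int) -> List[str]:
    """Conditions applied to each instruction covered by a Thumb IT instruction."""
    firstcond = (halfword >> 4) & 0xF
    mask = halfword & 0xF
    if (halfword & 0xFF00) != 0xBF00 or mask == 0 or firstcond == 0xF:
        raise ValueError(f"{halfword:#06x} is not an IT instruction")
    # The lowest set bit of mask ends the block: 1000 -> 1, x100 -> 2, xy10 -> 3, xyz1 -> 4.
    count = 4 - ((mask & -mask).bit_length() - 1)
    conds = [firstcond]
    for i in range(1, count):
        bit = (mask >> (4 - i)) & 1
        # A "then" slot repeats firstcond; an "else" slot flips its low bit (ne <-> eq).
        conds.append(firstcond if bit == (firstcond & 1) else firstcond ^ 1)
    return [COND_NAMES[c] for c in conds]


print(decode_it(0xBF18))  # ['ne']: "it ne" covers exactly one instruction
print(decode_it(0xBF0C))  # ['eq', 'ne']: "ite eq"
```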
>
> Thanks,
>
> Mario
>
> On Mon, Dec 1, 2014 at 2:07 PM, Mario Zechner <badlogicgames at gmail.com>
> wrote:
>
>> Well, I wrote a very long mail detailing my journey to resolve issue #2
>> (hanging after setting target.use-fast-stepping=false), only to eventually
>> realize that it doesn't hang but instead just waits for the ldrexd/strexd
>> loop quoted below to complete.
>>
>> This means turning off target.use-fast-stepping is not an option and I'm
>> back to square one. I'd be grateful for any pointers on how to fix issue #1
>> (EXC_BAD_INSTRUCTION). I guess I'll start by investigating the "run to
>> next branch" stepping algorithm in LLDB, though my understanding is
>> likely not sufficient to make a dent.
>>
>> Thanks,
>> Mario
>>
>> On Mon, Dec 1, 2014 at 11:05 AM, Mario Zechner <badlogicgames at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Setting target.use-fast-stepping to false did indeed solve this issue,
>>> albeit, obviously, at the cost of increased runtime. However, I ran into
>>> another issue right after I stepped out of the previously problematic
>>> function: http://sht.tl/bdAKRC
>>>
>>> Trying to source-level step this function (with use-fast-stepping=false)
>>> results in 1) the disassembly becoming badly garbled and 2) the process
>>> not stepping but hanging at the `cmp r1, #0` instruction. The original
>>> assembly code around that PC looks like this:
>>>
>>> LBB24_1:                                @ %label0
>>>                                         @ =>This Inner Loop Header: Depth=1
>>>         @DEBUG_VALUE: [J]java.lang.Thread.<init>(Ljava/lang/Runnable;Ljava/lang/String;)V:__$env <- R5
>>>         ldrexd  r1, r2, [r0]
>>>         strexd  r1, r6, r6, [r0]
>>>         cmp     r1, #0
>>>         bne     LBB24_1
>>> @ BB#2:                                 @ %label0
>>>         @DEBUG_VALUE: [J]java.lang.Thread.<init>(Ljava/lang/Runnable;Ljava/lang/String;)V:__$env <- R5
>>>         dmb     ish
>>>         movs    r1, #5
>>>
>>> A simple loop, which is actually part of an inlined function. We had
>>> some issues with inlined functions previously; I assume this issue is
>>> related. Interestingly enough, the backtrace is also a bit wonky:
>>>
>>> (lldb) bt
>>> * thread #1: tid = 0x18082, 0x0021a9b4 AttachTestIOSDev`[J]java.lang.Thread.<init>(Ljava/lang/Runnable;Ljava/lang/String;)V [inlined] [j]java.lang.Thread.threadPtr(J)[set] + 14 at Thread.java:1, stop reason = trace
>>>   * frame #0: 0x0021a9b4 AttachTestIOSDev`[J]java.lang.Thread.<init>(Ljava/lang/Runnable;Ljava/lang/String;)V [inlined] [j]java.lang.Thread.threadPtr(J)[set] + 14 at Thread.java:1
>>>     frame #1: 0x0021a9a6 AttachTestIOSDev`[J]java.lang.Thread.<init>(__$env=0x01662fc8, __$this=0x64da3833, runnable=0xa4f07400, threadName=0x00286000)V + 46 at Thread.java:138
>>>
>>> There should be many more frames. I'm going to try to dig up some more
>>> details.
>>>
>>> Thanks a lot!
>>> Mario
>>>
>>>
>>>
>>> On Sun, Nov 30, 2014 at 1:32 AM, Jason Molenda <jason at molenda.com>
>>> wrote:
>>>
>>>> The size of the breakpoint instruction is set by
>>>> GetSoftwareBreakpointTrapOpcode().  In your case, you're most likely in
>>>> PlatformDarwin::GetSoftwareBreakpointTrapOpcode(); lldb uses the symbol
>>>> table (from the binary file) to determine whether the code in a given
>>>> function is arm or thumb.  If it's arm, a 4-byte breakpoint is used.  If
>>>> it's thumb, a 2-byte breakpoint is used.  Of course, Thumb-2 (T32)
>>>> instructions can be 4 bytes (the blne instruction in your program is
>>>> one), but I assume the 2-byte breakpoint instruction still works
>>>> correctly in these cases; the cpu sees the 2-byte instruction and stops
>>>> execution.
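
A sketch of the selection described above. It assumes the arm/thumb decision has already been made from the symbol table. The opcode bytes are illustrative assumptions; check PlatformDarwin::GetSoftwareBreakpointTrapOpcode() for the real values. trap_opcode() and trap_covers_insn() are hypothetical helpers. The point the sketch shows is that the trap size follows the function's instruction set, not the size of the instruction being replaced.

```python
# Assumed trap encodings (little-endian bytes), for illustration only.
ARM_TRAP = bytes([0xFE, 0xDE, 0xFF, 0xE7])
THUMB_TRAP = bytes([0xFE, 0xDE])


def trap_opcode(is_thumb: bool) -> bytes:
    """Trap chosen from the function's instruction set alone."""
    return THUMB_TRAP if is_thumb else ARM_TRAP


def thumb_insn_size(first_halfword: int) -> int:
    """Thumb-2 length rule: top five bits 0b11101/0b11110/0b11111 mean 32-bit."""
    return 4 if (first_halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def trap_covers_insn(is_thumb: bool, first_halfword: int) -> bool:
    """True if the trap overwrites the whole instruction at the breakpoint site."""
    insn_size = thumb_insn_size(first_halfword) if is_thumb else 4
    return len(trap_opcode(is_thumb)) >= insn_size


print(trap_covers_insn(True, 0xB001))  # True: "add sp, #0x4" is 16-bit
print(trap_covers_insn(True, 0xF1EA))  # False: "blne" is 32-bit (first halfword assumed)
```

A partially covered instruction is harmless as long as the trap actually executes. The later messages in this thread suggest it does not when the site is inside an IT block and the condition fails.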
>>>>
>>>> I am a little wary about the fact that this comes after an it
>>>> instruction; I vaguely remember issues with that instruction's
>>>> behavior.
>>>>
>>>> It shouldn't make any difference, but you might want to try
>>>>
>>>> (lldb) settings set target.use-fast-stepping false
>>>>
>>>> which will force lldb to single-instruction step through the function.
>>>> Right now lldb is looking at the instruction stream and putting breakpoints
>>>> on branch/call/jump instructions to do your high-level "step" command,
>>>> instead of stopping on every instruction.  It is possible there could be a
>>>> problem with that approach and the it instruction.  Please report back if
>>>> this changes the behavior.
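
The fast-stepping strategy described here can be modeled as a scan from the PC to the first branch/call/jump. This is a minimal sketch, assuming each decoded instruction carries an is-branch flag; Insn and next_branch_address() are hypothetical and not LLDB's API. Run over the listing from the 0x2602xx session in this thread, it picks the blne at 0x2602ea, because the it that governs the blne is not flagged as a branch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    addr: int
    text: str
    is_branch: bool = False


def next_branch_address(insns: List[Insn], pc: int) -> Optional[int]:
    """First branch/call/jump at or after pc, or None if the range has none."""
    for insn in insns:
        if insn.addr >= pc and insn.is_branch:
            return insn.addr
    return None


listing = [
    Insn(0x2602E0, "ldr    r2, [r1]"),
    Insn(0x2602E2, "ldr    r2, [r2, #0x30]"),
    Insn(0x2602E4, "tst.w  r2, #0x100000"),
    Insn(0x2602E8, "it     ne"),                        # not flagged as a branch
    Insn(0x2602EA, "blne   0x44b290", is_branch=True),  # conditional call
    Insn(0x2602EE, "add    sp, #0x4"),
    Insn(0x2602F0, "pop    {r7, pc}", is_branch=True),  # writes pc
]

print(hex(next_branch_address(listing, 0x2602E0)))  # 0x2602ea
```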
>>>>
>>>> J
>>>>
>>>>
>>>> > On Nov 26, 2014, at 9:22 AM, Mario Zechner <badlogicgames at gmail.com>
>>>> wrote:
>>>> >
>>>> > I dug a little deeper, inspecting the GDB remote packets sent by LLDB
>>>> > to perform the stepping. It appears that when LLDB sends the memory
>>>> > breakpoint commands used for stepping, the size of the instruction being
>>>> > replaced isn't taken into account, or the original instruction isn't
>>>> > written back properly. The following log shows what happens when
>>>> > stepping into the previously mentioned function:
>>>> >
>>>> > (lldb) s
>>>> > Process 166 stopped
>>>> > * thread #1: tid = 0x0fd9, 0x002602e0 AttachTestIOSDev`[J]java.lang.Object.<init>(__$env=0x016bffc8, __$this=0x017864b0)V + 12 at Object.java:136, queue = 'com.apple.main-thread', stop reason = step in
>>>> >     frame #0: 0x002602e0 AttachTestIOSDev`[J]java.lang.Object.<init>(__$env=0x016bffc8, __$this=0x017864b0)V + 12 at Object.java:136
>>>> > (lldb) disassemble -p
>>>> > AttachTestIOSDev`[J]java.lang.Object.<init>()V + 12 at Object.java:136:
>>>> > -> 0x2602e0:  ldr    r2, [r1]
>>>> >    0x2602e2:  ldr    r2, [r2, #0x30]
>>>> >    0x2602e4:  tst.w  r2, #0x100000
>>>> >    0x2602e8:  it     ne
>>>> > (lldb) s
>>>> > Process 166 stopped
>>>> > * thread #1: tid = 0x0fd9, 0x002602ec AttachTestIOSDev`[J]java.lang.Object.<init>(__$env=0x016bffc8, __$this=0x017864b0)V + 24 at Object.java:136, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_ARM_UNDEFINED, subcode=0xffd1b001)
>>>> >     frame #0: 0x002602ec AttachTestIOSDev`[J]java.lang.Object.<init>(__$env=0x016bffc8, __$this=0x017864b0)V + 24 at Object.java:136
>>>> > (lldb) disassemble -p
>>>> > AttachTestIOSDev`[J]java.lang.Object.<init>()V + 24 at Object.java:136:
>>>> > -> 0x2602ec:  .long  0xb001ffd1                ; unknown opcode
>>>> >    0x2602f0:  pop    {r7, pc}
>>>> >
>>>> > AttachTestIOSDev`[J]java.lang.Object.<init>()V + 30:
>>>> >    0x2602f2:  nop
>>>> >
>>>> > AttachTestIOSDev`[J]java.lang.Object.clone()Ljava/lang/Object; at Object.java:154:
>>>> >    0x2602f4:  push   {r4, r5, r7, lr}
>>>> > (lldb) disassemble -f
>>>> > AttachTestIOSDev`[J]java.lang.Object.<init>()V at Object.java:136:
>>>> >    0x2602d4:  push   {r7, lr}
>>>> >    0x2602d6:  mov    r7, sp
>>>> >    0x2602d8:  sub    sp, #0x4
>>>> >    0x2602da:  movs   r2, #0x0
>>>> >    0x2602dc:  str    r2, [sp]
>>>> >    0x2602de:  str    r1, [sp]
>>>> >    0x2602e0:  ldr    r2, [r1]
>>>> >    0x2602e2:  ldr    r2, [r2, #0x30]
>>>> >    0x2602e4:  tst.w  r2, #0x100000
>>>> >    0x2602e8:  it     ne
>>>> >    0x2602ea:  blne   0x44b290                  ; _bcRegisterFinalizer
>>>> >    0x2602ee:  add    sp, #0x4
>>>> >    0x2602f0:  pop    {r7, pc}
>>>> >
>>>> > AttachTestIOSDev`[J]java.lang.Object.<init>()V + 30:
>>>> >    0x2602f2:  nop
>>>> >
>>>> > The first step succeeds and ends up right after the prologue, at
>>>> > 0x2602e0: ldr r2, [r1]. The next step ends up at 0x2602ec: .long
>>>> > 0xb001ffd1, which is wrong; it should be 0x2602ea: blne 0x44b290.
>>>> >
>>>> > The GDB remote conversation between lldb and the debugserver on the
>>>> > device (only relevant parts):
>>>> >
>>>> > # First step
>>>> > lldb->debugserver: $Z0,2602e0,2#73
>>>> > debugserver->lldb: $OK#00
>>>> > lldb->debugserver: $vCont;c:0fd9#15
>>>> > debugserver->lldb: (320) $T05thread:fd9;qaddr:37ebfad0;threads:fd9,ffa,ffb,ffd,fff,1009,100a,100b;00:c8ff6b01;01:b0647801;02:00000000;03:c87d6a00;04:00000000;05:c8ff6b01;06:fc6a6501;07:0c6a6501;08:90e96b01;09:28000000;0a:74a0ea37;0b:c8ff6b01;0c:b09e5b00;0d:086a6501;0e:d1b22000;0f:
>>>> >
>>>> > # Second step
>>>> > lldb->debugserver: $Z0,2602ea,2#a4
>>>> > debugserver->lldb: $OK#00
>>>> > lldb->debugserver: $vCont;c:0fd9#15
>>>> > debugserver->lldb: (324) $T92thread:fd9;qaddr:37ebfad0;threads:fd9,ffa,ffb,ffd,fff,1009,100a,100b;00:c8ff6b01;01:b0647801;02:01004300;03:c87d6a00;04:00000000;05:c8ff6b01;06:fc6a6501;07:0c6a6501;08:90e96b01;09:28000000;0a:74a0ea37;0b:c8ff6b01;0c:b09e5b00;0d:086a6501;0e:d1b22000;0f:
>>>> >
>>>> > For the first step, a 2-byte memory breakpoint is written to 0x2602e0
>>>> > ($Z0,2602e0,2#73), which is where the first step ended up. The
>>>> > instruction that got replaced is 2 bytes long, and the GDB command wrote
>>>> > a 2-byte memory breakpoint to that address, so all is good.
>>>> >
>>>> > For the second step, a 2-byte memory breakpoint is written to 0x2602ea
>>>> > ($Z0,2602ea,2#a4). But instead of stopping at 0x2602ea, the process ends
>>>> > up at 0x2602ec, which is in the middle of the 4-byte blne instruction.
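
Those packets can be checked by hand using the GDB Remote Serial Protocol:
- A packet is framed as `$payload#checksum`, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits.
- `Z0,addr,kind` asks the stub to insert a software breakpoint.
- A `T` stop reply starts with the stopping signal in hex.

This minimal sketch reproduces the packets in the log; the helper names are hypothetical.

```python
def rsp_packet(payload: str) -> str:
    """Frame a GDB Remote Serial Protocol packet: $payload#checksum."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


def insert_sw_breakpoint(addr: int, kind: int) -> str:
    """Z0 = software breakpoint; kind is the breakpoint size (2 in this log)."""
    return rsp_packet(f"Z0,{addr:x},{kind}")


def stop_signal(reply: str) -> int:
    """Signal number from a 'T' stop-reply payload such as 'T05thread:...'."""
    if not reply.startswith("T"):
        raise ValueError("not a T stop reply")
    return int(reply[1:3], 16)


print(insert_sw_breakpoint(0x2602E0, 2))  # $Z0,2602e0,2#73 (first step)
print(insert_sw_breakpoint(0x2602EA, 2))  # $Z0,2602ea,2#a4 (second step)
print(rsp_packet("vCont;c:0fd9"))          # $vCont;c:0fd9#15
print(stop_signal("T05thread:fd9;"))       # 5 (SIGTRAP)
print(stop_signal("T92thread:fd9;"))       # 146 (0x92, not a breakpoint trap)
```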
>>>> >
>>>> > Is it correct for LLDB to set a 2-byte memory breakpoint instead of a
>>>> > 4-byte memory breakpoint in this case? The PC will be set to an invalid
>>>> > address, which then causes the EXC_BAD_INSTRUCTION.
>>>> >
>>>> > Am I understanding this correctly? Is there a way for me to fix this?
>>>> >
>>>> > On Wed, Nov 26, 2014 at 5:26 PM, Mario Zechner <
>>>> badlogicgames at gmail.com> wrote:
>>>> > Hi,
>>>> >
>>>> > We generate thumbv7 binaries for iOS devices, then deploy, launch and
>>>> > debug them via LLDB. Stepping into functions almost always seems to
>>>> > generate an EXC_BAD_INSTRUCTION signal. The signal is not generated when
>>>> > running the app without the debugger attached. It is also not generated
>>>> > when we attach a debugger but simply let the app run without breakpoints
>>>> > or any stepping.
>>>> >
>>>> > Here's one of these function's LLVM IR:
>>>> >
>>>> > =======================
>>>> > define external void @"[J]java.lang.Object.<init>()V"(%Env* %p0, %Object* %p1) nounwind noinline optsize {
>>>> > label0:
>>>> >     call void @"llvm.dbg.declare"(metadata !{%Env* %p0}, metadata !19), !dbg !{i32 136, i32 0, metadata !{i32 786478, metadata !0, metadata !1, metadata !"[J]java.lang.Object.<init>()V", metadata !"[J]java.lang.Object.<init>()V", metadata !"", i32 136, metadata !15, i1 false, i1 true, i32 0, i32 0, null, i32 256, i1 false, void (%Env*, %Object*)* @"[J]java.lang.Object.<init>()V", null, null, metadata !17, i32 136}, null}
>>>> >     %r0 = alloca %Object*
>>>> >     store %Object* null, %Object** %r0
>>>> >     call void @"llvm.dbg.declare"(metadata !{%Object** %r0}, metadata !21), !dbg !{i32 136, i32 0, metadata !14, null}
>>>> >     store %Object* %p1, %Object** %r0
>>>> >     call void @"register_finalizable"(%Env* %p0, %Object* %p1), !dbg !{i32 136, i32 0, metadata !18, null}
>>>> >     ret void, !dbg !{i32 136, i32 0, metadata !18, null}
>>>> > }
>>>> > =======================
>>>> >
>>>> > The corresponding thumbv7 assembler code as generated by LLVM:
>>>> >
>>>> > =======================
>>>> >       .globl  "_[J]java.lang.Object.<init>()V"
>>>> >       .align  2
>>>> >       .code   16                      @ @"[J]java.lang.Object.<init>()V"
>>>> >       .thumb_func     "_[J]java.lang.Object.<init>()V"
>>>> > "_[J]java.lang.Object.<init>()V":
>>>> >       .cfi_startproc
>>>> > Lfunc_begin18:
>>>> >       .loc    1 136 0                 @ Object.java:136:0
>>>> > @ BB#0:                                 @ %label0
>>>> >       .loc    1 136 0                 @ Object.java:136:0
>>>> >       push    {r7, lr}
>>>> >       mov     r7, sp
>>>> >       sub     sp, #4
>>>> >       @DEBUG_VALUE: [J]java.lang.Object.<init>()V:__$env <- R0
>>>> >       movs    r2, #0
>>>> >       str     r2, [sp]
>>>> >       str     r1, [sp]
>>>> >       .loc    1 136 0 prologue_end    @ Object.java:136:0
>>>> > Ltmp6:
>>>> >       ldr     r2, [r1]
>>>> >       ldr     r2, [r2, #48]
>>>> >       tst.w   r2, #1048576
>>>> > Ltmp7:
>>>> >       @DEBUG_VALUE: [J]java.lang.Object.<init>()V:__$env <- R0
>>>> >       it      ne
>>>> >       blxne   __bcRegisterFinalizer
>>>> >       add     sp, #4
>>>> >       pop     {r7, pc}
>>>> > Ltmp8:
>>>> > Lfunc_end18:
>>>> > "L_[J]java.lang.Object.<init>()V_end":
>>>> >
>>>> >       .cfi_endproc
>>>> > =======================
>>>> >
>>>> > Now, when stepping into this function, LLDB receives a signal from the
>>>> > debugserver:
>>>> >
>>>> > =======================
>>>> > (lldb) s
>>>> > Process 176 stopped
>>>> > * thread #1: tid = 0x11f5, 0x0023e2ec AttachTestIOSDev`[J]java.lang.Object.<init>(__$env=0x0169efc8, __$this=0x0174cd10)V + 24 at Object.java:136, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_ARM_UNDEFINED, subcode=0xffd1b001)
>>>> >     frame #0: 0x0023e2ec AttachTestIOSDev`[J]java.lang.Object.<init>(__$env=0x0169efc8, __$this=0x0174cd10)V + 24 at Object.java:136
>>>> > =======================
>>>> >
>>>> > Disassembling around the PC gives:
>>>> >
>>>> > =======================
>>>> > (lldb) disassemble --pc
>>>> > AttachTestIOSDev`[J]java.lang.Object.<init>()V + 24 at Object.java:136:
>>>> > -> 0x23e2ec:  .long  0xb001ffd1                ; unknown opcode
>>>> >    0x23e2f0:  pop    {r7, pc}
>>>> >
>>>> > AttachTestIOSDev`[J]java.lang.Object.<init>()V + 30:
>>>> >    0x23e2f2:  nop
>>>> >
>>>> > Disassembling the entire function of the current frame gives:
>>>> >
>>>> > (lldb) disassemble -f
>>>> > AttachTestIOSDev`[J]java.lang.Object.<init>()V at Object.java:136:
>>>> >    0x23e2d4:  push   {r7, lr}
>>>> >    0x23e2d6:  mov    r7, sp
>>>> >    0x23e2d8:  sub    sp, #0x4
>>>> >    0x23e2da:  movs   r2, #0x0
>>>> >    0x23e2dc:  str    r2, [sp]
>>>> >    0x23e2de:  str    r1, [sp]
>>>> >    0x23e2e0:  ldr    r2, [r1]
>>>> >    0x23e2e2:  ldr    r2, [r2, #0x30]
>>>> >    0x23e2e4:  tst.w  r2, #0x100000
>>>> >    0x23e2e8:  it     ne
>>>> >    0x23e2ea:  blne   0x429290                  ; _bcRegisterFinalizer
>>>> >    0x23e2ee:  add    sp, #0x4
>>>> >    0x23e2f0:  pop    {r7, pc}
>>>> >
>>>> > According to this, execution should never end up at address 0x23e2ec.
>>>> > That address is in the middle of the 4-byte blne instruction
>>>> > (0x23e2ea-0x23e2ed), so the word decoded there straddles the second half
>>>> > of blne and the add that follows it in the second disassembly. I have a
>>>> > hunch that the debugserver on the device may interfere here, e.g. by
>>>> > adding a trap instruction to implement the stepping. I'm not quite sure
>>>> > what to make of it.
>>>> >
>>>> > I'd appreciate any hints. If you require more information, I have
>>>> > plenty of logs :)
>>>> >
>>>> > Thanks,
>>>> > Mario
>>>> >
>>>> > _______________________________________________
>>>> > lldb-dev mailing list
>>>> > lldb-dev at cs.uiuc.edu
>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>
>>>>
>>>
>>
>

