[Lldb-commits] Instruction emulation of arm64 'stp d8, d9, [sp, #-0x70]!' style instruction

Wed Oct 12 18:38:11 PDT 2016

Yeah, it's incorrectly grabbing the stack pointer reg number (31) from the Rn bits and using that as the register # being saved, instead of the Rt and Rt2 register numbers, and saying that v31 is being pushed twice.  It's an easy bug in EmulateInstructionARM64::EmulateLDPSTP but fixing that just presents us with the next problem so I haven't fixed it.

When I connect to debugserver on an arm64 device, it tells lldb about the pseudo regs on the device, like

  <reg name="x0" regnum="0" offset="0" bitsize="64" group="general" altname="arg1" group_id="1" ehframe_regnum="0" dwarf_regnum="0" generic="arg1" invalidate_regnums="0,34"/>

  <reg name="w0" regnum="34" offset="0" bitsize="32" group="general" group_id="1" value_regnums="0" invalidate_regnums="0,34"/>

  <reg name="v0" regnum="63" offset="268" bitsize="128" group="vector" type="float" altname="q0" encoding="vector" format="vector-uint8" group_id="2" dwarf_regnum="64" invalidate_regnums="63,129,97"/>

  <reg name="s0" regnum="97" offset="268" bitsize="32" group="float" type="float" encoding="ieee754" format="float" group_id="2" value_regnums="63" invalidate_regnums="63,129,97"/>

  <reg name="d0" regnum="129" offset="268" bitsize="64" group="float" type="float" encoding="ieee754" format="float" group_id="2" value_regnums="63" invalidate_regnums="63,129,97"/>

(this is the reply to the "qXfer:features:read:target.xml" packet that lldb sends to debugserver at the startup of the debug session) so the user can refer to these by name:

(lldb) reg read -f x s0
      s0 = 0x404e809d
(lldb) reg read -f x d0
      d0 = 0x41bdaf19404e809d
(lldb) reg read -f x v0
      v0 = 0x000000000000000041bdaf19404e809d
(lldb) 

J

> On Oct 12, 2016, at 6:23 PM, Tamas Berghammer <tberghammer at google.com> wrote:
> 
> I am a bit surprised that we don't define the smaller floating point registers on AArch64 the same way we do it on x86/x86_64/Arm (have no idea about MIPS) and I would consider this as a bug what we should fix (will gave it a try when I have a few free cycles) because currently it is very difficult to get a 64bit double value out of the data we are displaying.
> 
> Considering the fact that based on the ABI only the bottom 64bit have to be saved I start to like the idea of teaching our unwinder about the smaller floating point registers because that will mean that we can inspect float and double variables from the parent stack frame while we never show some corrupted data for variables occupying the full 128 bit vector register.
> 
> "...the unwind plan generated currently records this as v31 being saved...": I don't understand this part. The code never references v31 nor d31 so if the unwind plan records it as we saved v31 then something is clearly wrong there. Does it treat all d0-d31 register as v31 because of some instruction decoding reason?
> 
> Tamas
> 
> On Wed, Oct 12, 2016 at 4:30 PM Jason Molenda <jmolenda at apple.com> wrote:
> Hi Tamas, sorry for that last email being a mess, I was doing something else while writing it and only saw how unclear it was after I sent it.  You understood everything I was trying to say.
> 
> I looked at the AAPCS64 again.  It says v8-v15 are callee saved, but says, "Additionally, only the bottom 64-bits of each value stored in v8-v15 need to be preserved[1]; it is the responsibility of the caller to preserve larger values." so we're never going to have a full 128-bit value that we can get off the stack (for targets following this ABI).
> 
> DWARF can get away with only defining register numbers for v0-v31 because their use in DWARF always includes a byte size.  Having lldb register numbers and using that numbering scheme instead of DWARF is an interesting one.  It looks like all of the arm64 targets are using the definitions from Plugins/Process/Utility/RegisterInfos_arm64.h.  It also only defines v0-v31 but the x86_64 RegisterInfos file defines pseudo registers ("eax") which are a subset of real registers ("rax") - maybe we could do something like that.
> 
> I'm looking at this right now, but fwiw I wrote a small C program that uses enough fp registers that the compiler will spill all of the preserved registers (d8-d15) when I compile it with clang and -Os.  I'll use this for testing
> 
> 
> 
> I get a prologue like
> 
> a.out`foo:
> a.out[0x100007a08] <+0>:   0x6dba3bef   stp    d15, d14, [sp, #-96]!
> a.out[0x100007a0c] <+4>:   0x6d0133ed   stp    d13, d12, [sp, #16]
> a.out[0x100007a10] <+8>:   0x6d022beb   stp    d11, d10, [sp, #32]
> a.out[0x100007a14] <+12>:  0x6d0323e9   stp    d9, d8, [sp, #48]
> a.out[0x100007a18] <+16>:  0xa9046ffc   stp    x28, x27, [sp, #64]
> a.out[0x100007a1c] <+20>:  0xa9057bfd   stp    x29, x30, [sp, #80]
> a.out[0x100007a20] <+24>:  0x910143fd   add    x29, sp, #80              ; =80
> 
> 
> and the unwind plan generated currently records this as v31 being saved in the first instruction and ignores the others (because that reg has already been saved).  That's the easy bug - but it's a good test case that I'll strip down into a unit test once we get this figured out.
> 
> 
> J
> 
> 
> > On Oct 12, 2016, at 2:15 PM, Tamas Berghammer <tberghammer at google.com> wrote:
> >
> > Hi Jason,
> >
> > Thank you for adding unit test for this code. I think the current implementation doesn't fail terribly on 16 vs 32 byte stack alignment because we use the "opc" from the instruction to calculate the write back address (to adjust the SP) so having the wrong size of the register won't effect that part.
> >
> > Regarding the issue about s0/d0/v0 I don't have a perfect solution but here are some ideas:
> > * Use a different register numbering schema then DWARF (e.g. LLDB/GDB/???). I think it would only require us to change EmulateInstructionARM64::CreateFunctionEntryUnwind to create an UnwindPlan with a different register numbering schema and then to lookup the register based on a different numbering schema using GetRegisterInfo in functions where we want to reference s0-s31/d0-d31. In theory this should be a simple change but I won't be surprised if changing the register numbering breaks something.
> > * Introduce the concept of register pieces. This concept already exists in the DWARF expressions where you can say that a given part of the register is saved at a given location. I expect that doing it won't be trivial but it would solve both this problem and would improve the way we understand DWARF as well.
> >
> > About your idea about saying that we saved v8&v9 even though we only saved d8&d9: I think it is a good quick and dirty solution but it has a few issues what is hard to solve. Most importantly we will lie to the user when they read out v8&v9 what will contain some garbage data. Regarding big/little endian we should be able to detect which one the inferior uses and we can do different thing based on that (decide the location of v8&v9) but it will make the hack even worth.
> >
> > Tamas
> >
> > On Tue, Oct 11, 2016 at 6:15 PM Jason Molenda <jmolenda at apple.com> wrote:
> > Hi Tamas, I'm writing some unit tests for the unwind source generators - x86 last week, arm64 this week, and I noticed with this prologue:
> >
> >
> >
> > JavaScriptCore`JSC::B3::reduceDoubleToFloat:
> >
> >     0x192b45c0c <+0>:  0x6db923e9   stp    d9, d8, [sp, #-0x70]!
> >
> >     0x192b45c10 <+4>:  0xa9016ffc   stp    x28, x27, [sp, #0x10]
> >
> >     0x192b45c14 <+8>:  0xa90267fa   stp    x26, x25, [sp, #0x20]
> >
> >     0x192b45c18 <+12>: 0xa9035ff8   stp    x24, x23, [sp, #0x30]
> >
> >     0x192b45c1c <+16>: 0xa90457f6   stp    x22, x21, [sp, #0x40]
> >
> >     0x192b45c20 <+20>: 0xa9054ff4   stp    x20, x19, [sp, #0x50]
> >
> >     0x192b45c24 <+24>: 0xa9067bfd   stp    x29, x30, [sp, #0x60]
> >
> >     0x192b45c28 <+28>: 0x910183fd   add    x29, sp, #0x60            ; =0x60
> >
> >     0x192b45c2c <+32>: 0xd10a83ff   sub    sp, sp, #0x2a0            ; =0x2a0
> >
> >
> >
> >
> >
> >
> >
> > EmulateInstructionARM64::EmulateLDPSTP interprets this as a save of v31.  The use of reg 31 is an easy bug, the arm manual C7.2.284 ("STP (SIMD&FP)") gives us an "opc" (0b00 == 32-bit registers, 0b01 == 64-bit registers, 0b10 == 128-bit registers), an immediate value, and three registers (Rt2, Rn, Rt).  In the above example, these work out to Rt2 == 8 (d8), Rn == 31 ("sp"), Rt == 9 (d9).  The unwinder is incorrectly saying v31 right now because it's using Rn -
> >
> >
> >
> >   if (vector) {
> >
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + n, reg_info_Rt))
> >
> >       return false;
> >
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + n, reg_info_Rt2))
> >
> >       return false;
> >
> >   }
> >
> >
> >
> > This would normally take up 32 bytes of stack space and cause big problems, but because we're writing the same reg twice, I think we luck out and only take 16 bytes of the stack.
> >
> >
> >
> > We don't have dwarf register numbers for s0..31, d0..31, so we can't track this instruction's behavior 100% correctly but maybe if we said that
> >
> >
> >
> > That would be an easy fix, like
> >
> >
> >
> >   if (vector) {
> >
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + t, reg_info_Rt))
> >
> >       return false;
> >
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + t2, reg_info_Rt2))
> >
> >       return false;
> >
> >   }
> >
> >
> >
> >
> >
> > We don't have dwarf register numbers for s0..31, d0..31, so I don't think we can correctly track this instruction's actions today.  Maybe we should put a save of v8 at CFA-112 and a save of v9 at CFA-104.  As long as the target is operating in little endian mode, when we go to get the contents of v8/v9 we're only actually USING the lower 64 bits so it'll work out, right?  I think I have that right.  We'll be reading garbage in the upper 64 bits - the register reading code won't have any knowledge of the fact that we only have the lower 32/64 bits available to us.
> >
> >
> >
> >
> >
> > Throwing the problem out there, would like to hear what you think.  I don't want to encode buggy behavior in a unit test ;) so I'd like it for us to think about what correct behavior would be, and do that before I write the test.
>