[Lldb-commits] Instruction emulation of arm64 'stp d8, d9, [sp, #-0x70]!' style instruction

Wed Oct 12 14:15:13 PDT 2016

Hi Jason,

Thank you for adding unit tests for this code. I think the current
implementation doesn't fail badly on 16 vs 32 byte stack alignment
because we use the "opc" from the instruction to calculate the write-back
address (to adjust the SP), so having the wrong register size won't
affect that part.
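A minimal sketch of that write-back calculation, assuming the A64 STP (SIMD&FP) field layout (opc in bits 31:30, a signed 7-bit immediate in bits 21:15 that is scaled by the register size). The helper names are hypothetical and this is not the code in EmulateInstructionARM64; it only illustrates that the SP adjustment depends on opc and imm7 alone, not on which register we attach to Rt/Rt2.

```cpp
#include <cstdint>

// Hypothetical helper: sign-extend the 7-bit imm7 field (bits 21:15).
constexpr int64_t SignExtendImm7(uint32_t insn) {
  int64_t imm7 = (insn >> 15) & 0x7f;
  return (imm7 & 0x40) ? imm7 - 0x80 : imm7;
}

// For the SIMD&FP variant, opc (bits 31:30) selects the register size:
// 0b00 -> 4 bytes (S), 0b01 -> 8 bytes (D), 0b10 -> 16 bytes (Q).
// 0b11 is unallocated and is not handled specially here.
constexpr int64_t SimdFpScaleBytes(uint32_t insn) {
  return int64_t{4} << ((insn >> 30) & 0x3);
}

// The SP write-back uses only opc and imm7, never the register numbers.
constexpr int64_t WritebackOffset(uint32_t insn) {
  return SignExtendImm7(insn) * SimdFpScaleBytes(insn);
}

// 0x6db923e9 is `stp d9, d8, [sp, #-0x70]!` from Jason's prologue.
static_assert(WritebackOffset(0x6db923e9) == -0x70, "SP moves down 112 bytes");
```

So even while we attach the wrong register name to the save, the SP adjustment for this instruction still comes out as -0x70.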

Regarding the issue about s0/d0/v0 I don't have a perfect solution but here
are some ideas:
* Use a different register numbering scheme than DWARF (e.g. LLDB/GDB/???).
I think it would only require us to
change EmulateInstructionARM64::CreateFunctionEntryUnwind to create an
UnwindPlan with a different register numbering scheme, and then to look up
the register in that numbering scheme using GetRegisterInfo in
functions where we want to reference s0-s31/d0-d31. In theory this should
be a simple change, but I wouldn't be surprised if changing the register
numbering breaks something.
* Introduce the concept of register pieces. This concept already exists in
DWARF expressions, where you can say that a given part of a register
is saved at a given location. I expect it won't be trivial to implement, but
it would solve this problem and would also improve the way we understand
DWARF.
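To make the register-pieces idea a bit more concrete, here is a hypothetical record (not an existing LLDB type) for a partial register save, along the lines of a DWARF piece. It assumes the AArch64 DWARF numbering of v0..v31 as 64..95, and that d<n> is the low 64 bits of v<n>.

```cpp
#include <cstdint>

// Hypothetical record: `byte_size` bytes of a register's value, starting
// `byte_offset` bytes from its least significant end, were saved at
// CFA + `cfa_offset`.
struct SavedRegisterPiece {
  uint32_t dwarf_regnum;  // the containing register, e.g. v8
  uint32_t byte_offset;   // offset of the piece within the register value
  uint32_t byte_size;     // how many bytes of the register were saved
  int64_t cfa_offset;     // where the piece lives, relative to the CFA
};

// AArch64 DWARF numbers v0..v31 as 64..95.
constexpr uint32_t kArm64DwarfV0 = 64;

// d<n> is the low 64 bits of v<n>, so a D register save becomes an
// 8-byte piece at offset 0 of the corresponding V register.
constexpr SavedRegisterPiece DRegisterSave(uint32_t n, int64_t cfa_offset) {
  return SavedRegisterPiece{kArm64DwarfV0 + n, 0, 8, cfa_offset};
}

// True if the 16-byte V register cannot be fully recovered from this piece.
constexpr bool IsPartialSave(const SavedRegisterPiece &p) {
  return p.byte_offset != 0 || p.byte_size < 16;
}
```

With something like this, `stp d9, d8, [sp, #-0x70]!` could be recorded as two 8-byte pieces of v9 and v8, and the register reading code could know that the upper 8 bytes of each are unavailable instead of reading garbage.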

About your idea of saying that we saved v8&v9 even though we only saved
d8&d9: I think it is a good quick-and-dirty solution, but it has a few
issues that are hard to solve. Most importantly, we will lie to users when
they read out v8&v9, which will contain some garbage data. Regarding
big/little endian, we should be able to detect which one the inferior uses
and do different things based on that (decide the location of v8&v9),
but it will make the hack even worse.
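A sketch of the endian-dependent placement, under the assumption that the unwinder reads a "saved" V register as 16 bytes from memory interpreted in the inferior's byte order. The enum and function are hypothetical.

```cpp
#include <cstdint>

enum class ByteOrder { Little, Big };

// Hypothetical helper: given the CFA-relative offset where an 8-byte D
// register was stored, return the offset at which we would have to claim
// the 16-byte V register was saved so that its low 64 bits line up with
// the bytes that were actually written.
constexpr int64_t VSlotForDSave(int64_t d_cfa_offset, ByteOrder order) {
  // Little endian: the least significant bytes come first in memory, so
  // the 16-byte slot starts where the D register was stored.
  // Big endian: the least significant 8 bytes are the second half of the
  // 16-byte slot, so the slot has to start 8 bytes earlier.
  return order == ByteOrder::Little ? d_cfa_offset : d_cfa_offset - 8;
}
```

In both cases the other 8 bytes of the claimed slot belong to something else, which is exactly the garbage the user would see in the upper half of v8/v9.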

Tamas

On Tue, Oct 11, 2016 at 6:15 PM Jason Molenda <jmolenda at apple.com> wrote:

Hi Tamas, I'm writing some unit tests for the unwind source generators -
x86 last week, arm64 this week, and I noticed with this prologue:

JavaScriptCore`JSC::B3::reduceDoubleToFloat:
    0x192b45c0c <+0>:  0x6db923e9   stp    d9, d8, [sp, #-0x70]!
    0x192b45c10 <+4>:  0xa9016ffc   stp    x28, x27, [sp, #0x10]
    0x192b45c14 <+8>:  0xa90267fa   stp    x26, x25, [sp, #0x20]
    0x192b45c18 <+12>: 0xa9035ff8   stp    x24, x23, [sp, #0x30]
    0x192b45c1c <+16>: 0xa90457f6   stp    x22, x21, [sp, #0x40]
    0x192b45c20 <+20>: 0xa9054ff4   stp    x20, x19, [sp, #0x50]
    0x192b45c24 <+24>: 0xa9067bfd   stp    x29, x30, [sp, #0x60]
    0x192b45c28 <+28>: 0x910183fd   add    x29, sp, #0x60            ; =0x60
    0x192b45c2c <+32>: 0xd10a83ff   sub    sp, sp, #0x2a0            ; =0x2a0

EmulateInstructionARM64::EmulateLDPSTP interprets this as a save of v31.
The use of reg 31 is an easy bug to explain: the ARM manual, C7.2.284
("STP (SIMD&FP)"), gives us an "opc" (0b00 == 32-bit registers, 0b01 ==
64-bit registers, 0b10 == 128-bit registers), an immediate value, and three
registers (Rt2, Rn, Rt). In the above example, these work out to Rt2 == 8
(d8), Rn == 31 ("sp"), Rt == 9 (d9). The unwinder is incorrectly saying v31
right now because it's using Rn for both registers:

  if (vector) {
    // Bug: both lookups use n (the base register, Rn == sp == 31).
    if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + n, reg_info_Rt))
      return false;
    if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + n, reg_info_Rt2))
      return false;
  }

This would normally take up 32 bytes of stack space and cause big problems,
but because we're writing the same reg twice, I think we luck out and only
take 16 bytes of the stack.

The register-number part would be an easy fix, like

  if (vector) {
    // Rt and Rt2 (t and t2) name the transferred registers; n is only the base.
    if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + t, reg_info_Rt))
      return false;
    if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + t2, reg_info_Rt2))
      return false;
  }
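As a cross-check of the field values quoted above, a minimal decoding sketch for 0x6db923e9, using the bit positions from the STP (SIMD&FP) description. The struct and function names are hypothetical, not LLDB's; the DWARF base of 64 for v0 is the AArch64 DWARF numbering.

```cpp
#include <cstdint>

// Fields of an A64 LDP/STP (SIMD&FP) encoding.
struct LdpStpFields {
  uint32_t opc;  // bits 31:30, register size selector
  uint32_t rt2;  // bits 14:10, second transferred register
  uint32_t rn;   // bits 9:5, base register (31 == sp here)
  uint32_t rt;   // bits 4:0, first transferred register
};

constexpr LdpStpFields DecodeLdpStp(uint32_t insn) {
  return LdpStpFields{(insn >> 30) & 0x3, (insn >> 10) & 0x1f,
                      (insn >> 5) & 0x1f, insn & 0x1f};
}

// AArch64 DWARF numbers v0..v31 as 64..95.
constexpr uint32_t kArm64DwarfV0 = 64;

// Hypothetical: DWARF number of the V register a 5-bit field refers to.
// Feeding it rn gives v31 for every sp-based store; rt/rt2 give d9/d8's V regs.
constexpr uint32_t VRegForField(uint32_t field) { return kArm64DwarfV0 + field; }
```

Decoding 0x6db923e9 this way yields opc == 1 (64-bit registers), Rt == 9, Rt2 == 8 and Rn == 31, so keying the lookup on n always lands on v31 (DWARF 95).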

We don't have DWARF register numbers for s0..31 or d0..31, so I don't think
we can correctly track this instruction's actions today. Maybe we should
put a save of v9 at CFA-112 and a save of v8 at CFA-104. As long as the
target is operating in little-endian mode, when we go to get the contents
of v8/v9 we're only actually USING the lower 64 bits, so it'll work out,
right? I think I have that right. We'll be reading garbage in the upper
64 bits; the register reading code won't have any knowledge of the fact
that we only have the lower 32/64 bits available to us.
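A sketch of where the two registers end up, assuming CFA == SP at function entry (as on arm64) and pre-index semantics: SP is updated first, then Rt is stored at the new SP and Rt2 one register size above it. The struct and function are hypothetical.

```cpp
#include <cstdint>

// CFA-relative save locations for the two registers of a pre-indexed
// STP (SIMD&FP), valid when SP == CFA just before the instruction.
struct PairSaveOffsets {
  int64_t rt;   // Rt is stored at the new SP
  int64_t rt2;  // Rt2 is stored one register size above it
};

constexpr PairSaveOffsets PreIndexStpOffsets(uint32_t insn) {
  // imm7 is bits 21:15, sign-extended, then scaled by the register size
  // that opc (bits 31:30) selects: 4 << opc bytes (opc 0b11 is unallocated).
  int64_t imm7 = (insn >> 15) & 0x7f;
  if (imm7 & 0x40) imm7 -= 0x80;
  const int64_t size = int64_t{4} << ((insn >> 30) & 0x3);
  const int64_t new_sp = imm7 * size;  // pre-index: SP is updated first
  return PairSaveOffsets{new_sp, new_sp + size};
}
```

For 0x6db923e9 this puts d9 (Rt) at CFA-112 and d8 (Rt2) at CFA-104.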

Throwing the problem out there; I would like to hear what you think. I don't
want to encode buggy behavior in a unit test ;) so I'd like us to
think about what the correct behavior would be, and do that before I write
the test.