[Lldb-commits] Instruction emulation of arm64 'stp d8, d9, [sp, #-0x70]!' style instruction

Tamas Berghammer via lldb-commits lldb-commits at lists.llvm.org
Wed Oct 12 18:23:20 PDT 2016


I am a bit surprised that we don't define the smaller floating point
registers on AArch64 the same way we do on x86/x86_64/Arm (I have no idea
about MIPS). I would consider this a bug that we should fix (I will
give it a try when I have a few free cycles), because currently it is very
difficult to get a 64-bit double value out of the data we are displaying.

Considering that, based on the ABI, only the bottom 64 bits have to be
saved, I am starting to like the idea of teaching our unwinder about the
smaller floating point registers, because that would mean we can inspect
float and double variables from the parent stack frame while never showing
corrupted data for variables occupying the full 128-bit vector register.
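
To illustrate the relationship between the two register views, here is a
minimal sketch (not LLDB code; both helper names are made up for this
example). It assumes the 16 raw bytes of a v register are given in
little-endian order and that the host uses IEEE 754 binary64 doubles. The d
register is then just the first 8 bytes; the upper 8 bytes are exactly the
part the ABI does not require a callee to preserve.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: assemble the low 64 bits of a 128-bit vector
// register from its 16 raw bytes, assuming little-endian byte order.
constexpr uint64_t LowLaneBits(const uint8_t *v_bytes) {
  uint64_t bits = 0;
  for (int i = 7; i >= 0; --i)
    bits = (bits << 8) | v_bytes[i];
  return bits;
}

// Hypothetical helper: reinterpret those 64 bits as the double held in the
// corresponding d register. Bytes 8..15 are never read.
inline double LowLaneAsDouble(const uint8_t *v_bytes) {
  const uint64_t bits = LowLaneBits(v_bytes);
  double value;
  static_assert(sizeof(value) == sizeof(bits), "double must be 64 bits");
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

With a buffer whose first 8 bytes encode 1.0 and whose upper 8 bytes are
arbitrary, LowLaneAsDouble ignores the upper half entirely, which is the
behavior we would want when only d8-d15 were spilled.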

"...the unwind plan generated currently records this as v31 being saved...":
I don't understand this part. The code never references v31 or d31, so if
the unwind plan records v31 as saved then something is clearly wrong
there. Does it treat all d0-d31 registers as v31 because of some instruction
decoding reason?
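
For reference, the fields of the first instruction in the quoted
JavaScriptCore prologue (0x6db923e9) can be pulled apart as in the sketch
below. The bit positions follow the "STP (SIMD&FP)" encoding in the Arm
manual; the struct and function names are hypothetical and are not LLDB's.
For that opcode the sketch yields Rt = 9, Rt2 = 8, Rn = 31 and an offset of
-0x70, so a register index of 31 can only come from the base register field.

```cpp
#include <cstdint>

// Fields of an A64 "STP (SIMD&FP)" encoding (hypothetical struct name).
struct StpSimdFpFields {
  unsigned opc; // bits 31:30, 0b00 = 32-bit, 0b01 = 64-bit, 0b10 = 128-bit
  unsigned rt2; // bits 14:10, second register stored
  unsigned rn;  // bits 9:5, base register (31 means sp for this instruction)
  unsigned rt;  // bits 4:0, first register stored
  int offset;   // imm7 (bits 21:15), sign-extended and scaled to bytes
};

// Hypothetical decoder: extracts the fields without validating that the
// word really is an STP (SIMD&FP) instruction.
constexpr StpSimdFpFields DecodeStpSimdFp(uint32_t insn) {
  StpSimdFpFields f{};
  f.opc = (insn >> 30) & 0x3;
  f.rt2 = (insn >> 10) & 0x1f;
  f.rn = (insn >> 5) & 0x1f;
  f.rt = insn & 0x1f;
  int imm7 = (insn >> 15) & 0x7f;
  if (imm7 & 0x40) // sign-extend the 7-bit immediate
    imm7 -= 0x80;
  f.offset = imm7 * (4 << f.opc); // scale is 4, 8 or 16 bytes
  return f;
}
```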

Tamas

On Wed, Oct 12, 2016 at 4:30 PM Jason Molenda <jmolenda at apple.com> wrote:

> Hi Tamas, sorry for that last email being a mess, I was doing something
> else while writing it and only saw how unclear it was after I sent it.  You
> understood everything I was trying to say.
>
> I looked at the AAPCS64 again.  It says v8-v15 are callee saved, but says,
> "Additionally, only the bottom 64-bits of each value stored in v8-v15 need
> to be preserved[1]; it is the responsibility of the caller to preserve
> larger values." so we're never going to have a full 128-bit value that we
> can get off the stack (for targets following this ABI).
>
> DWARF can get away with only defining register numbers for v0-v31 because
> their use in DWARF always includes a byte size.  Having lldb register
> numbers and using that numbering scheme instead of DWARF is an interesting
> idea.  It looks like all of the arm64 targets are using the definitions from
> Plugins/Process/Utility/RegisterInfos_arm64.h.  It also only defines v0-v31
> but the x86_64 RegisterInfos file defines pseudo registers ("eax") which
> are a subset of real registers ("rax") - maybe we could do something like
> that.
>
> I'm looking at this right now, but fwiw I wrote a small C program that
> uses enough fp registers that the compiler will spill all of the preserved
> registers (d8-d15) when I compile it with clang and -Os.  I'll use this for
> testing.
>
>
>
> I get a prologue like
>
> a.out`foo:
> a.out[0x100007a08] <+0>:   0x6dba3bef   stp    d15, d14, [sp, #-96]!
> a.out[0x100007a0c] <+4>:   0x6d0133ed   stp    d13, d12, [sp, #16]
> a.out[0x100007a10] <+8>:   0x6d022beb   stp    d11, d10, [sp, #32]
> a.out[0x100007a14] <+12>:  0x6d0323e9   stp    d9, d8, [sp, #48]
> a.out[0x100007a18] <+16>:  0xa9046ffc   stp    x28, x27, [sp, #64]
> a.out[0x100007a1c] <+20>:  0xa9057bfd   stp    x29, x30, [sp, #80]
> a.out[0x100007a20] <+24>:  0x910143fd   add    x29, sp, #80              ; =80
>
>
> and the unwind plan generated currently records this as v31 being saved in
> the first instruction and ignores the others (because that reg has already
> been saved).  That's the easy bug - but it's a good test case that I'll
> strip down into a unit test once we get this figured out.
>
>
> J
>
>
> > On Oct 12, 2016, at 2:15 PM, Tamas Berghammer <tberghammer at google.com>
> wrote:
> >
> > Hi Jason,
> >
> > Thank you for adding unit tests for this code. I think the current
> implementation doesn't fail terribly on 16 vs 32 byte stack alignment
> because we use the "opc" from the instruction to calculate the write back
> address (to adjust the SP) so having the wrong size of the register won't
> affect that part.
> >
> > Regarding the issue about s0/d0/v0 I don't have a perfect solution but
> here are some ideas:
> > * Use a different register numbering scheme than DWARF (e.g.
> LLDB/GDB/???). I think it would only require us to change
> EmulateInstructionARM64::CreateFunctionEntryUnwind to create an UnwindPlan
> with a different register numbering scheme and then to look up the register
> based on a different numbering scheme using GetRegisterInfo in functions
> where we want to reference s0-s31/d0-d31. In theory this should be a simple
> change but I won't be surprised if changing the register numbering breaks
> something.
> > * Introduce the concept of register pieces. This concept already exists
> in the DWARF expressions where you can say that a given part of the
> register is saved at a given location. I expect that doing it won't be
> trivial but it would solve both this problem and would improve the way we
> understand DWARF as well.
> >
> > About your idea about saying that we saved v8&v9 even though we only
> saved d8&d9: I think it is a good quick and dirty solution but it has a few
> issues that are hard to solve. Most importantly we will lie to the user when
> they read out v8&v9, which will contain some garbage data. Regarding
> big/little endian we should be able to detect which one the inferior uses
> and we can do different things based on that (decide the location of v8&v9)
> but it will make the hack even worse.
> >
> > Tamas
> >
> > On Tue, Oct 11, 2016 at 6:15 PM Jason Molenda <jmolenda at apple.com>
> wrote:
> > Hi Tamas, I'm writing some unit tests for the unwind source generators -
> x86 last week, arm64 this week, and I noticed with this prologue:
> >
> > JavaScriptCore`JSC::B3::reduceDoubleToFloat:
> >     0x192b45c0c <+0>:  0x6db923e9   stp    d9, d8, [sp, #-0x70]!
> >     0x192b45c10 <+4>:  0xa9016ffc   stp    x28, x27, [sp, #0x10]
> >     0x192b45c14 <+8>:  0xa90267fa   stp    x26, x25, [sp, #0x20]
> >     0x192b45c18 <+12>: 0xa9035ff8   stp    x24, x23, [sp, #0x30]
> >     0x192b45c1c <+16>: 0xa90457f6   stp    x22, x21, [sp, #0x40]
> >     0x192b45c20 <+20>: 0xa9054ff4   stp    x20, x19, [sp, #0x50]
> >     0x192b45c24 <+24>: 0xa9067bfd   stp    x29, x30, [sp, #0x60]
> >     0x192b45c28 <+28>: 0x910183fd   add    x29, sp, #0x60            ; =0x60
> >     0x192b45c2c <+32>: 0xd10a83ff   sub    sp, sp, #0x2a0            ; =0x2a0
> >
> > EmulateInstructionARM64::EmulateLDPSTP interprets this as a save of
> v31.  The use of reg 31 is an easy bug, the arm manual C7.2.284 ("STP
> (SIMD&FP)") gives us an "opc" (0b00 == 32-bit registers, 0b01 == 64-bit
> registers, 0b10 == 128-bit registers), an immediate value, and three
> registers (Rt2, Rn, Rt).  In the above example, these work out to Rt2 == 8
> (d8), Rn == 31 ("sp"), Rt == 9 (d9).  The unwinder is incorrectly saying
> v31 right now because it's using Rn -
> >
> >   if (vector) {
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + n, reg_info_Rt))
> >       return false;
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + n, reg_info_Rt2))
> >       return false;
> >   }
> >
> >
> > This would normally take up 32 bytes of stack space and cause big
> problems, but because we're writing the same reg twice, I think we luck out
> and only take 16 bytes of the stack.
> >
> >
> >
> > We don't have dwarf register numbers for s0..31, d0..31, so we can't
> track this instruction's behavior 100% correctly but maybe if we said that
> >
> >
> >
> > That would be an easy fix, like
> >
> >   if (vector) {
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + t, reg_info_Rt))
> >       return false;
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + t2, reg_info_Rt2))
> >       return false;
> >   }
> >
> > We don't have dwarf register numbers for s0..31, d0..31, so I don't
> think we can correctly track this instruction's actions today.  Maybe we
> should put a save of v8 at CFA-112 and a save of v9 at CFA-104.  As long as
> the target is operating in little endian mode, when we go to get the
> contents of v8/v9 we're only actually USING the lower 64 bits so it'll work
> out, right?  I think I have that right.  We'll be reading garbage in the
> upper 64 bits - the register reading code won't have any knowledge of the
> fact that we only have the lower 32/64 bits available to us.
> >
> >
> >
> >
> >
> > Throwing the problem out there, would like to hear what you think.  I
> don't want to encode buggy behavior in a unit test ;) so I'd like it for us
> to think about what correct behavior would be, and do that before I write
> the test.
>
>