[all-commits] [llvm/llvm-project] 46e782: [lldb][debugserver] Read/write SME registers on ar...
Jason Molenda via All-commits
all-commits at lists.llvm.org
Thu Dec 19 09:57:49 PST 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 46e782300765eeac8026377bf30d5f08888c2b25
https://github.com/llvm/llvm-project/commit/46e782300765eeac8026377bf30d5f08888c2b25
Author: Jason Molenda <jmolenda at apple.com>
Date: 2024-12-19 (Thu, 19 Dec 2024)
Changed paths:
M lldb/source/Plugins/Architecture/AArch64/ArchitectureAArch64.cpp
M lldb/test/API/commands/register/register/register_command/TestRegisters.py
A lldb/test/API/macosx/sme-registers/Makefile
A lldb/test/API/macosx/sme-registers/TestSMERegistersDarwin.py
A lldb/test/API/macosx/sme-registers/main.c
M lldb/tools/debugserver/source/DNBDefs.h
M lldb/tools/debugserver/source/MacOSX/MachProcess.mm
M lldb/tools/debugserver/source/MacOSX/MachThread.cpp
M lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.cpp
M lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.h
A lldb/tools/debugserver/source/MacOSX/arm64/sme_thread_status.h
M lldb/tools/debugserver/source/RNBRemote.cpp
Log Message:
-----------
[lldb][debugserver] Read/write SME registers on arm64 (#119171)
**Note:** The register reading and writing depends on new register
flavor support in thread_get_state/thread_set_state in the kernel, which
will be first available in macOS 15.4.
The Apple M4 line of cores includes the Scalable Matrix Extension (SME)
feature. The M4s do not implement Scalable Vector Extension (SVE),
although the processor is in Streaming SVE Mode when the SME is being
used. The most obvious side effects of being in SSVE Mode are that (on
the M4 cores) NEON instructions cannot be used, and watchpoints may get
false positives because the address comparisons are done at a lowered
granularity.
When SSVE mode is enabled, the kernel will provide the Streaming Vector
Length (SVL) register, which is a maximum of 64 bytes with the M4. Also
provided are SVCR (with bits indicating if SSVE mode and SME mode are
enabled) and TPIDR2, then the SVE registers Z0..31 (SVL bytes each),
P0..15 (SVL/8 bytes each), and the ZA matrix register (SVL*SVL bytes).
Because the M4 supports SME2, the ZT0 register (64 bytes) is provided too.
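As a rough illustration of the size formulas above (not code from this patch; the function name `sme_register_sizes` is made up for the example), the per-register sizes can be computed from the SVL like this:

```python
# Sketch only: sizes of the SSVE/SME registers as a function of the
# streaming vector length (SVL, in bytes), using the formulas from the
# commit message. The function name is hypothetical.

def sme_register_sizes(svl_bytes):
    """Return the size in bytes of one register of each kind."""
    return {
        "z": svl_bytes,               # each of Z0..Z31
        "p": svl_bytes // 8,          # each of P0..P15
        "za": svl_bytes * svl_bytes,  # the ZA matrix register
        "zt0": 64,                    # fixed size, present with SME2
    }

# With the M4's maximum SVL of 64 bytes:
m4_sizes = sme_register_sizes(64)
print(m4_sizes)  # {'z': 64, 'p': 8, 'za': 4096, 'zt0': 64}
```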
When SSVE/SME are disabled, none of these registers are provided by the
kernel - reads and writes of them will fail.
Unlike Linux, lldb cannot modify the SVL through a thread_set_state
call, or change the processor state's SSVE/SME status. There is also no
way for a process to request a lowered SVL size today, so the work that
David did to handle VL/SVL changing while stepping through a process is
not an issue on Darwin today. But debugserver should be providing
everything necessary so we can reuse all of David's work on resizing the
register contexts in lldb if it happens in the future. debugserver
sends svl, svcr, and tpidr2 in the expedited registers when a thread
stops, if SSVE or SME mode is enabled (that is, if the kernel allows it
to read the ARM_SME_STATE register set).
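A minimal sketch of that rule, assuming the result of reading the SME state is modeled as a dict of register values, or `None` when the kernel refuses the read. The function name and data shapes are hypothetical; the real stop-reply packet identifies registers by number, not by name.

```python
# Sketch only: pick which SME registers to expedite in a stop reply.
# `sme_state` stands in for the result of reading ARM_SME_STATE:
# a dict of values, or None when the read fails (SSVE/SME not enabled).

EXPEDITED_SME_REGISTERS = ("svl", "svcr", "tpidr2")

def expedited_sme_registers(sme_state):
    """Return (name, value) pairs to add to the stop reply."""
    if sme_state is None:
        # The kernel would not provide the state, so send nothing.
        return []
    return [(name, sme_state[name]) for name in EXPEDITED_SME_REGISTERS]
```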
While the maximum SVL is 64 bytes on M4, the AArch64 maximum possible
SVL is 256 bytes; this would give us a 64KB ZA register. If debugserver
sized all of its register contexts assuming the largest possible SVL, we
could easily use 2MB more memory for the register contexts of all threads
in a process -- and on iOS et al, processes must run within a small
memory allotment and this would push us over that.
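The arithmetic behind that estimate can be sketched as follows. The thread count of 32 is an assumption chosen for illustration, and only the ZA register is counted:

```python
# Sketch only: extra memory used if ZA were sized for the architectural
# maximum SVL instead of the machine's actual SVL.

MAX_ARCH_SVL = 256  # bytes, architectural maximum
M4_SVL = 64         # bytes, maximum on M4

def za_bytes(svl_bytes):
    """ZA is an SVL x SVL byte matrix."""
    return svl_bytes * svl_bytes

def extra_za_bytes_per_thread(actual_svl, assumed_svl=MAX_ARCH_SVL):
    """Bytes wasted per thread by sizing ZA for assumed_svl."""
    return za_bytes(assumed_svl) - za_bytes(actual_svl)

threads = 32  # hypothetical thread count, not from the commit message
extra = threads * extra_za_bytes_per_thread(M4_SVL)
print(za_bytes(MAX_ARCH_SVL))  # 65536 (64KB)
print(extra)                   # 1966080, a little under 2MB
```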
Much of the work in debugserver was changing the arm64 register context
from being a static compile-time array of register sets, to being
initialized at runtime if debugserver is running on a machine with SME.
The ZA register is sized only to the machine's actual maximum SVL. The
size of the 32 SVE Z registers is less significant, so I am statically
allocating those at the architecturally largest possible SVL value today.
Also, debugserver includes information about registers that share the
same part of the register file. e.g. S0 and D0 are the lower parts of
the NEON 128-bit V0 register. And when running on an SME machine, v0 is
the lower 128 bits of the SVE Z0 register. So the register maps used
when defining the VFP registers must differ depending on the
capabilities of the cpu at runtime.
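One way to picture the runtime-dependent overlap information is a map from each register to the larger registers that contain it. This is a sketch under that assumption; the function name and the dict layout are hypothetical and are not debugserver's actual data structures.

```python
# Sketch only: which larger registers each FP/vector register is the
# low part of, depending on whether the machine has SME.

def build_overlap_map(has_sme):
    """Map register name -> list of containing registers, smallest first."""
    overlap = {}
    for n in range(32):
        # On an SME machine, Vn is the low 128 bits of Zn.
        v_parents = [f"z{n}"] if has_sme else []
        overlap[f"v{n}"] = v_parents
        # Sn and Dn are the low parts of Vn (and so of Zn, with SME).
        overlap[f"s{n}"] = [f"v{n}"] + v_parents
        overlap[f"d{n}"] = [f"v{n}"] + v_parents
    return overlap

print(build_overlap_map(False)["s0"])  # ['v0']
print(build_overlap_map(True)["s0"])   # ['v0', 'z0']
```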
I also changed register reading in debugserver. Formerly, when
debugserver was asked to read a register and the thread_get_state read
of that register failed, it would return all zeros. This is necessary
when constructing a `g` packet that gets all registers: there is no
separation between register bytes, so the offsets are fixed. But when
we are asking for a single register (e.g. Z0) when not in SSVE/SME mode,
this should return an error.
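The difference between the two cases can be sketched like this. The names `build_g_payload`, `read_register`, and `RegisterUnavailable` are invented for the example, and the register layout is a toy one; this is not the debugserver implementation.

```python
# Sketch only: zero-fill unavailable registers in a `g` packet payload,
# but report an error when one unavailable register is read directly.

class RegisterUnavailable(Exception):
    """Raised when a single-register read cannot be satisfied."""

def build_g_payload(layout, available):
    """layout: ordered dict name -> size in bytes.
    available: dict name -> bytes for registers that could be read."""
    chunks = []
    for name, size in layout.items():
        # Offsets are fixed, so a failed read must still occupy its slot.
        data = available.get(name, bytes(size))
        chunks.append(data.hex())
    return "".join(chunks)

def read_register(name, available):
    """Single-register read: fail instead of inventing zeros."""
    if name not in available:
        raise RegisterUnavailable(name)
    return available[name]

# Toy layout: x0 is readable, z0 is not (not in SSVE/SME mode).
layout = {"x0": 8, "z0": 64}
available = {"x0": (1).to_bytes(8, "little")}
payload = build_g_payload(layout, available)
```

Here `payload` still has hex digits for all 72 bytes, with the Z0 slot zero-filled, while `read_register("z0", available)` raises `RegisterUnavailable`.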
This does mean that when you're running on an SME-capable machine, but
not in SME mode, and do `register read -a`, lldb will report that 48 SVE
registers were unavailable and 5 SME registers were unavailable. But
that's only when `-a` is used.
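Those counts line up with the registers named earlier in this message. The sketch below assumes the 5 SME registers are svcr, tpidr2, svl, za, and zt0; that grouping is inferred from the text above, not taken from the patch.

```python
# Sketch only: where the "48 SVE" and "5 SME" counts plausibly come from.

SVE_REGISTERS = [f"z{i}" for i in range(32)] + [f"p{i}" for i in range(16)]
SME_REGISTERS = ["svcr", "tpidr2", "svl", "za", "zt0"]  # assumed grouping

def unavailable_counts(in_sme_mode):
    """Return (sve_unavailable, sme_unavailable) for `register read -a`."""
    if in_sme_mode:
        return (0, 0)
    return (len(SVE_REGISTERS), len(SME_REGISTERS))

print(unavailable_counts(False))  # (48, 5)
```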
The register reading and writing depends on new register flavor support
in thread_get_state/thread_set_state in the kernel, which is not yet in
a release. The test case I wrote is skipped on current OSes. I pilfered
the SME register setup from some of David's existing SME test files;
there were enough Linux-specific details in those tests that they
weren't easy to reuse on Darwin.
rdar://121608074