[llvm] [AArch64][SME] Implement the SME ABI (ZA state management) in Machine IR (PR #149062)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 16 03:45:38 PDT 2025
llvmbot wrote:
@llvm/pr-subscribers-backend-aarch64
Author: Benjamin Maxwell (MacDue)
<details>
<summary>Changes</summary>
## Short Summary
This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI for ZA state (e.g., lazy saves and agnostic ZA functions). It is not enabled by default yet (the aim is for it to be the default in LLVM 22). The goal of this new pass is to place ZA saves/restores more optimally and to work with exception handling.
## Long Description
This patch reimplements management of ZA state for functions with private and shared ZA state. Agnostic ZA functions will be handled in a later patch. For now, this is gated behind the flag `-aarch64-new-sme-abi`; however, we intend for it to replace the current SelectionDAG implementation once complete.
The approach taken here is to mark instructions as needing ZA to be in a specific state ("ACTIVE" or "LOCAL_SAVED"). Machine instructions that implicitly define or use ZA registers (such as $zt0 or $zab0) require the "ACTIVE" state. Function calls may need the "ACTIVE" or "LOCAL_SAVED" state depending on whether the callee shares ZA or has private ZA.
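To make this concrete, here is a minimal sketch of such a state enum, assuming only the two states named above plus a "don't care" placeholder (everything other than ACTIVE and LOCAL_SAVED is invented for illustration; the real pass may track more states):

```cpp
// Illustrative only -- not the patch's actual definitions. ACTIVE and
// LOCAL_SAVED come from the description above; Any is an assumed placeholder
// for instructions that place no requirement on ZA.
enum class ZAState {
  Any,        // Assumed: no constraint on ZA for this instruction.
  Active,     // ZA/ZT0 contents must be live (e.g. uses/defs of $zab0, $zt0).
  LocalSaved, // ZA committed to the lazy-save buffer (private-ZA callee).
};
```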
We already add ZA register uses/definitions to machine instructions, so no extra work is needed to mark these.
Calls are marked by gluing AArch64ISD::INOUT_ZA_USE or AArch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START.
These markers are then used by the MachineSMEABIPass to find the points where the required ZA state changes between instructions. At each such transition, we insert code to set up or restore a ZA save (or to initialize ZA).
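As a rough sketch of the scan this implies (a minimal sketch, assuming the illustrative `ZAState` enum above and two hypothetical helpers, `getRequiredZAState` and `emitTransition`; the real pass defers insertion until block-level states are assigned, as described next):

```cpp
// Minimal per-block sketch; not the pass's actual structure.
void insertTransitions(MachineBasicBlock &MBB, ZAState EntryState) {
  ZAState Current = EntryState;
  for (MachineInstr &MI : MBB) {
    ZAState Required = getRequiredZAState(MI); // Hypothetical helper.
    if (Required != ZAState::Any && Required != Current) {
      // Insert code before MI to set up/commit a lazy save, restore ZA, or
      // initialize ZA, depending on the (Current -> Required) transition.
      emitTransition(MBB, MI.getIterator(), Current, Required);
      Current = Required;
    }
  }
}
```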
To handle control flow between blocks (which may have different ZA state requirements), we bundle the incoming and outgoing edges of blocks. Bundles are formed by assigning each block an incoming and an outgoing bundle (initially, every block has its own two bundles), then joining the outgoing bundle of each block with the incoming bundle of all of its successors.
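This pairing is the same one LLVM's existing EdgeBundles analysis computes. Below is a sketch of the joining step using a union-find (`IntEqClasses`) with the numbering convention In(B) = 2*B and Out(B) = 2*B + 1; whether the pass reuses EdgeBundles or forms bundles itself is not visible in the truncated diff:

```cpp
#include "llvm/ADT/IntEqClasses.h"
#include "llvm/CodeGen/MachineFunction.h"
using namespace llvm;

// Sketch of bundle formation: bundle 2*N is the incoming bundle of block N,
// bundle 2*N+1 its outgoing bundle (the EdgeBundles convention).
void formBundles(MachineFunction &MF, IntEqClasses &Bundles) {
  Bundles.grow(2 * MF.getNumBlockIDs());
  for (MachineBasicBlock &MBB : MF)
    for (MachineBasicBlock *Succ : MBB.successors())
      // Join the outgoing bundle of MBB with the incoming bundle of Succ.
      Bundles.join(2 * MBB.getNumber() + 1, 2 * Succ->getNumber());
  Bundles.compress(); // Renumber the equivalence classes 0..N-1.
}
```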
These bundles are then assigned a ZA state based on the blocks that participate in the bundle. A block whose incoming edge is in a bundle "votes" for the ZA state required at its first instruction, and likewise, a block whose outgoing edge is in a bundle votes for the ZA state required at its last instruction. The state with the most votes is used, which aims to minimize the number of state transitions.
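The plurality vote itself is straightforward; a self-contained sketch follows (the helper name and fixed state count are assumptions for this example, building on the illustrative enum above):

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Assumes the illustrative ZAState enum above; NumZAStates and
// pickBundleState are invented names, not the patch's API.
constexpr unsigned NumZAStates = 3;

ZAState pickBundleState(const std::vector<ZAState> &Votes) {
  std::array<unsigned, NumZAStates> Counts{};
  for (ZAState V : Votes)
    ++Counts[static_cast<unsigned>(V)];
  // Take the state with the most votes (ties go to the lowest-numbered
  // state), minimizing the number of inserted transitions.
  return static_cast<ZAState>(
      std::max_element(Counts.begin(), Counts.end()) - Counts.begin());
}
```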
---
Patch is 216.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149062.diff
23 Files Affected:
- (modified) llvm/lib/Target/AArch64/AArch64.h (+2)
- (modified) llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp (+26-13)
- (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+88-59)
- (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+4)
- (modified) llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h (+17-19)
- (modified) llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td (+28)
- (modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+2-1)
- (modified) llvm/lib/Target/AArch64/AArch64Subtarget.h (+5-1)
- (modified) llvm/lib/Target/AArch64/AArch64TargetMachine.cpp (+20-5)
- (modified) llvm/lib/Target/AArch64/CMakeLists.txt (+1)
- (added) llvm/lib/Target/AArch64/MachineSMEABIPass.cpp (+642)
- (modified) llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h (+1-1)
- (modified) llvm/test/CodeGen/AArch64/sme-agnostic-za.ll (+39)
- (modified) llvm/test/CodeGen/AArch64/sme-lazy-save-call.ll (+628-1)
- (added) llvm/test/CodeGen/AArch64/sme-lazy-sve-nzcv-live.mir (+132)
- (modified) llvm/test/CodeGen/AArch64/sme-new-za-function.ll (+154-41)
- (modified) llvm/test/CodeGen/AArch64/sme-shared-za-interface.ll (+52-1)
- (added) llvm/test/CodeGen/AArch64/sme-za-control-flow.ll (+1131)
- (added) llvm/test/CodeGen/AArch64/sme-za-exceptions.ll (+288)
- (modified) llvm/test/CodeGen/AArch64/sme-za-lazy-save-buffer.ll (+79-5)
- (modified) llvm/test/CodeGen/AArch64/sme-zt0-state.ll (+172-57)
- (modified) llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll (+371-297)
- (modified) llvm/unittests/Target/AArch64/SMEAttributesTest.cpp (+2-2)
``````````diff
diff --git a/llvm/lib/Target/AArch64/AArch64.h b/llvm/lib/Target/AArch64/AArch64.h
index 5496ebd495a55..8d0ff41fc8c08 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -60,6 +60,7 @@ FunctionPass *createAArch64CleanupLocalDynamicTLSPass();
FunctionPass *createAArch64CollectLOHPass();
FunctionPass *createSMEABIPass();
FunctionPass *createSMEPeepholeOptPass();
+FunctionPass *createMachineSMEABIPass();
ModulePass *createSVEIntrinsicOptsPass();
InstructionSelector *
createAArch64InstructionSelector(const AArch64TargetMachine &,
@@ -111,6 +112,7 @@ void initializeFalkorMarkStridedAccessesLegacyPass(PassRegistry&);
void initializeLDTLSCleanupPass(PassRegistry&);
void initializeSMEABIPass(PassRegistry &);
void initializeSMEPeepholeOptPass(PassRegistry &);
+void initializeMachineSMEABIPass(PassRegistry &);
void initializeSVEIntrinsicOptsPass(PassRegistry &);
void initializeAArch64Arm64ECCallLoweringPass(PassRegistry &);
} // end namespace llvm
diff --git a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
index 7de66ccbf6f29..c0d118aa3afed 100644
--- a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
@@ -92,8 +92,8 @@ class AArch64ExpandPseudo : public MachineFunctionPass {
bool expandCALL_BTI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI);
bool expandStoreSwiftAsyncContext(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI);
- MachineBasicBlock *expandRestoreZA(MachineBasicBlock &MBB,
- MachineBasicBlock::iterator MBBI);
+ MachineBasicBlock *expandCommitOrRestoreZA(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MBBI);
MachineBasicBlock *expandCondSMToggle(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI);
};
@@ -974,9 +974,13 @@ bool AArch64ExpandPseudo::expandStoreSwiftAsyncContext(
}
MachineBasicBlock *
-AArch64ExpandPseudo::expandRestoreZA(MachineBasicBlock &MBB,
- MachineBasicBlock::iterator MBBI) {
+AArch64ExpandPseudo::expandCommitOrRestoreZA(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MBBI) {
MachineInstr &MI = *MBBI;
+ bool IsRestoreZA = MI.getOpcode() == AArch64::RestoreZAPseudo;
+ assert((MI.getOpcode() == AArch64::RestoreZAPseudo ||
+ MI.getOpcode() == AArch64::CommitZAPseudo) &&
+ "Expected ZA commit or restore");
assert((std::next(MBBI) != MBB.end() ||
MI.getParent()->successors().begin() !=
MI.getParent()->successors().end()) &&
@@ -984,21 +988,23 @@ AArch64ExpandPseudo::expandRestoreZA(MachineBasicBlock &MBB,
// Compare TPIDR2_EL0 value against 0.
DebugLoc DL = MI.getDebugLoc();
- MachineInstrBuilder Cbz = BuildMI(MBB, MBBI, DL, TII->get(AArch64::CBZX))
- .add(MI.getOperand(0));
+ MachineInstrBuilder Branch =
+ BuildMI(MBB, MBBI, DL,
+ TII->get(IsRestoreZA ? AArch64::CBZX : AArch64::CBNZX))
+ .add(MI.getOperand(0));
// Split MBB and create two new blocks:
// - MBB now contains all instructions before RestoreZAPseudo.
- // - SMBB contains the RestoreZAPseudo instruction only.
- // - EndBB contains all instructions after RestoreZAPseudo.
+ // - SMBB contains the [Commit|RestoreZA]Pseudo instruction only.
+ // - EndBB contains all instructions after [Commit|RestoreZA]Pseudo.
MachineInstr &PrevMI = *std::prev(MBBI);
MachineBasicBlock *SMBB = MBB.splitAt(PrevMI, /*UpdateLiveIns*/ true);
MachineBasicBlock *EndBB = std::next(MI.getIterator()) == SMBB->end()
? *SMBB->successors().begin()
: SMBB->splitAt(MI, /*UpdateLiveIns*/ true);
- // Add the SMBB label to the TB[N]Z instruction & create a branch to EndBB.
- Cbz.addMBB(SMBB);
+ // Add the SMBB label to the CB[N]Z instruction & create a branch to EndBB.
+ Branch.addMBB(SMBB);
BuildMI(&MBB, DL, TII->get(AArch64::B))
.addMBB(EndBB);
MBB.addSuccessor(EndBB);
@@ -1006,8 +1012,12 @@ AArch64ExpandPseudo::expandRestoreZA(MachineBasicBlock &MBB,
// Replace the pseudo with a call (BL).
MachineInstrBuilder MIB =
BuildMI(*SMBB, SMBB->end(), DL, TII->get(AArch64::BL));
- MIB.addReg(MI.getOperand(1).getReg(), RegState::Implicit);
- for (unsigned I = 2; I < MI.getNumOperands(); ++I)
+ unsigned FirstBLOperand = 1;
+ if (IsRestoreZA) {
+ MIB.addReg(MI.getOperand(1).getReg(), RegState::Implicit);
+ FirstBLOperand = 2;
+ }
+ for (unsigned I = FirstBLOperand; I < MI.getNumOperands(); ++I)
MIB.add(MI.getOperand(I));
BuildMI(SMBB, DL, TII->get(AArch64::B)).addMBB(EndBB);
@@ -1617,8 +1627,9 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
return expandCALL_BTI(MBB, MBBI);
case AArch64::StoreSwiftAsyncContext:
return expandStoreSwiftAsyncContext(MBB, MBBI);
+ case AArch64::CommitZAPseudo:
case AArch64::RestoreZAPseudo: {
- auto *NewMBB = expandRestoreZA(MBB, MBBI);
+ auto *NewMBB = expandCommitOrRestoreZA(MBB, MBBI);
if (NewMBB != &MBB)
NextMBBI = MBB.end(); // The NextMBBI iterator is invalidated.
return true;
@@ -1629,6 +1640,8 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
NextMBBI = MBB.end(); // The NextMBBI iterator is invalidated.
return true;
}
+ case AArch64::InOutZAUsePseudo:
+ case AArch64::RequiresZASavePseudo:
case AArch64::COALESCER_BARRIER_FPR16:
case AArch64::COALESCER_BARRIER_FPR32:
case AArch64::COALESCER_BARRIER_FPR64:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 4f13a14d24649..49135d05b689b 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -8154,53 +8154,54 @@ SDValue AArch64TargetLowering::LowerFormalArguments(
if (Subtarget->hasCustomCallingConv())
Subtarget->getRegisterInfo()->UpdateCustomCalleeSavedRegs(MF);
- // Create a 16 Byte TPIDR2 object. The dynamic buffer
- // will be expanded and stored in the static object later using a pseudonode.
- if (Attrs.hasZAState()) {
- TPIDR2Object &TPIDR2 = FuncInfo->getTPIDR2Obj();
- TPIDR2.FrameIndex = MFI.CreateStackObject(16, Align(16), false);
- SDValue SVL = DAG.getNode(AArch64ISD::RDSVL, DL, MVT::i64,
- DAG.getConstant(1, DL, MVT::i32));
-
- SDValue Buffer;
- if (!Subtarget->isTargetWindows() && !hasInlineStackProbe(MF)) {
- Buffer = DAG.getNode(AArch64ISD::ALLOCATE_ZA_BUFFER, DL,
- DAG.getVTList(MVT::i64, MVT::Other), {Chain, SVL});
- } else {
- SDValue Size = DAG.getNode(ISD::MUL, DL, MVT::i64, SVL, SVL);
- Buffer = DAG.getNode(ISD::DYNAMIC_STACKALLOC, DL,
- DAG.getVTList(MVT::i64, MVT::Other),
- {Chain, Size, DAG.getConstant(1, DL, MVT::i64)});
- MFI.CreateVariableSizedObject(Align(16), nullptr);
- }
- Chain = DAG.getNode(
- AArch64ISD::INIT_TPIDR2OBJ, DL, DAG.getVTList(MVT::Other),
- {/*Chain*/ Buffer.getValue(1), /*Buffer ptr*/ Buffer.getValue(0)});
- } else if (Attrs.hasAgnosticZAInterface()) {
- // Call __arm_sme_state_size().
- SDValue BufferSize =
- DAG.getNode(AArch64ISD::GET_SME_SAVE_SIZE, DL,
- DAG.getVTList(MVT::i64, MVT::Other), Chain);
- Chain = BufferSize.getValue(1);
-
- SDValue Buffer;
- if (!Subtarget->isTargetWindows() && !hasInlineStackProbe(MF)) {
- Buffer =
- DAG.getNode(AArch64ISD::ALLOC_SME_SAVE_BUFFER, DL,
- DAG.getVTList(MVT::i64, MVT::Other), {Chain, BufferSize});
- } else {
- // Allocate space dynamically.
- Buffer = DAG.getNode(
- ISD::DYNAMIC_STACKALLOC, DL, DAG.getVTList(MVT::i64, MVT::Other),
- {Chain, BufferSize, DAG.getConstant(1, DL, MVT::i64)});
- MFI.CreateVariableSizedObject(Align(16), nullptr);
+ if (!Subtarget->useNewSMEABILowering() || Attrs.hasAgnosticZAInterface()) {
+ // Old SME ABI lowering (deprecated):
+ // Create a 16 Byte TPIDR2 object. The dynamic buffer
+ // will be expanded and stored in the static object later using a
+ // pseudonode.
+ if (Attrs.hasZAState()) {
+ TPIDR2Object &TPIDR2 = FuncInfo->getTPIDR2Obj();
+ TPIDR2.FrameIndex = MFI.CreateStackObject(16, Align(16), false);
+ SDValue SVL = DAG.getNode(AArch64ISD::RDSVL, DL, MVT::i64,
+ DAG.getConstant(1, DL, MVT::i32));
+ SDValue Buffer;
+ if (!Subtarget->isTargetWindows() && !hasInlineStackProbe(MF)) {
+ Buffer = DAG.getNode(AArch64ISD::ALLOCATE_ZA_BUFFER, DL,
+ DAG.getVTList(MVT::i64, MVT::Other), {Chain, SVL});
+ } else {
+ SDValue Size = DAG.getNode(ISD::MUL, DL, MVT::i64, SVL, SVL);
+ Buffer = DAG.getNode(ISD::DYNAMIC_STACKALLOC, DL,
+ DAG.getVTList(MVT::i64, MVT::Other),
+ {Chain, Size, DAG.getConstant(1, DL, MVT::i64)});
+ MFI.CreateVariableSizedObject(Align(16), nullptr);
+ }
+ Chain = DAG.getNode(
+ AArch64ISD::INIT_TPIDR2OBJ, DL, DAG.getVTList(MVT::Other),
+ {/*Chain*/ Buffer.getValue(1), /*Buffer ptr*/ Buffer.getValue(0)});
+ } else if (Attrs.hasAgnosticZAInterface()) {
+ // Call __arm_sme_state_size().
+ SDValue BufferSize =
+ DAG.getNode(AArch64ISD::GET_SME_SAVE_SIZE, DL,
+ DAG.getVTList(MVT::i64, MVT::Other), Chain);
+ Chain = BufferSize.getValue(1);
+ SDValue Buffer;
+ if (!Subtarget->isTargetWindows() && !hasInlineStackProbe(MF)) {
+ Buffer = DAG.getNode(AArch64ISD::ALLOC_SME_SAVE_BUFFER, DL,
+ DAG.getVTList(MVT::i64, MVT::Other),
+ {Chain, BufferSize});
+ } else {
+ // Allocate space dynamically.
+ Buffer = DAG.getNode(
+ ISD::DYNAMIC_STACKALLOC, DL, DAG.getVTList(MVT::i64, MVT::Other),
+ {Chain, BufferSize, DAG.getConstant(1, DL, MVT::i64)});
+ MFI.CreateVariableSizedObject(Align(16), nullptr);
+ }
+ // Copy the value to a virtual register, and save that in FuncInfo.
+ Register BufferPtr =
+ MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
+ FuncInfo->setSMESaveBufferAddr(BufferPtr);
+ Chain = DAG.getCopyToReg(Chain, DL, BufferPtr, Buffer);
}
-
- // Copy the value to a virtual register, and save that in FuncInfo.
- Register BufferPtr =
- MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
- FuncInfo->setSMESaveBufferAddr(BufferPtr);
- Chain = DAG.getCopyToReg(Chain, DL, BufferPtr, Buffer);
}
if (CallConv == CallingConv::PreserveNone) {
@@ -8217,6 +8218,15 @@ SDValue AArch64TargetLowering::LowerFormalArguments(
}
}
+ if (Subtarget->useNewSMEABILowering()) {
+ // Clear new ZT0 state. TODO: Move this to the SME ABI pass.
+ if (Attrs.isNewZT0())
+ Chain = DAG.getNode(
+ ISD::INTRINSIC_VOID, DL, MVT::Other, Chain,
+ DAG.getConstant(Intrinsic::aarch64_sme_zero_zt, DL, MVT::i32),
+ DAG.getTargetConstant(0, DL, MVT::i32));
+ }
+
return Chain;
}
@@ -8781,14 +8791,12 @@ static SDValue emitSMEStateSaveRestore(const AArch64TargetLowering &TLI,
MachineFunction &MF = DAG.getMachineFunction();
AArch64FunctionInfo *FuncInfo = MF.getInfo<AArch64FunctionInfo>();
FuncInfo->setSMESaveBufferUsed();
-
TargetLowering::ArgListTy Args;
TargetLowering::ArgListEntry Entry;
Entry.Ty = PointerType::getUnqual(*DAG.getContext());
Entry.Node =
DAG.getCopyFromReg(Chain, DL, Info->getSMESaveBufferAddr(), MVT::i64);
Args.push_back(Entry);
-
SDValue Callee =
DAG.getExternalSymbol(IsSave ? "__arm_sme_save" : "__arm_sme_restore",
TLI.getPointerTy(DAG.getDataLayout()));
@@ -8906,6 +8914,9 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
*DAG.getContext());
RetCCInfo.AnalyzeCallResult(Ins, RetCC);
+ // Determine whether we need any streaming mode changes.
+ SMECallAttrs CallAttrs = getSMECallAttrs(MF.getFunction(), CLI);
+
// Check callee args/returns for SVE registers and set calling convention
// accordingly.
if (CallConv == CallingConv::C || CallConv == CallingConv::Fast) {
@@ -8919,14 +8930,26 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
CallConv = CallingConv::AArch64_SVE_VectorCall;
}
+ bool UseNewSMEABILowering = Subtarget->useNewSMEABILowering();
+ bool IsAgnosticZAFunction = CallAttrs.caller().hasAgnosticZAInterface();
+ auto ZAMarkerNode = [&]() -> std::optional<unsigned> {
+ // TODO: Handle agnostic ZA functions.
+ if (!UseNewSMEABILowering || IsAgnosticZAFunction)
+ return std::nullopt;
+ if (!CallAttrs.caller().hasZAState() && !CallAttrs.caller().hasZT0State())
+ return std::nullopt;
+ return CallAttrs.requiresLazySave() ? AArch64ISD::REQUIRES_ZA_SAVE
+ : AArch64ISD::INOUT_ZA_USE;
+ }();
+
if (IsTailCall) {
// Check if it's really possible to do a tail call.
IsTailCall = isEligibleForTailCallOptimization(CLI);
// A sibling call is one where we're under the usual C ABI and not planning
// to change that but can still do a tail call:
- if (!TailCallOpt && IsTailCall && CallConv != CallingConv::Tail &&
- CallConv != CallingConv::SwiftTail)
+ if (!ZAMarkerNode.has_value() && !TailCallOpt && IsTailCall &&
+ CallConv != CallingConv::Tail && CallConv != CallingConv::SwiftTail)
IsSibCall = true;
if (IsTailCall)
@@ -8978,9 +9001,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
assert(FPDiff % 16 == 0 && "unaligned stack on tail call");
}
- // Determine whether we need any streaming mode changes.
- SMECallAttrs CallAttrs = getSMECallAttrs(MF.getFunction(), CLI);
-
auto DescribeCallsite =
[&](OptimizationRemarkAnalysis &R) -> OptimizationRemarkAnalysis & {
R << "call from '" << ore::NV("Caller", MF.getName()) << "' to '";
@@ -8994,7 +9014,7 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
return R;
};
- bool RequiresLazySave = CallAttrs.requiresLazySave();
+ bool RequiresLazySave = !UseNewSMEABILowering && CallAttrs.requiresLazySave();
bool RequiresSaveAllZA = CallAttrs.requiresPreservingAllZAState();
if (RequiresLazySave) {
const TPIDR2Object &TPIDR2 = FuncInfo->getTPIDR2Obj();
@@ -9076,10 +9096,21 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
AArch64ISD::SMSTOP, DL, DAG.getVTList(MVT::Other, MVT::Glue), Chain,
DAG.getTargetConstant((int32_t)(AArch64SVCR::SVCRZA), DL, MVT::i32));
- // Adjust the stack pointer for the new arguments...
+ // Adjust the stack pointer for the new arguments... and mark ZA uses.
// These operations are automatically eliminated by the prolog/epilog pass
- if (!IsSibCall)
+ assert((!IsSibCall || !ZAMarkerNode.has_value()) &&
+ "ZA markers require CALLSEQ_START");
+ if (!IsSibCall) {
Chain = DAG.getCALLSEQ_START(Chain, IsTailCall ? 0 : NumBytes, 0, DL);
+ if (ZAMarkerNode) {
+      // Note: We need the CALLSEQ_START to glue the ZAMarkerNode to; simply
+      // using a chain can result in incorrect scheduling. The markers refer
+      // to the position just before the CALLSEQ_START (though they occur
+      // after it, as CALLSEQ_START lacks in-glue).
+ Chain = DAG.getNode(*ZAMarkerNode, DL, DAG.getVTList(MVT::Other),
+ {Chain, Chain.getValue(1)});
+ }
+ }
SDValue StackPtr = DAG.getCopyFromReg(Chain, DL, AArch64::SP,
getPointerTy(DAG.getDataLayout()));
@@ -9551,7 +9582,7 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
}
}
- if (CallAttrs.requiresEnablingZAAfterCall())
+ if (RequiresLazySave || CallAttrs.requiresEnablingZAAfterCall())
// Unconditionally resume ZA.
Result = DAG.getNode(
AArch64ISD::SMSTART, DL, DAG.getVTList(MVT::Other, MVT::Glue), Result,
@@ -9572,7 +9603,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
SDValue TPIDR2_EL0 = DAG.getNode(
ISD::INTRINSIC_W_CHAIN, DL, MVT::i64, Result,
DAG.getConstant(Intrinsic::aarch64_sme_get_tpidr2, DL, MVT::i32));
-
// Copy the address of the TPIDR2 block into X0 before 'calling' the
// RESTORE_ZA pseudo.
SDValue Glue;
@@ -9584,7 +9614,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
DAG.getNode(AArch64ISD::RESTORE_ZA, DL, MVT::Other,
{Result, TPIDR2_EL0, DAG.getRegister(AArch64::X0, MVT::i64),
RestoreRoutine, RegMask, Result.getValue(1)});
-
// Finally reset the TPIDR2_EL0 register to 0.
Result = DAG.getNode(
ISD::INTRINSIC_VOID, DL, MVT::Other, Result,
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 6afb3c330d25b..72897a0446ca2 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -173,6 +173,10 @@ class AArch64TargetLowering : public TargetLowering {
MachineBasicBlock *EmitZTInstr(MachineInstr &MI, MachineBasicBlock *BB,
unsigned Opcode, bool Op0IsDef) const;
MachineBasicBlock *EmitZero(MachineInstr &MI, MachineBasicBlock *BB) const;
+
+ // Note: The following group of functions are only used as part of the old SME
+ // ABI lowering. They will be removed once -aarch64-new-sme-abi=true is the
+ // default.
MachineBasicBlock *EmitInitTPIDR2Object(MachineInstr &MI,
MachineBasicBlock *BB) const;
MachineBasicBlock *EmitAllocateZABuffer(MachineInstr &MI,
diff --git a/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h b/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
index 800787cc0b4f5..3f6980fe11aea 100644
--- a/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
@@ -213,9 +213,6 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
/// or return type
bool IsSVECC = false;
- /// The frame-index for the TPIDR2 object used for lazy saves.
- TPIDR2Object TPIDR2;
-
/// Whether this function changes streaming mode within the function.
bool HasStreamingModeChanges = false;
@@ -231,14 +228,6 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
// on function entry to record the initial pstate of a function.
Register PStateSMReg = MCRegister::NoRegister;
- // Holds a pointer to a buffer that is large enough to represent
- // all SME ZA state and any additional state required by the
- // __arm_sme_save/restore support routines.
- Register SMESaveBufferAddr = MCRegister::NoRegister;
-
- // true if SMESaveBufferAddr is used.
- bool SMESaveBufferUsed = false;
-
// Has the PNReg used to build PTRUE instruction.
// The PTRUE is used for the LD/ST of ZReg pairs in save and restore.
unsigned PredicateRegForFillSpill = 0;
@@ -250,6 +239,16 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
// Holds the SME function attributes (streaming mode, ZA/ZT0 state).
SMEAttrs SMEFnAttrs;
+ // Note: The following properties are only used for the old SME ABI lowering:
+ /// The frame-index for the TPIDR2 object used for lazy saves.
+ TPIDR2Object TPIDR2;
+ // Holds a pointer to a buffer that is large enough to represent
+ // all SME ZA state and any additional state required by the
+ // __arm_sme_save/restore support routines.
+ Register SMESaveBufferAddr = MCRegister::NoRegister;
+ // true if SMESaveBufferAddr is used.
+ bool SMESaveBufferUsed = false;
+
public:
AArch64FunctionInfo(const Function &F, const AArch64Subtarget *STI);
@@ -258,6 +257,13 @@ class AArch64FunctionInfo final : public MachineFunctionInfo {
const DenseMap<MachineBasicBlock *, MachineBasicBlock *> &Src2DstMBB)
const override;
+ // Old SME ABI lowering state getters/setters:
+...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/149062