[llvm] [clang] [AArch64] Stack probing for function prologues (PR #66524)
Momchil Velikov via cfe-commits
cfe-commits at lists.llvm.org
Tue Nov 21 16:41:13 PST 2023
================
@@ -9460,6 +9461,94 @@ bool AArch64InstrInfo::isReallyTriviallyReMaterializable(
return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
}
+MachineBasicBlock::iterator
+AArch64InstrInfo::insertStackProbingLoop(MachineBasicBlock::iterator MBBI,
+ Register ScratchReg,
+ Register TargetReg) const {
+ MachineBasicBlock &MBB = *MBBI->getParent();
+ MachineFunction &MF = *MBB.getParent();
+ const AArch64InstrInfo *TII =
+ MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
+ int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
+ DebugLoc DL = MBB.findDebugLoc(MBBI);
+
+ MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
+ MachineBasicBlock *LoopTestMBB =
+ MF.CreateMachineBasicBlock(MBB.getBasicBlock());
+ MF.insert(MBBInsertPoint, LoopTestMBB);
+ MachineBasicBlock *LoopBodyMBB =
+ MF.CreateMachineBasicBlock(MBB.getBasicBlock());
+ MF.insert(MBBInsertPoint, LoopBodyMBB);
+ MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
+ MF.insert(MBBInsertPoint, ExitMBB);
+
+ // LoopTest:
+ // SUB ScratchReg, ScratchReg, #ProbeSize
+ emitFrameOffset(*LoopTestMBB, LoopTestMBB->end(), DL, ScratchReg, ScratchReg,
+ StackOffset::getFixed(-ProbeSize), TII,
+ MachineInstr::FrameSetup);
+
+ // CMP ScratchReg, TargetReg
+ AArch64CC::CondCode Cond = AArch64CC::LE;
+ Register Op1 = ScratchReg;
+ Register Op2 = TargetReg;
+ if (Op2 == AArch64::SP) {
+ assert(Op1 != AArch64::SP && "At most one of the registers can be SP");
+ // CMP TargetReg, ScratchReg
+ std::swap(Op1, Op2);
+ Cond = AArch64CC::GT;
+ }
+ BuildMI(*LoopTestMBB, LoopTestMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
+ AArch64::XZR)
+ .addReg(Op1)
+ .addReg(Op2)
+ .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
+ .setMIFlags(MachineInstr::FrameSetup);
+
+ // B.<Cond> LoopExit
+ BuildMI(*LoopTestMBB, LoopTestMBB->end(), DL, TII->get(AArch64::Bcc))
+ .addImm(Cond)
+ .addMBB(ExitMBB)
+ .setMIFlags(MachineInstr::FrameSetup);
+
+ // STR XZR, [ScratchReg]
+ BuildMI(*LoopBodyMBB, LoopBodyMBB->end(), DL, TII->get(AArch64::STRXui))
+ .addReg(AArch64::XZR)
+ .addReg(ScratchReg)
+ .addImm(0)
+ .setMIFlags(MachineInstr::FrameSetup);
+
+ // B loop
+ BuildMI(*LoopBodyMBB, LoopBodyMBB->end(), DL, TII->get(AArch64::B))
+ .addMBB(LoopTestMBB)
+ .setMIFlags(MachineInstr::FrameSetup);
+
+ // LoopExit:
+ // STR XZR, [TargetReg]
+ BuildMI(*ExitMBB, ExitMBB->begin(), DL, TII->get(AArch64::STRXui))
+ .addReg(AArch64::XZR)
+ .addReg(TargetReg)
+ .addImm(0)
+ .setMIFlags(MachineInstr::FrameSetup);
----------------
momchil-velikov wrote:
> ```
> sub sp, sp, #0x1, lsl #0xc
> cmp sp, x1
> b.le 0x5555557388
> str xzr, [x1] {0x0}
> ```
>
> We are probing the _old_ stack head! `x1` contains `0x7fffffee80` but `sp` is at `7fffffde80`! This means that the selection of the `x1` register instead of `sp` is incorrect.
I can't quite see how it is possible to generate this code. This is part of the sequence for allocating a compile time unknown amount of stack space that is done by `AArch64InstrInfo::insertStackProbingLoop`. In this function `TargetReg` is the new
top of the stack and right now [1] `ScratchReg` is always `AArch64::SP` .
Thus first we have
```
// LoopTest:
// SUB ScratchReg, ScratchReg, #ProbeSize
emitFrameOffset(*LoopTestMBB, LoopTestMBB->end(), DL, ScratchReg, ScratchReg,
StackOffset::getFixed(-ProbeSize), TII,
MachineInstr::FrameSetup);
```
This is the code the emits the ` su sp, sp, #0x1, lsl #0xc`. Note, it uses `ScratchReg`.
Then we emit the compare
```
// CMP ScratchReg, TargetReg
AArch64CC::CondCode Cond = AArch64CC::LE;
Register Op1 = ScratchReg;
Register Op2 = TargetReg;
if (Op2 == AArch64::SP) { // condition is false here
// ...
}
BuildMI(*LoopTestMBB, LoopTestMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
AArch64::XZR)
.addReg(Op1)
.addReg(Op2)
.addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
.setMIFlags(MachineInstr::FrameSetup);
```
That is the `cmp sp, x1`. So, `Op2` is `TargetReg` and `TargetReg` is `x1`.
Then we emit the loop exit branch:
```
// B.<Cond> LoopExit
BuildMI(*LoopTestMBB, LoopTestMBB->end(), DL, TII->get(AArch64::Bcc))
.addImm(Cond)
.addMBB(ExitMBB)
.setMIFlags(MachineInstr::FrameSetup);
```
This is the `b.le 0x5555557388` above.
and then, still inside the probing loop, we emit a stack probe to `ScratchReg`, i.e. to `SP`.
```
// STR XZR, [ScratchReg]
BuildMI(*LoopBodyMBB, LoopBodyMBB->end(), DL, TII->get(AArch64::STRXui))
.addReg(AArch64::XZR)
.addReg(ScratchReg)
.addImm(0)
.setMIFlags(MachineInstr::FrameSetup);
```
However, in your assembly snippet I see ` str xzr, [x1] {0x0}`, i.e. a store to `TargetReg`.
That looks impossible to happen with the current source.
That part of the source that you changed is the final probe to `TargetReg` after the loop
```
// B loop
BuildMI(*LoopBodyMBB, LoopBodyMBB->end(), DL, TII->get(AArch64::B))
.addMBB(LoopTestMBB)
.setMIFlags(MachineInstr::FrameSetup);
// LoopExit:
// STR XZR, [TargetReg]
BuildMI(*ExitMBB, ExitMBB->begin(), DL, TII->get(AArch64::STRXui))
.addReg(AArch64::XZR)
.addReg(TargetReg)
.addImm(0)
.setMIFlags(MachineInstr::FrameSetup);
```
You can see, it's preceded by an unconditional branch, so it's not the code that generates the stack probe instruction in you assembly snippet.
Having said that, this is indeed a bug!!! Although it might not be the same bug you're seeing.
The instruction should issue a probe to the new top of the stack. And it indeed does so, in the sense it writes to the correct address.
**However, the stack pointer should have been already set to that address.** It is set immediately after that instruction, but it should be set before that instruction.
I'll prepare a fix (may take 3-4 days) and then we'll see.
Thanks for looking at it, much appreciated!!!
[1] the function is slightly more flexible.
https://github.com/llvm/llvm-project/pull/66524
More information about the cfe-commits
mailing list