[PATCH] D131582: [AArch64][SME] Add support for arm_locally_streaming functions.
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 16 03:21:18 PDT 2022
sdesmalen added inline comments.
================
Comment at: llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1162
case ISD::EntryToken:
- llvm_unreachable("EntryToken should have been excluded from the schedule!");
case ISD::MERGE_VALUES:
----------------
efriedma wrote:
> sdesmalen wrote:
> > sdesmalen wrote:
> > > aemerson wrote:
> > > > ?
> > > Perhaps removing this without any alternative `assert` is a little crude, but I wasn't really sure what to test for instead.
> > >
> > > The change here is that EntryToken now "becomes part of the schedule" because other instructions can be glued to it, even though the node itself isn't schedulable (it is always guaranteed to be the first instruction). I was hoping someone would have a suggestion here.
> > @efriedma just wanting a second opinion on this, are you aware of anything else I might need to worry about here?
> >
> > I needed to remove this llvm_unreachable because I changed EntryToken to also have Glue, and when using that it makes the EntryToken part of the schedule (even though the node itself can't be scheduled).
> This feels pretty isolated; the entry token basically has the same meaning during isel, and we're throwing it away after isel. The only possible side-effect I can think of is that maybe there's something assumes the entry token only has one successor.
>
> That said, it feels like a sort of strange thing to be inserting smstart/smend this early... would it make sense to insert it as part of prologue/epilogue lowering instead?
I considered that early on but thought doing this at ISEL level would be easier because we get the benefits from the register allocator. Also, we already have the node/mechanisms in place for SelectionDAG, so adding it here seemed sensible.
For some simple example:
```define <8 x i32> @foo(<8 x i32> %x) nounwind "aarch64_pstate_sm_body" {
%y = add <8 x i32> %x, %x
ret <8 x i32> %y
}
```
if we were to do this at point of Prologue/Epilogue lowering, we'd have the following input:
```
# *** IR Dump Before Prologue/Epilogue Insertion & Frame Finalization (prologepilog) ***:
# Machine code for function foo: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $q0, $q1
bb.0 (%ir-block.0):
liveins: $q0, $q1
renamable $q0 = ADDv4i32 killed renamable $q0, renamable $q0
renamable $q1 = ADDv4i32 killed renamable $q1, renamable $q1
RET_ReallyLR implicit $q0, implicit $q1 ```
We'd then first need to do analysis which registers are clobbered by the smstart/smstop, add extra spill slots for the input arguments passed in NEON/FP/SVE registers, and then add the spills/smstart/reloads for the prolgoue, and spills/smstop/reloads for the epilogue. The frame-lowering code is already quite complex. We'd need to end up with something like:
```
foo: // @foo
// %bb.0:
sub sp, sp, #96
stp d15, d14, [sp, #32]
stp d13, d12, [sp, #48]
stp d11, d10, [sp, #64]
stp d9, d8, [sp, #80]
stp q0, q1, [sp] // input operands are clobbered by smstart, so spill to stack
smstart sm
ldr q0, [sp] // reload the spilled operand register
add v2.4s, v0.4s, v0.4s
ldr q0, [sp, #16] // reload the other spilled operand register
add v0.4s, v0.4s, v0.4s
stp q2, q0, [sp] // spill the results, because output operands are clobbered by smstop
smstop sm
ldp q0, q1, [sp] // reload the results
ldp d9, d8, [sp, #80]
ldp d11, d10, [sp, #64]
ldp d13, d12, [sp, #48]
ldp d15, d14, [sp, #32]
add sp, sp, #96
ret
```
If we insert the smstart/smstop as part of ISEL the register allocator will insert all the spills and fills for free. And we may have some minor benefits from the scheduler which can rearrange code around the spills/fills for the input arguments.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D131582/new/
https://reviews.llvm.org/D131582
More information about the llvm-commits
mailing list