[llvm] [AArch64] Fix pairing different types of registers when computing CSRs. (PR #66642)
Zhaoxuan Jiang via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 9 06:25:03 PDT 2023
nocchijiang wrote:
> For instance, it didn't save x29 originally while it now saves x29
I think this is the expected behavior. When `producePairRegisters` returns `true`, `FP` is saved unconditionally whenever `LR` is saved, regardless of whether it is actually needed (`AArch64FrameLowering::hasFP`). The same behavior can be observed on the Darwin platform when the homogeneous prolog epilog pass is switched off:
Sample IR:
```llvm
declare void @bar(i32 %i)
define void @test(i32 %i) nounwind minsize {
call void asm sideeffect "mov x0, #42", "~{x0},~{x19},~{x20},~{x21},~{x22},~{x23},~{x24},~{x25},~{x26},~{x27}"() nounwind
call void @bar(i32 %i)
ret void
}
```
Output:
```assembly
.section __TEXT,__text,regular,pure_instructions
.ios_version_min 7, 0
.globl _test ; -- Begin function test
.p2align 2
_test: ; @test
; %bb.0:
stp x28, x27, [sp, #-96]! ; 16-byte Folded Spill
stp x26, x25, [sp, #16] ; 16-byte Folded Spill
mov w8, w0
stp x24, x23, [sp, #32] ; 16-byte Folded Spill
stp x22, x21, [sp, #48] ; 16-byte Folded Spill
stp x20, x19, [sp, #64] ; 16-byte Folded Spill
stp x29, x30, [sp, #80] ; 16-byte Folded Spill
; InlineAsm Start
mov x0, #42 ; =0x2a
; InlineAsm End
mov w0, w8
bl _bar
ldp x29, x30, [sp, #80] ; 16-byte Folded Reload
ldp x20, x19, [sp, #64] ; 16-byte Folded Reload
ldp x22, x21, [sp, #48] ; 16-byte Folded Reload
ldp x24, x23, [sp, #32] ; 16-byte Folded Reload
ldp x26, x25, [sp, #16] ; 16-byte Folded Reload
ldp x28, x27, [sp], #96 ; 16-byte Folded Reload
ret
; -- End function
.subsections_via_symbols
```
But I noticed that `LR`/`FP` is saved more than once in the helper function on Linux targets, because the lowering pass assumed that `LR` would always be at an even index (0-based) of the CSR list. That assumption breaks when the number of CSRs is odd. I decided to bail out for the "odd index" cases (I don't think Swift is that popular in the Android world).
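To make the "even index" assumption concrete, here is a minimal standalone sketch of the kind of check that the bail-out amounts to. It is not the actual LLVM code: `lrAtEvenIndex`, the string register names, and the list ordering are all hypothetical and only model the idea that registers are consumed two at a time from the CSR list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): returns true when "lr" sits at an
// even 0-based index of the CSR list, i.e. when it would start a pair if the
// list is walked two entries at a time.
bool lrAtEvenIndex(const std::vector<std::string> &csrs) {
  auto it = std::find(csrs.begin(), csrs.end(), "lr");
  if (it == csrs.end())
    return false; // LR is not saved at all, so there is nothing to pair.
  std::size_t idx = static_cast<std::size_t>(it - csrs.begin());
  return idx % 2 == 0;
}
```

In this model, a caller would fall back to the regular (non-helper) prolog/epilog whenever `lrAtEvenIndex` returns `false`, which corresponds to the "odd index" bail-out described above.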
> So, the subsequent instruction str x21, [sp, #8] in the function body seems overlapped with the latter.
I believe that `str x21, [sp, #8]` writes `x21` into the empty slot (where `xzr` was "saved"), so the overlap is benign. I debugged the related code and found that `computeFreeStackSlots` (in `PrologEpilogInserter.cpp`) tries to put frame objects into the empty slots created by uneven CSRs. Anyway, I decided to implement `str`/`ldr` in the lowering pass instead of storing/loading `xzr`.
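The slot reuse can be pictured with a deliberately simplified model. This is a sketch under my own assumptions, not the algorithm in `computeFreeStackSlots`: the callee-save area is treated as a row of 8-byte slots, and the hypothetical `firstFreeSlotOffset` returns the byte offset of the first slot that no saved register occupies.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model (not LLVM code): each entry is one 8-byte slot in the
// callee-save area; true means a saved register already lives there.
// Returns the byte offset of the first unoccupied slot, if any.
std::optional<std::size_t>
firstFreeSlotOffset(const std::vector<bool> &occupied) {
  for (std::size_t i = 0; i < occupied.size(); ++i)
    if (!occupied[i])
      return i * 8; // Slots are 8 bytes wide in this model.
  return std::nullopt; // Every slot is taken; nothing can be reused.
}
```

With an odd number of saved registers and 16-byte alignment, one slot is left as padding; in this model a spill such as `str x21, [sp, #8]` lands in that padding slot rather than on top of a live save.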
https://github.com/llvm/llvm-project/pull/66642