[llvm] [AArch64] Fix pairing different types of registers when computing CSRs. (PR #66642)

Zhaoxuan Jiang via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 9 06:25:03 PDT 2023

nocchijiang wrote:

> For instance, it didn't save x29 originally while it now saves x29

I think this is the expected behavior. When `producePairRegisters` returns `true`, `FP` is saved unconditionally whenever `LR` is saved, regardless of whether it is actually needed (`AArch64FrameLowering::hasFP`). The same behavior can be observed on the Darwin platform when the homogeneous prolog epilog pass is switched off:

Sample IR:
declare void @bar(i32 %i)

define void @test(i32 %i) nounwind minsize {
  call void asm sideeffect "mov x0, #42", "~{x0},~{x19},~{x20},~{x21},~{x22},~{x23},~{x24},~{x25},~{x26},~{x27}"() nounwind
  call void @bar(i32 %i)
  ret void
}

Generated assembly (Darwin, homogeneous prolog epilog pass switched off):
	.section	__TEXT,__text,regular,pure_instructions
	.ios_version_min 7, 0
	.globl	_test                           ; -- Begin function test
	.p2align	2
_test:                                  ; @test
; %bb.0:
	stp	x28, x27, [sp, #-96]!           ; 16-byte Folded Spill
	stp	x26, x25, [sp, #16]             ; 16-byte Folded Spill
	mov	w8, w0
	stp	x24, x23, [sp, #32]             ; 16-byte Folded Spill
	stp	x22, x21, [sp, #48]             ; 16-byte Folded Spill
	stp	x20, x19, [sp, #64]             ; 16-byte Folded Spill
	stp	x29, x30, [sp, #80]             ; 16-byte Folded Spill
	; InlineAsm Start
	mov	x0, #42                         ; =0x2a
	; InlineAsm End
	mov	w0, w8
	bl	_bar
	ldp	x29, x30, [sp, #80]             ; 16-byte Folded Reload
	ldp	x20, x19, [sp, #64]             ; 16-byte Folded Reload
	ldp	x22, x21, [sp, #48]             ; 16-byte Folded Reload
	ldp	x24, x23, [sp, #32]             ; 16-byte Folded Reload
	ldp	x26, x25, [sp, #16]             ; 16-byte Folded Reload
	ldp	x28, x27, [sp], #96             ; 16-byte Folded Reload
                                        ; -- End function

However, I noticed that on Linux targets `LR`/`FP` were saved a second time in the helper function, because the lowering pass assumed that `LR` would always sit at an even (0-based) index of the CSR list. That assumption no longer holds when the number of CSRs is odd. I decided to bail out in the "odd index" cases (I don't think Swift is that popular in the Android world).
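
To illustrate the even-index assumption, here is a minimal sketch. It is not the actual lowering code: `indexOf`, `canUseHelper`, the register names and their ordering are all hypothetical and only model a pass that consumes the CSR list two entries at a time.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical model: a CSR list is just an array of register names.
// Returns the 0-based index of `reg` in `csrs`, or N if it is absent.
template <std::size_t N>
constexpr std::size_t indexOf(const std::array<std::string_view, N> &csrs,
                              std::string_view reg) {
  for (std::size_t i = 0; i < N; ++i)
    if (csrs[i] == reg)
      return i;
  return N;
}

// A pass that walks the list in (even, odd) pairs can only treat LR as the
// first element of a pair when LR sits at an even index. Otherwise it bails.
template <std::size_t N>
constexpr bool canUseHelper(const std::array<std::string_view, N> &csrs) {
  const std::size_t lr = indexOf(csrs, "lr");
  return lr != N && lr % 2 == 0;
}
```

With the illustrative list `{"lr", "fp", "x19", "x20"}` the check passes. Putting one extra register in front moves `lr` to index 1, and the check fails, which corresponds to the "odd index" bail-out.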

> So, the subsequent instruction str     x21, [sp, #8]  in the function body seems overlapped with the latter.

I believe that `str x21, [sp, #8]` writes `x21` into the empty slot (where `xzr` was "saved"), so the overlap is benign. I debugged the related code and found that `computeFreeStackSlots` (in `PrologEpilogInserter.cpp`) tries to place frame objects into the empty slots created by an odd number of CSRs. In any case, I decided to implement `str`/`ldr` in the lowering pass rather than store/load `xzr`.
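
The arithmetic behind the empty slot can be sketched as follows. This is a simplified model, not what `computeFreeStackSlots` actually does: the helper names are hypothetical, and it assumes 8-byte registers and a CSR area rounded up to 16 bytes.

```cpp
#include <cstddef>

constexpr std::size_t kRegBytes = 8;   // size of one X register (assumed)
constexpr std::size_t kPairBytes = 16; // size of one stp/ldp pair slot (assumed)

// Bytes reserved for `numCSRs` registers when the area is rounded up to
// whole 16-byte pair slots.
constexpr std::size_t csrAreaBytes(std::size_t numCSRs) {
  return (numCSRs * kRegBytes + kPairBytes - 1) / kPairBytes * kPairBytes;
}

// Padding left over in the CSR area: 8 bytes for an odd count, 0 for even.
constexpr std::size_t freeBytes(std::size_t numCSRs) {
  return csrAreaBytes(numCSRs) - numCSRs * kRegBytes;
}

// Whether a frame object of `objectBytes` could be placed in that padding.
constexpr bool fitsInPadding(std::size_t numCSRs, std::size_t objectBytes) {
  return objectBytes != 0 && objectBytes <= freeBytes(numCSRs);
}
```

Under this model the 12 registers saved in the listing above occupy 96 bytes, matching `[sp, #-96]!`, and leave no padding. With 11 registers, the same 96 bytes are reserved and one 8-byte slot is left free for a frame object.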


