[PATCH] D73126: [X86] Generate unaligned access for fixed slots in unaligned stack
Jann Horn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Jan 25 08:29:01 PST 2020
thejh marked an inline comment as done.
thejh added inline comments.
================
Comment at: llvm/test/CodeGen/X86/avx512-intel-ocl.ll:263-264
+; WIN64-SKX-NEXT: vmovups %zmm7, {{[-0-9]+}}(%r{{[sb]}}p) # 64-byte Spill
+; WIN64-SKX-NEXT: vmovups %zmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 64-byte Spill
; WIN64-SKX-NEXT: andq $-64, %rsp
; WIN64-SKX-NEXT: vmovaps %zmm1, {{[0-9]+}}(%rsp)
----------------
rnk wrote:
> Is there some reason we are using fixed objects for these spills? We could use unfixed objects and store them relative to the aligned RSP after this alignment here. It's a big change, but it seems like the code would be better.
AFAICS using unfixed objects makes sense as an optimization that will work in most cases. However, it might be easier to still have this variant, where spills go into fixed objects, as a fallback method at least, in case the function prologue has to call another function before the stack frame is properly set up - for example, when compiling with `-fsplit-stack`.
The following case looks broken at the moment, and I think it might be easier to fix if the spills are in fixed slots. Otherwise, it would be necessary to either allocate stack memory for all the fixed slots before `callq __morestack` so that the non-fixed slots can be spilled into, or alternatively do something entirely different with the normally-callee-saved registers there.
```
$ cat test.c
void bar();
__attribute__((preserve_all)) void foo(void) {
bar();
}
$ ../bin/clang -fsplit-stack -fno-asynchronous-unwind-tables -S -o test.s test.c
$ cat test.s
.text
.file "test.c"
.globl foo # -- Begin function foo
.p2align 4, 0x90
.type foo, at function
foo: # @foo
# %bb.2:
leaq -344(%rsp), %r11
cmpq %fs:112, %r11
ja .LBB0_0
# %bb.1:
movabsq $344, %r10 # imm = 0x158
movabsq $0, %r11
callq __morestack
retq
.LBB0_0:
pushq %rbp
movq %rsp, %rbp
pushq %rsp
pushq %r10
pushq %r9
pushq %r8
pushq %rdi
pushq %rsi
pushq %rdx
pushq %rcx
pushq %rax
subq $264, %rsp # imm = 0x108
movaps %xmm15, -96(%rbp) # 16-byte Spill
movaps %xmm14, -112(%rbp) # 16-byte Spill
[...]
```
At the moment, the spills that are necessary for the `preserve_all` calling convention only happen after various registers that would be caller-saved in the standard calling convention have already been clobbered - r11 and r10 are clobbered by the generated code directly, other registers are going to be clobbered by `__morestack`, which assumes that only argument registers need to be spilled before calling into standard C code.
(In case you're wondering: No, I don't actually want to use `-fsplit-stack`, it was just the only example I could think of right now that needs function calls in the prologue.)
But maybe I'm just overthinking this - I'm not all that familiar with LLVM internals.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D73126/new/
https://reviews.llvm.org/D73126
More information about the llvm-commits
mailing list