[PATCH] D73126: [X86] Generate unaligned access for fixed slots in unaligned stack

Jann Horn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Jan 25 08:37:51 PST 2020


thejh marked an inline comment as done.
thejh added inline comments.


================
Comment at: llvm/test/CodeGen/X86/avx512-intel-ocl.ll:263-264
+; WIN64-SKX-NEXT:    vmovups %zmm7, {{[-0-9]+}}(%r{{[sb]}}p) # 64-byte Spill
+; WIN64-SKX-NEXT:    vmovups %zmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 64-byte Spill
 ; WIN64-SKX-NEXT:    andq $-64, %rsp
 ; WIN64-SKX-NEXT:    vmovaps %zmm1, {{[0-9]+}}(%rsp)
----------------
thejh wrote:
> rnk wrote:
> > Is there some reason we are using fixed objects for these spills? We could use unfixed objects and store them relative to the aligned RSP after this alignment here. It's a big change, but it seems like the code would be better.
> AFAICS using unfixed objects makes sense as an optimization that will work in most cases. However, it might be easier to still have this variant, where spills go into fixed objects, as a fallback method at least, in case the function prologue has to call another function before the stack frame is properly set up - for example, when compiling with `-fsplit-stack`.
> 
> The following case looks broken at the moment, and I think it might be easier to fix if the spills are in fixed slots. Otherwise, it would be necessary either to allocate stack memory for all the fixed slots before `callq __morestack` so that the non-fixed slots can be spilled into it, or to do something entirely different with the normally-callee-saved registers there.
> 
> ```
> $ cat test.c
> void bar();
> __attribute__((preserve_all)) void foo(void) {
>   bar();
> }
> $ ../bin/clang -fsplit-stack -fno-asynchronous-unwind-tables -S -o test.s test.c
> $ cat test.s
> 	.text
> 	.file	"test.c"
> 	.globl	foo                     # -- Begin function foo
> 	.p2align	4, 0x90
> 	.type	foo,@function
> foo:                                    # @foo
> # %bb.2:
> 	leaq	-344(%rsp), %r11
> 	cmpq	%fs:112, %r11
> 	ja	.LBB0_0
> # %bb.1:
> 	movabsq	$344, %r10              # imm = 0x158
> 	movabsq	$0, %r11
> 	callq	__morestack
> 	retq
> .LBB0_0:
> 	pushq	%rbp
> 	movq	%rsp, %rbp
> 	pushq	%rsp
> 	pushq	%r10
> 	pushq	%r9
> 	pushq	%r8
> 	pushq	%rdi
> 	pushq	%rsi
> 	pushq	%rdx
> 	pushq	%rcx
> 	pushq	%rax
> 	subq	$264, %rsp              # imm = 0x108
> 	movaps	%xmm15, -96(%rbp)       # 16-byte Spill
> 	movaps	%xmm14, -112(%rbp)      # 16-byte Spill
> [...]
> ```
> 
> At the moment, the spills required by the `preserve_all` calling convention only happen after several registers that would be caller-saved in the standard calling convention have already been clobbered: r11 and r10 are clobbered directly by the generated code, and other registers will be clobbered by `__morestack`, which assumes that only argument registers need to be spilled before calling into standard C code.
> 
> (In case you're wondering: No, I don't actually want to use `-fsplit-stack`, it was just the only example I could think of right now that needs function calls in the prologue.)
> 
> But maybe I'm just overthinking this - I'm not all that familiar with LLVM internals.
Anyway, my overall goal for this patch was to fix a crash I was encountering; I realize there is potential for further optimization, but I did not want to invest the time that would require.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73126/new/

https://reviews.llvm.org/D73126




