[PATCH] D87704: [AMDGPU] Reduce stack pointer alignment

Tue Sep 15 08:53:06 PDT 2020

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h:944-946
+  // the pointer. Set this to something above the minimum for HSA to avoid
+  // needing dynamic realignment in common cases.
+  Align getStackAlignment() const {
----------------
arsenm wrote:
> Flakebi wrote:
> > arsenm wrote:
> > > This isn't a real ABI requirement and I don't think should vary based on the triple. Why do you want to re-reduce this?
> > As far as I understand it, an alignment of 16 is needed for OpenCL but otherwise not required (as real memory addresses for lanes are aligned to 4 only anyway).
> > Forcing an alignment of 16 means that spilling one vgpr will reserve 1 kiB of scratch memory where only 256 Byte are needed.
> > If a single sgpr needs to be spilled to a vgpr (which is then in turn spilled to scratch), we need 1 kiB of memory for a 4 Byte value, which seems quite a lot to me.
> > 
> > Is there a downside of requiring an alignment of 4? Are there cases outside OpenCL where a higher alignment is required and the stack would need to be realigned?
> It's not needed, it just introduces stack realignment if you have any stack objects with a higher alignment. Values up to 16-bytes are common (and 8 are very common). This isn't a property of the source language since the target ABI still has higher alignments for common types.
I have been thinking we should add an optimization pass to reduce the alignment of allocas if the address isn't captured

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87704/new/

https://reviews.llvm.org/D87704