[LLVMdev] Stack alignment on X86 AVX seems incorrect

Fri Mar 2 09:19:55 PST 2012

On Mar 2, 2012, at 9:16 AM, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:

> On Fri, Mar 02, 2012 at 11:58:29AM -0500, Cameron McInally wrote:
>> On Fri, Mar 2, 2012 at 11:32 AM, Evandro Menezes <emenezes at codeaurora.org>
>> wrote:
>> ...
>>> Figure 3.3 on page 16 of www.x86-64.org/documentation/abi.pdf is not
>>> normative.  See footnote 7 on the same page.  Figure 3.4 on page 21
>>> confirms that the use of a frame-pointer is optional.
>>> 
>>> So, if one doesn't use ENTER in the prologue and uses RSP to access local
>>> variables, RBP may be used as a callee-saved GPR.
>> 
>> I am not sure if I am completely following. The issue that required
>> aligning the frame to 32 bytes is when there are variable sized objects on
>> the stack (e.g. alloca). In that case, the RBP frame pointer is required to
>> access the spill slots. If I'm not mistaken, calculating the address of
>> spill slots off of RSP would be costly in this case.
> 
> No, stack realignment needs to happen if there are auto variables on the
> stack of types that need a larger alignment than the default. This
> currently means AVX vectors for x86-64 and SSE/AVX vectors for x86-32
> following the original sysv ABI. In that case %rbp/%ebp is used to
> reference the original arguments on the stack and %rsp/%esp is used to
> reference the auto variables.
> 
> This doesn't work though if dynamic allocas exist, so either stack
> variables with larger alignment need to be turned into / remain as
> dynamic allocas OR another register is needed to replace %rsp/%esp
> in the above.
> 

Exactly right.
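
To make the register roles concrete, here is a rough model in C of the
address arithmetic only, not of the code LLVM actually emits. The struct,
the function names, and the 16-byte alignment used for dynamic allocations
are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one stack frame; names are illustrative only. */
struct frame_model {
    uintptr_t frame_ptr; /* %rbp/%ebp: fixed offset to incoming arguments */
    uintptr_t base_ptr;  /* extra register: fixed offset to aligned locals */
    uintptr_t stack_ptr; /* %rsp/%esp: moves on every dynamic alloca */
};

/* Round sp down to a power-of-two alignment (the stack grows down). */
static uintptr_t realign_down(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}

/* Prologue: remember the incoming sp, then realign for the locals. */
static struct frame_model enter_frame(uintptr_t incoming_sp,
                                      size_t locals_size, uintptr_t align)
{
    struct frame_model f;
    f.frame_ptr = incoming_sp;
    f.stack_ptr = realign_down(incoming_sp - locals_size, align);
    f.base_ptr = f.stack_ptr; /* only needed if dynamic allocas follow */
    return f;
}

/* A dynamic alloca moves stack_ptr by a run-time amount, so the locals
 * are no longer at a fixed offset from it (16 is an assumed alignment). */
static void dynamic_alloca(struct frame_model *f, size_t n)
{
    f->stack_ptr = realign_down(f->stack_ptr - n, 16);
}
```

With 64 bytes of locals and 32-byte alignment, incoming stack pointers of
0x7ffc1238 and 0x7ffc1230 both realign to 0x7ffc11e0, but the distance
from the frame pointer is 88 bytes in one case and 80 in the other. That
is why the arguments and the aligned locals cannot be addressed from the
same register, and once dynamic_alloca() has moved stack_ptr, something
like base_ptr is the only fixed reference left for the locals.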

>> This does bring up an interesting idea though. If we wanted to punt, it
>> would be possible to check for variable sized objects on the stack and then
>> only issue unaligned moves for 256b spills/reloads. Not ideal for
>> performance, but it would work as a stopgap.
> 
> The problem is worse on x86-32 following the original sysv ABI. In that
> case both GCC and LLVM currently just create broken code if a function
> uses both SSE instructions and alloca.
> 
> Joerg