<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - __attribute__((force_align_arg_pointer)) produces incorrect stack alignment"
   href="https://bugs.llvm.org/show_bug.cgi?id=37162">37162</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>__attribute__((force_align_arg_pointer)) produces incorrect stack alignment
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>new-bugs
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>6.0
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>new bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>henrik@gramner.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>__attribute__((force_align_arg_pointer)) uses a hardcoded alignment value of 16,
which breaks when the stack alignment is larger than 16.

It should behave the same way as -mstackrealign (which works correctly) except
on a per-function basis.

The use case is a library that makes heavy use of AVX/AVX2 and is built with
32-byte stack alignment, so that it can do aligned loads/stores of ymm registers
from/to stack buffers without having to realign the stack in every function.
The force_align_arg_pointer attribute is used on the API entry points.
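
As a rough illustration of that use case, here is a minimal sketch. The names
api_sum and internal_sum are hypothetical and not taken from the library; the
sketch assumes the library is compiled with -mstack-alignment=32 and that
external callers only guarantee 16-byte stack alignment:

```c
/* Hypothetical library layout, for illustration only. */

/* Internal helper: with -mstack-alignment=32 the compiler is expected to
   treat the incoming stack as already 32-byte aligned. */
static int internal_sum(const int *src)
{
    _Alignas(32) int tmp[8];   /* stands in for a ymm-sized stack buffer */
    int i, total = 0;

    for (i = 0; i != 8; i++) {
        tmp[i] = src[i];
        total += tmp[i];
    }
    return total;
}

/* Public entry point: the caller's stack may only be 16-byte aligned, so
   the attribute asks the compiler to realign the stack on entry. */
__attribute__((force_align_arg_pointer))
int api_sum(const int *src)
{
    return internal_sum(src);
}
```

With the behavior described in this report, the realignment done in api_sum
would only be to 16 bytes, so internal_sum could not rely on 32-byte alignment.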

GCC handles the alignment correctly.


Minimal test case:

void bar(void);
__attribute__((force_align_arg_pointer)) void foo(void)
{
    bar();
}

-mstack-alignment=32:

&lt;foo&gt;:
    push   rbp
    mov    rbp,rsp
    and    rsp,0xfffffffffffffff0
    sub    rsp,0x10
    call   11 &lt;foo+0x11&gt;    d: R_X86_64_PC32        bar-0x4
    mov    rsp,rbp
    pop    rbp
    ret

-mstack-alignment=32 -mstackrealign (with or without force_align_arg_pointer):

&lt;foo&gt;:
    push   rbp
    mov    rbp,rsp
    and    rsp,0xffffffffffffffe0
    sub    rsp,0x20
    call   11 &lt;foo+0x11&gt;    d: R_X86_64_PC32        bar-0x4
    mov    rsp,rbp
    pop    rbp
    ret
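
The only difference between the two listings is the AND mask and the size of
the following SUB. As a small sketch of the arithmetic (the function name is
hypothetical, and align is assumed to be a power of two), the mask that rounds
rsp down to a given alignment can be computed like this:

```c
/* Sketch: AND mask that rounds a stack pointer down to `align` bytes.
   Assumes `align` is a power of two; the name is made up for this example. */
unsigned long long realign_mask(unsigned long long align)
{
    return ~(align - 1ULL);
}
```

realign_mask(16) is 0xfffffffffffffff0, the mask in the first listing, and
realign_mask(32) is 0xffffffffffffffe0, the mask in the second listing.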


(Also, the "sub rsp, &lt;alignment&gt;" instructions seem pointless to me. And yes,
they're still there with -O3.)</pre>
        </div>
      </p>


    </body>
</html>