<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/64706>64706</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
ABI for `__m256` and `__m512` is wrong when `avx`/`avx512` is disabled globally, or enabled per-function
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
chorman0773
</td>
</tr>
</table>
<pre>
Based on this discussion: https://groups.google.com/g/x86-64-abi/c/FMhl2vDl1D8
Currently, when it cannot use ymm/zmm registers, LLVM passes and returns `__m256` and `__m512` values as follows:
* Parameters are passed on the stack
* Return values are split across 2-4 `xmm` registers.
Further, when the avx/avx512f features are enabled at the function level (using `__attribute__((target))`, not globally), it handles parameters and return values as follows:
* Parameters are passed on the stack
* Return values are placed in a single `ymm`/`zmm` register.
In contrast, the behaviour of GCC (which is apparently the correct behaviour in both cases) is:
When ymm/zmm registers are unavailable:
* Parameters are passed on the stack
* Return values are returned in memory (return pointer in `rdi`)
When ymm/zmm registers are available at the function level (using `__attribute__((target))`), it passes and returns values as it does when the feature is available globally via a `-m` flag.
The difference in behaviour can be demonstrated by https://godbolt.org/z/8sYcn6654.
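A minimal sketch of a reproducer along these lines (not necessarily the exact code behind the link above; the function names are made up for illustration). It assumes an x86-64 target and a build without `-mavx`, e.g. `-O2 -S`, so that only the functions marked with `__attribute__((target("avx")))` have ymm registers available:

```c
#include "immintrin.h"

/* Built without -mavx: ymm registers are unavailable in this function.
   Per the report, GCC returns the value in memory (pointer in rdi),
   while LLVM splits it across xmm registers. */
__m256 ret_no_avx(float x) {
    __m256 v = { x, x, x, x, x, x, x, x };
    return v;
}

/* Built without -mavx: both compilers are reported to take the
   parameter on the stack. */
float sum_ends_no_avx(__m256 v) {
    return v[0] + v[7];
}

/* AVX enabled for this function only. Per the report, GCC and LLVM
   both return in ymm0 here. */
__attribute__((target("avx")))
__m256 ret_local_avx(float x) {
    __m256 v = { x, x, x, x, x, x, x, x };
    return v;
}

/* AVX enabled for this function only. Per the report, GCC takes the
   parameter in a ymm register, while LLVM takes it on the stack. */
__attribute__((target("avx")))
float sum_ends_local_avx(__m256 v) {
    return v[0] + v[7];
}
```

Comparing the generated assembly for the four functions between the two compilers should show the differences listed above. GCC may emit `-Wpsabi` warnings for the functions built without AVX.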
Based on a short discussion on the x86-64 psABI mailing list, LLVM's behaviour appears to be incorrect in both cases. When returning without the registers available, the value must be returned in memory: the ABI requires the 2nd SSEUP eightbyte to be placed in the 3rd eightbyte of `xmm0`, which fails, and that sends the entire value to memory. In the locally-enabled case, the registers are available, so LLVM should be passing fully in `ymm1` and returning fully in `ymm0` (LLVM seems to consider the registers available, given that it does return in `ymm0`).
</pre>