<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/120015>120015</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
AVX mem broadcasts are cached on the stack
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
KyleSiefring
</td>
</tr>
</table>
<pre>
After exhausting the vector registers inside a loop, clang stores the result of a broadcast on the stack. This is inefficient, since a broadcast from memory is as fast as a plain load.
Consider the following pseudo code:
```
float *restrict arr = ...; // prevent aliasing
loop {
exhaust vector registers
__m256 x = _mm256_set1_ps(arr[0]);
use x
}
```
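For reference, the pseudo code above could be made concrete along the following lines. This is only a sketch, not the code from the Godbolt sample: the function name `scaled_sums`, the accumulator count `NACC`, and the `in`/`out` buffers are made up for illustration. Whether it actually triggers the spill is an assumption and will depend on the clang version and optimization flags.

```c
/* Quoted include form: resolves to the same system header as the
   angle-bracket form and stays literal inside HTML. */
#include "immintrin.h"

/* Hypothetical reproducer sketch; all names below are illustrative. */
#define NACC 16 /* x86-64 AVX has 16 ymm registers */

/* The target attribute lets this compile without passing -mavx globally. */
__attribute__((target("avx")))
void scaled_sums(const float *restrict arr, /* restrict: prevent aliasing */
                 const float *restrict in,
                 float *restrict out, int iters)
{
    /* 16 live accumulators are meant to exhaust the vector registers. */
    __m256 acc[NACC];
    for (int k = 0; k < NACC; ++k)
        acc[k] = _mm256_setzero_ps();

    for (int i = 0; i < iters; ++i) {
        /* Loop-invariant broadcast: the value the issue is about. */
        __m256 x = _mm256_set1_ps(arr[0]);
        for (int k = 0; k < NACC; ++k) {
            __m256 v = _mm256_loadu_ps(in + (i * NACC + k) * 8);
            acc[k] = _mm256_add_ps(acc[k], _mm256_mul_ps(v, x)); /* use x */
        }
    }

    /* Write the accumulators back so the work is not optimized away. */
    for (int k = 0; k < NACC; ++k)
        _mm256_storeu_ps(out + k * 8, acc[k]);
}
```

`in` must hold `iters * NACC * 8` floats and `out` must hold `NACC * 8` floats. The inner loop has a constant trip count, so an optimizing build would be expected to unroll it fully, but that too is an assumption.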
When clang compiles this, arr[0] is broadcast outside the loop and x is then stored on the stack.
```
vbroadcastss ymm0, dword ptr [rdx]
vmovups ymmword ptr [rsp - 72], ymm0
loop:
...
load x from stack
use x
jmp loop
```
The expected behavior is:
```
loop:
...
vbroadcastss x, dword ptr [rdx]
use x
jmp loop
```
Obligatory Godbolt Sample: https://godbolt.org/z/v7MYcefxY (Sorry if my method of stressing register allocation results in too much asm/bytecode.)
</pre>