<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/120015>120015</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            AVX mem broadcasts are cached on the stack
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          KyleSiefring
      </td>
    </tr>
</table>

<pre>
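After exhausting the vector registers inside a loop, clang hoists a loop-invariant broadcast out of the loop and keeps its result on the stack. This is inefficient, since broadcasting from memory with vbroadcastss is as fast as an ordinary vector load, so reloading the spilled value saves nothing and costs an extra store plus stack space.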

Consider the following pseudocode:
```
float *restrict arr = ...; // restrict: promise the compiler arr is not aliased
loop {
     exhaust vector registers
     __m256 x = _mm256_set1_ps(arr[0]);
     use x
}
```
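A concrete C version of the pseudocode, as a hypothetical reproducer sketch. The function name scale_accumulate, the ACC constant and the src/out buffers are illustrative and are not taken from the Godbolt sample. Sixteen accumulators plus x and a product temporary need more than the 16 ymm registers available on x86-64 without AVX-512, so the register allocator has to spill something. Whether clang picks x as the value to spill depends on the clang version and flags.

```c
#include "immintrin.h"

/* Hypothetical reproducer: ACC live accumulators plus the broadcast x
   and a product temporary exceed the 16 ymm registers, forcing spills. */
#define ACC 16

/* target("avx") lets this compile without -mavx on GCC and clang. */
__attribute__((target("avx")))
void scale_accumulate(float *restrict out, const float *restrict src,
                      const float *restrict arr, long iters)
{
    __m256 acc[ACC];
    for (int k = 0; k < ACC; k++)
        acc[k] = _mm256_setzero_ps();

    for (long i = 0; i < iters; i++) {
        /* Loop-invariant broadcast of arr[0]: clang may hoist it out of
           the loop and then spill x to the stack under register pressure. */
        __m256 x = _mm256_set1_ps(arr[0]);
        const float *row = src + i * ACC * 8;
        for (int k = 0; k < ACC; k++) {
            __m256 v = _mm256_loadu_ps(row + 8 * k);
            acc[k] = _mm256_add_ps(acc[k], _mm256_mul_ps(v, x));
        }
    }

    /* Write all ACC * 8 lanes back so none of the work is dead. */
    for (int k = 0; k < ACC; k++)
        _mm256_storeu_ps(out + 8 * k, acc[k]);
}
```

Compiling this with clang -O2 and looking for a vmovups store of the broadcast before the loop, with a reload from the stack inside it, is one way to check whether a given clang version shows the behavior.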
When clang compiles this, it broadcasts arr[0] once outside the loop, stores x on the stack, and reloads it inside the loop:

```
        vbroadcastss    ymm0, dword ptr [rdx]
        vmovups ymmword ptr [rsp - 72], ymm0
loop:
        ...
        load x from stack
        use x
        jmp     loop
```

The expected behavior is to keep the broadcast inside the loop, so x never has to be spilled:
```
loop:
        ...
        vbroadcastss    x, dword ptr [rdx]
        use x
        jmp     loop
```
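A possible source-level workaround, sketched under assumptions and not verified against the Godbolt sample: pass the pointer through an empty GCC/clang extended-asm statement so the compiler cannot treat arr[0] as loop-invariant. That should force a fresh vbroadcastss from memory on every iteration instead of a spill and reload. opaque_ptr and sum_scaled are hypothetical names, and the example uses a single accumulator, so it shows only the technique, not the register pressure.

```c
#include "immintrin.h"

/* Hypothetical helper: the empty asm claims to modify p, so every use of
   the returned pointer must be treated as a new, unknown address. */
static inline const float *opaque_ptr(const float *p)
{
    __asm__ volatile("" : "+r"(p));
    return p;
}

__attribute__((target("avx")))
float sum_scaled(const float *restrict src, const float *restrict arr, long n)
{
    __m256 acc = _mm256_setzero_ps();
    for (long i = 0; i < n; i++) {
        /* arr[0] is reloaded each iteration; with AVX1 the load is expected
           to fold into a vbroadcastss ymm, m32 inside the loop. */
        __m256 x = _mm256_set1_ps(opaque_ptr(arr)[0]);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(_mm256_loadu_ps(src + 8 * i), x));
    }
    /* Horizontal sum of the 8 lanes, done in scalar code for clarity. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (int k = 0; k < 8; k++)
        total += lanes[k];
    return total;
}
```

The trade-off is that the asm barrier also blocks any other optimization that relies on knowing the pointer, so it is a stopgap rather than a substitute for a fix in the register allocator.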

Obligatory Godbolt sample: https://godbolt.org/z/v7MYcefxY (Sorry if my way of creating register pressure produces a lot of asm/bytecode.)
</pre>