<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/58585>58585</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86] Suboptimal codegen for in-lane broadcast after combining two xmm registers into a ymm
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          tellowkrinkle
      </td>
    </tr>
</table>

<pre>
Godbolt link: https://gcc.godbolt.org/z/cr9W1PhjW

When compiling for AVX2, the following function:
```c
__m256 combine_and_broadcast(__m128 a, __m128 b) {
        __m256 ab = _mm256_insertf128_ps(_mm256_castps128_ps256(a), b, 1);
        return _mm256_shuffle_ps(ab, ab, _MM_SHUFFLE(0, 0, 0, 0));
}
```
has the broadcast hoisted above the insert, so each xmm half is broadcast separately before the two are combined, which costs an extra instruction:
```asm
vbroadcastss    xmm0, xmm0
vbroadcastss    xmm1, xmm1
vinsertf128     ymm0, ymm0, xmm1, 1
```
instead of
```asm
vinsertf128     ymm0, ymm0, xmm1, 1
vpermilps       ymm0, ymm0, 0           # ymm0 = ymm0[0,0,0,0,4,4,4,4]
```
This seems to happen only when broadcasting element 0, probably because there is a dedicated instruction (`vbroadcastss`) for that case.
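
For reference, here is a rough scalar model of why the two sequences are interchangeable. It is only a sketch: the types and helper names (`vec128`, `vec256`, `insert_hi`, `broadcast_lo`, `permil_zero`) are made up for illustration and just mimic the lane behaviour of `vinsertf128`, `vbroadcastss` and `vpermilps` with an immediate of 0. It is not LLVM code and says nothing about how the backend actually makes the choice.
```c
/* Scalar model of the lane semantics; every name here is illustrative. */
typedef struct { float v[4]; } vec128;   /* stands in for an xmm register */
typedef struct { float v[8]; } vec256;   /* stands in for a ymm register  */

/* Models vinsertf128 ..., 1: lo stays in lane 0, hi goes into lane 1. */
vec256 insert_hi(vec128 lo, vec128 hi) {
    vec256 r;
    for (int i = 0; i != 4; ++i) {
        r.v[i] = lo.v[i];
        r.v[i + 4] = hi.v[i];
    }
    return r;
}

/* Models vbroadcastss xmm, xmm: element 0 replicated across 128 bits. */
vec128 broadcast_lo(vec128 x) {
    vec128 r;
    for (int i = 0; i != 4; ++i)
        r.v[i] = x.v[0];
    return r;
}

/* Models vpermilps ymm, ymm, 0: element 0 of each lane replicated in-lane. */
vec256 permil_zero(vec256 x) {
    vec256 r;
    for (int i = 0; i != 4; ++i) {
        r.v[i] = x.v[0];
        r.v[i + 4] = x.v[4];
    }
    return r;
}

/* Sequence shown as the current output: two broadcasts, then the insert. */
vec256 emitted(vec128 a, vec128 b) {
    return insert_hi(broadcast_lo(a), broadcast_lo(b));
}

/* Sequence suggested in this report: insert, then one in-lane permute. */
vec256 expected(vec128 a, vec128 b) {
    return permil_zero(insert_hi(a, b));
}

/* Returns 1 when both sequences produce the same eight elements. */
int same_result(vec128 a, vec128 b) {
    vec256 x = emitted(a, b);
    vec256 y = expected(a, b);
    for (int i = 0; i != 8; ++i)
        if (x.v[i] != y.v[i])
            return 0;
    return 1;
}
```
Both paths end with element 0 of `a` filling the low lane and element 0 of `b` filling the high lane, so the only difference is the instruction count: three for the first sequence versus two for the second.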
</pre>