<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/58261">58261</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [x86] improve cost/codegen of insertelement of FP element 0
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          rotateright
      </td>
    </tr>
</table>

<pre>
Based on an example from: https://reviews.llvm.org/D135278

The x86 cost model assumes that inserting an FP scalar into element 0 of an FP vector is free (cost 0), but that may not be accurate when the base vector is not undef:

```ll
define <4 x float> @ins0_v4f32(<4 x float> %v, float %s) {
  %r = insertelement <4 x float> %v, float %s, i32 0
  ret <4 x float> %r
}
```

```
% opt -mtriple=x86_64 -passes="print<cost-model>" -disable-output -mattr=+sse2 inselt.ll
Printing analysis 'Cost Model Analysis' for function 'ins0_v4f32':
Cost Model: Found an estimated cost of 0 for instruction:   %r = insertelement <4 x float> %v, float %s, i32 0

% opt -mtriple=x86_64 -passes="print<cost-model>" -disable-output -mattr=+avx2 inselt.ll
Printing analysis 'Cost Model Analysis' for function 'ins0_v4f32':
Cost Model: Found an estimated cost of 0 for instruction:   %r = insertelement <4 x float> %v, float %s, i32 0

% llc -o - inselt.ll -mattr=sse2
        movss   %xmm1, %xmm0                    ## xmm0 = xmm1[0],xmm0[1,2,3]

% llc -o - inselt.ll -mattr=avx2
        vblendps        $1, %xmm1, %xmm0, %xmm0         ## xmm0 = xmm1[0],xmm0[1,2,3]
```

There's a comment in X86InstrSSE.td about this choice:
```
// Prefer a movss or movsd over a blendps when optimizing for size. these were
// changed to use blends because blends have better throughput on sandybridge
// and haswell, but movs[s/d] are 1-2 byte shorter instructions.
```
I'm not sure that reasoning still holds. Either way, we probably want the cost model for insertelement to check whether the base vector is undef.
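
For comparison, here is a minimal sketch of the undef-base case that the 0 cost presumably models. The function name `@ins0_v4f32_undef` is made up for illustration, and I have not included cost-model or llc output for it. On x86-64 the float argument already arrives in the low lane of an XMM register, so this variant should plausibly need no insert instruction at all, unlike the movss/blendps above:

```ll
; Hypothetical companion test (name is illustrative only):
; same insert into element 0, but the base vector is undef,
; so only lane 0 of the result carries a defined value.
define <4 x float> @ins0_v4f32_undef(float %s) {
  %r = insertelement <4 x float> undef, float %s, i32 0
  ret <4 x float> %r
}
```

Appending this function to inselt.ll and rerunning the opt and llc commands above would show whether the two cases are costed and lowered differently.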
</pre>