<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/61002>61002</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[X86] Garbage in undemanded vector elements can cause fdiv performance drops
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:X86,
performance
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
RKSimon
</td>
</tr>
</table>
<pre>
This is related to #60632
We've noticed that when dealing with partially demanded or short vectors, the values in the undemanded elements can cause performance drops in some fp instructions (most notably in fdiv, but also fsqrt/divps), even with DAZ/FTZ enabled. This has been noticed mostly on btver2 targets, but I expect there are other CPUs that can be affected in other ways.
Sometimes this appears to be caused by values that would raise fp-exceptions (fdivzero etc., even if they've been disabled); other times it's just because the values are particularly large or poorly canonicalized. Basically, if the element's bits don't represent a typical float value then it seems some weaker fdiv units are likely to drop to a slower execution path.
Pulling out exact examples is proving to be tricky, but something like:
```
define <2 x float> @fdiv_post_shuffle(<2 x float> %a0, <2 x float> %a1) {
  %d = fdiv <2 x float> %a0, %a1
  %s = shufflevector <2 x float> %d, <2 x float> poison, <2 x i32> <i32 1, i32 0>
  ret <2 x float> %s
}

fdiv_post_shuffle:
  vdivps %xmm1, %xmm0, %xmm0
  vpermilps $225, %xmm0, %xmm0 # xmm0 = xmm0[1,0,2,3]
  retq
```
would be better if actually performed as something like:
```
fdiv_pre_shuffle:
  vpermilps $17, %xmm0, %xmm0 # xmm0 = xmm0[1,0,1,0]
  vpermilps $17, %xmm1, %xmm1 # xmm1 = xmm1[1,0,1,0]
  vdivps %xmm1, %xmm0, %xmm0
  retq
```
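
To illustrate why the pre-shuffle form helps, here is a rough Python model of the two sequences above. It is only a sketch: the helper names are made up for this example, the upper-lane contents (zeros) are an assumed stand-in for whatever garbage happens to be in the register, and a divide by zero is reported as None rather than modelling real IEEE results or timing.

```python
def vpermilps_imm(src, imm):
    """Model the 128-bit immediate form of vpermilps: result lane i takes
    src[sel], where sel is the i-th 2-bit field of the immediate."""
    return [src[(imm // 4 ** i) % 4] for i in range(4)]


def lane_div(a, b):
    """Lane-wise divide. A zero divisor is reported as None so the lanes
    that could hit a slow path are easy to spot (not an IEEE model)."""
    return [x / y if y != 0.0 else None for x, y in zip(a, b)]


# Assumed inputs: lanes 0-1 hold the two-element payload, lanes 2-3 hold
# leftover zeros that are never demanded by the IR.
a = [6.0, 8.0, 0.0, 0.0]
b = [3.0, 2.0, 0.0, 0.0]

# fdiv_post_shuffle: divide all four lanes, then swap lanes 0 and 1.
post = vpermilps_imm(lane_div(a, b), 225)

# fdiv_pre_shuffle: overwrite the upper lanes with copies of the payload
# first, so every lane fed to the divider holds a well-behaved value.
pre = lane_div(vpermilps_imm(a, 17), vpermilps_imm(b, 17))

print("post-shuffle:", post)
print("pre-shuffle: ", pre)
```

In this model the two demanded lanes agree in both versions, but only the post-shuffle version divides the garbage upper lanes. The cost is one extra shuffle (two instead of one) ahead of the divide.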
</pre>