<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/78109>78109</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[SelectionDAG] Miscompilation of nested shuffles of partially-undef splat values
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Benjins
</td>
</tr>
</table>
<pre>
Recently, there seems to have been a regression in SelectionDAG where the shuffle-of-splat optimisation, applied to nested shuffles, mishandles splat values with partially `undef` elements.
The following is a minimal repro (comparing trunk with 17.0.1): https://godbolt.org/z/jajGr53r3
```llvm
define <4 x i32> @do_stuff() {
; <0, 0, 7, 7>
%shuffle.1 = shufflevector <4 x i32> <i32 7, i32 7, i32 0, i32 7>, <4 x i32> zeroinitializer, <4 x i32> <i32 2, i32 2, i32 1, i32 1>
; <0, 0, 3, 7>
%shift = lshr <4 x i32> %shuffle.1, <i32 0, i32 0, i32 1, i32 0>
; <3, 3, 0, 0>
%shuffle.2 = shufflevector <4 x i32> %shift, <4 x i32> zeroinitializer, <4 x i32> <i32 2, i32 2, i32 0, i32 0>
; <0, 1, 0, 1>
%shuffle.3 = shufflevector <4 x i32> %shuffle.2, <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32> <i32 2, i32 6, i32 3, i32 7>
ret <4 x i32> %shuffle.3
}
```
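The expected values in the IR comments can be cross-checked with a small model of the two operations involved. This is only a sketch in Python, not LLVM code: the helper names `shufflevector` and `lshr` are made up here to mirror the IR instructions, and all lanes are assumed to be fully defined 32-bit values.
```python
def shufflevector(a, b, mask):
    """Model IR shufflevector: each mask index selects from the concatenation a ++ b."""
    both = a + b
    return [both[i] for i in mask]

def lshr(vals, amounts, bits=32):
    """Lane-wise logical shift right on unsigned values of the given width."""
    limit = (1 << bits) - 1
    return [(v & limit) >> s for v, s in zip(vals, amounts)]

def do_stuff():
    zero = [0, 0, 0, 0]
    shuffle_1 = shufflevector([7, 7, 0, 7], zero, [2, 2, 1, 1])
    shift = lshr(shuffle_1, [0, 0, 1, 0])
    shuffle_2 = shufflevector(shift, zero, [2, 2, 0, 0])
    shuffle_3 = shufflevector(shuffle_2, [1, 1, 1, 1], [2, 6, 3, 7])
    return shuffle_1, shift, shuffle_2, shuffle_3
```
The last element returned by `do_stuff()` is the value the function should produce, matching the `<0, 1, 0, 1>` comment above.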
Apologies that it's not super-minimal: pretty much all of this is trying to coax SelectionDAG into the right state to trigger the bug.
The crux of the issue seems to be in `combineShuffleOfSplatVal`. The input DAG at that point is:
```
SelectionDAG has 15 nodes:
t0: ch,glue = EntryToken
t21: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
t18: v4i32 = BUILD_VECTOR Constant:i32<0>, undef:i32, undef:i32, undef:i32
t9: v4i32 = srl t21, t18
t22: v4i32 = vector_shuffle<u,u,0,0> t9, undef:v4i32
t11: v4i32 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
t12: v4i32 = vector_shuffle<2,5,3,7> t22, t11
t15: ch,glue = CopyToReg t0, Register:v4i32 $xmm0, t12
t16: ch = X86ISD::RET_GLUE t15, TargetConstant:i32<0>, Register:v4i32 $xmm0, t15:1
```
which then gets transformed to:
```
SelectionDAG has 13 nodes:
t0: ch,glue = EntryToken
t21: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
t18: v4i32 = BUILD_VECTOR Constant:i32<0>, undef:i32, undef:i32, undef:i32
t9: v4i32 = srl t21, t18
t11: v4i32 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
t12: v4i32 = vector_shuffle<2,5,3,7> t9, t11 ; <<<<< changed
t15: ch,glue = CopyToReg t0, Register:v4i32 $xmm0, t12
t16: ch = X86ISD::RET_GLUE t15, TargetConstant:i32<0>, Register:v4i32 $xmm0, t15:1
```
A bisect shows that this started with faecc736e2ac3cd8c77bebf41b1ed2e2d8cb575f, with the following change to `isSplatValue`:
https://github.com/llvm/llvm-project/blob/7f1d757fb40f06cc1c6b134d770987b340286996/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp#L2884-L2888
What's happening is that `t9` is now considered a splat, so since `t22` is a shuffle of it, `t22` is replaced with `t9` directly in `t12`. However, this introduces undefs in elements where there were none previously: `t12` now reads lanes 2 and 3 of `t9`, which are shifted by `undef` amounts, instead of lane 0, and so we output a different value.
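To make the lane-level effect concrete, here is a rough Python model of the two DAGs above. It is a sketch under assumptions, not a description of how SelectionDAG works internally: `UNDEF`, `srl` and `vector_shuffle` are hypothetical stand-ins, and a shifted lane is treated as undef whenever either of its inputs is undef.
```python
UNDEF = None  # stand-in for an undef lane (assumption of this model)

def srl(vals, amounts):
    """Lane-wise shift right; the result lane is undef if either input lane is undef."""
    return [UNDEF if v is UNDEF or s is UNDEF else v >> s
            for v, s in zip(vals, amounts)]

def vector_shuffle(mask, a, b=None):
    """Model vector_shuffle: 'u' gives undef, otherwise index into a ++ b.
    Passing b=None models an undef:v4i32 second operand."""
    if b is None:
        b = [UNDEF] * len(a)
    both = a + b
    return [UNDEF if i == "u" else both[i] for i in mask]

t21 = [0, 0, 0, 0]
t18 = [0, UNDEF, UNDEF, UNDEF]
t9 = srl(t21, t18)                           # only lane 0 is defined
t22 = vector_shuffle(["u", "u", 0, 0], t9)   # lanes 2 and 3 copy lane 0 of t9
t11 = [1, 1, 1, 1]

t12_before = vector_shuffle([2, 5, 3, 7], t22, t11)  # reads t22 lanes 2 and 3
t12_after = vector_shuffle([2, 5, 3, 7], t9, t11)    # reads t9 lanes 2 and 3
```
In this model `t12_before` has every lane defined, while `t12_after` picks up undef in lanes 0 and 2, because the mask `<2,5,3,7>` now indexes lanes of `t9` that `t22` never read.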
From debugging, it looks like it may be interacting with this code in `combineShuffleOfSplatVal`: it does not pass in an explicit `DemandedElts`, so it implicitly demands all elements. This means that `isSplatValue` thinks there are no undef elements, and it returns true even if `AllowUndefs` is false:
https://github.com/llvm/llvm-project/blob/7f1d757fb40f06cc1c6b134d770987b340286996/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L25100-L25104
I have confirmed that this still repros on the latest trunk, 850f713e80426f1706c0d3dad143c330ca872d5d
For priority/triage purposes: this was found with a fuzzer meant to test SIMD codegen, not in manually written code.
</pre>