<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/117170>117170</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            SLP vectorizer produces bad shuffles
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            miscompilation,
            llvm:SLPVectorizer
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          wjschmidt
      </td>
    </tr>
</table>

<pre>
[slp-shuffle-bug.ll.txt](https://github.com/user-attachments/files/17847373/slp-shuffle-bug.ll.txt)
[slp-shuffle-output.ll.txt](https://github.com/user-attachments/files/17847374/slp-shuffle-output.ll.txt)

opt -passes=slp-vectorizer slp-shuffle-bug.ll -o slp-shuffle-output.ll

The attached reduced test case slp-shuffle-bug.ll.txt contains a number of scalar load-multiply-store chains that SLP vectorizes.  However, something goes wrong with generating two of the shuffles in the vectorized sequence, shown in slp-shuffle-output.ll.

The shuffle

`  %14 = shufflevector <16 x float> %13, <16 x float> %12, <16 x i32> <i32 1, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>`

produces wrong code: every mask entry of 0 selects an element of the first operand %13, whereas those lanes should have selected elements of the second operand %12 (mask values 16 through 31).  Also, the shuffle

`  %12 = shufflevector <2 x float> %11, <2 x float> %8, <16 x i32> <i32 poison, i32 0, i32 2, i32 1, i32 0, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>`

is suspect, as the first four indices are never used and the fifth entry is incorrect (0 instead of 2).
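
For readers less familiar with shuffle masks, here is a minimal Python sketch of how `shufflevector` picks its lanes. It is only a model for illustration, not LLVM code: the helper name `shufflevector`, the `POISON` stand-in, and the string lane labels are all assumptions introduced here. Mask values below the operand length select from the first operand, larger values select from the second, and poison entries stay poison.

```python
POISON = None  # stand-in for a poison mask entry / poison lane (assumption)

def shufflevector(a, b, mask):
    """Model of shufflevector: index i picks a[i] when i is below len(a),
    otherwise b[i - len(a)]; a poison mask entry yields a poison lane."""
    n = len(a)
    assert len(b) == n, "both operands must have the same length"
    out = []
    for idx in mask:
        if idx is POISON:
            out.append(POISON)
        elif idx >= n:
            out.append(b[idx - n])
        else:
            out.append(a[idx])
    return out

# Hypothetical lane labels so the origin of each result lane is visible.
v13 = [f"%13[{i}]" for i in range(16)]
v12 = [f"%12[{i}]" for i in range(16)]

# Mask of %14 exactly as it appears in the vectorized output above.
mask14 = [1, 1] + [0] * 14
v14 = shufflevector(v13, v12, mask14)
print(v14)
```

With this mask every result lane comes from %13, so nothing from %12 reaches %14, which matches the complaint above.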

The test case loads from two areas of the array GLOB, multiplies the loaded values, and stores the products in a third area of GLOB.  The expected behavior is:

[2928:2956] = [1612] * [1208:1236]
[2960:2996] = [1616] * [1240:1276]
[3000:3036] = [1620] * [1280:1316]
[3040:3052] = [1624] * [1320:1332]

where [X] denotes the value at offset X of GLOB, and [X:Y] denotes a sequence of such values.  The vectorized code produces correct values for locations [2928:2996], but due to the bad shuffles the rest of the values are:

[3000:3004] = [1612] * [1280:1284]
[3008] = [1612] * [1620]
[3012:3052] = [1612] * [1292:1332]
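
To make the difference easier to see, the two listings can be expanded into per-offset operand pairs. This is a rough sketch under stated assumptions: the ranges are taken to be inclusive and the elements to be 4-byte floats (suggested by the `float` vector types above), and the helper `expand` is a hypothetical name introduced here.

```python
STEP = 4  # assumed element size in bytes (float); ranges assumed inclusive

def expand(dst, scalar, src):
    """Map every destination offset in an inclusive range to the pair
    (scalar operand offset, source operand offset) that feeds it."""
    count = (dst[1] - dst[0]) // STEP + 1
    assert count == (src[1] - src[0]) // STEP + 1, "range lengths must match"
    return {dst[0] + i * STEP: (scalar, src[0] + i * STEP) for i in range(count)}

# Expected behavior, copied from the first listing.
expected = {}
for dst, scalar, src in [
    ((2928, 2956), 1612, (1208, 1236)),
    ((2960, 2996), 1616, (1240, 1276)),
    ((3000, 3036), 1620, (1280, 1316)),
    ((3040, 3052), 1624, (1320, 1332)),
]:
    expected.update(expand(dst, scalar, src))

# Observed behavior: [2928:2996] is reported correct, the rest is overridden.
observed = dict(expected)
observed.update(expand((3000, 3004), 1612, (1280, 1284)))
observed[3008] = (1612, 1620)
observed.update(expand((3012, 3052), 1612, (1292, 1332)))

wrong = sorted(off for off in expected if expected[off] != observed[off])
print(len(expected), "stores,", len(wrong), "differ, from offset", wrong[0], "to", wrong[-1])
```

Under these assumptions all 14 stores from offset 3000 through 3052 use [1612] as the scalar operand where [1620] or [1624] was expected, and the store to [3008] additionally reads [1620] as its other operand where [1288] was expected.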
</pre>