<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/109725">109725</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[SLP] Improvement of reordering for consts, splats and ops breaks vectorization of XOR instructions
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
alexey-bataev
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
ivankelarev
</td>
</tr>
</table>
<pre>
It appears that the change in #87091 partially breaks vectorization of XOR instructions for this code:
```
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
define i32 @a() #0 {
br label %1
1:
%2 = phi i8 [ 0, %0 ], [ %40, %1 ]
%3 = phi i8 [ 0, %0 ], [ %28, %1 ]
%4 = phi i8 [ 0, %0 ], [ %16, %1 ]
%5 = phi i8 [ 0, %0 ], [ %6, %1 ]
%6 = load i8, ptr null, align 4
%7 = xor i8 %6, %3
%8 = xor i8 %7, %4
%9 = xor i8 %8, %5
store i8 %9, ptr null, align 4
%10 = xor i8 %6, %2
%11 = xor i8 %10, %5
%12 = add i64 0, 1
%13 = getelementptr i8, ptr null, i64 %12
store i8 %11, ptr %13, align 1
%14 = add i64 0, 1
%15 = getelementptr i8, ptr null, i64 %14
%16 = load i8, ptr %15, align 1
%17 = xor i8 %16, %2
%18 = xor i8 %17, %3
%19 = xor i8 %18, %4
%20 = add i64 0, 2
%21 = getelementptr i8, ptr null, i64 %20
store i8 %19, ptr %21, align 2
%22 = xor i8 %16, %6
%23 = xor i8 %22, %4
%24 = add i64 0, 3
%25 = getelementptr i8, ptr null, i64 %24
store i8 %23, ptr %25, align 1
%26 = add i64 0, 2
%27 = getelementptr i8, ptr null, i64 %26
%28 = load i8, ptr %27, align 2
%29 = xor i8 %28, %6
%30 = xor i8 %29, %2
%31 = xor i8 %30, %3
%32 = add i64 0, 4
%33 = getelementptr i8, ptr null, i64 %32
store i8 %31, ptr %33, align 4
%34 = xor i8 %28, %16
%35 = xor i8 %34, %3
%36 = add i64 0, 5
%37 = getelementptr i8, ptr null, i64 %36
store i8 %35, ptr %37, align 1
%38 = add i64 0, 3
%39 = getelementptr i8, ptr null, i64 %38
%40 = load i8, ptr %39, align 1
%41 = xor i8 %40, %16
%42 = xor i8 %41, %6
%43 = xor i8 %42, %2
%44 = add i64 0, 6
%45 = getelementptr i8, ptr null, i64 %44
store i8 %43, ptr %45, align 2
%46 = xor i8 %40, %28
%47 = xor i8 %46, %2
%48 = add i64 0, 7
%49 = getelementptr i8, ptr null, i64 %48
store i8 %47, ptr %49, align 1
br label %1
}
attributes #0 = { "target-cpu"="core-avx2" }
```
Before the change, all XOR instructions were vectorized:
```
...
%5 = load <4 x i8>, ptr null, align 4
%6 = shufflevector <4 x i8> %5, <4 x i8> poison, <4 x i32> <i32 poison, i32 poison, i32 0, i32 1>
%7 = shufflevector <2 x i8> %3, <2 x i8> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
%8 = shufflevector <4 x i8> %6, <4 x i8> %7, <4 x i32> <i32 4, i32 5, i32 2, i32 3>
%9 = xor <4 x i8> %5, %8
...
```
Now 4 out of 20 XOR instructions are not vectorized:
```
...
%6 = load <4 x i8>, ptr null, align 4
%7 = extractelement <4 x i8> %6, i32 3
%8 = extractelement <4 x i8> %6, i32 2
%9 = extractelement <4 x i8> %6, i32 0
%10 = xor i8 %9, %3
%11 = extractelement <4 x i8> %6, i32 1
%12 = xor i8 %11, %2
%13 = xor i8 %8, %9
%14 = xor i8 %7, %11
...
```
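One rough way to quantify the regression is to count scalar versus vector `xor` instructions in the vectorizer output. The sketch below is a hypothetical helper, not part of LLVM: the name `count_xors` and the sample text are illustrative, and it assumes the output is available as textual IR (for example from `opt -passes=slp-vectorizer -S`).
```python
import re

# Matches an xor instruction and captures its result type,
# e.g. "i8" for a scalar or "<4 x i8>" for a vector.
XOR_RE = re.compile(r"=\s*xor\s+(<[^>]+>|\w+)")


def count_xors(ir_text):
    """Return (scalar_count, vector_count) of xor instructions in textual LLVM IR."""
    scalar = vector = 0
    for line in ir_text.splitlines():
        match = XOR_RE.search(line)
        if match is None:
            continue
        if match.group(1).startswith("<"):
            vector += 1
        else:
            scalar += 1
    return scalar, vector


# Illustrative sample only; feed the real opt output here instead.
sample = """\
%9 = xor <4 x i8> %5, %8
%10 = xor i8 %9, %3
%12 = xor i8 %11, %2
"""
print(count_xors(sample))  # prints (2, 1)
```
Running this on the output before and after the change should make the difference in remaining scalar `xor` instructions easy to compare.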
This leads to a significant performance degradation in one of our benchmarks. @alexey-bataev, could you please check whether vectorization can be restored for this code?
</pre>