<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/63855>63855</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[SLP][RISCV] Sub-optimal IR for mul(S, step_vector())
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:RISC-V,
llvm:SLPVectorizer
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
preames
</td>
</tr>
</table>
<pre>
Noticed some odd SLP behavior around (non-constant) strided indexing for gathers on RISC-V. In the case below, we're accessing every S-th element. While SLP does manage to vectorize this, it does so in a far from ideal way. First, it appears not to know that shl and mul are interchangeable in this case (a shl by k is a mul by 2^k). Second, it appears not to realize that the indices 0 and %S can be represented as a mul of %S by 0 and 1 respectively.
I know that SLP has the alternative opcode mechanism, but this is one of the areas of SLP internals I've always found the most confusing. Not sure if this is a small bug somewhere, or a huge architectural lift. Guidance welcomed. :)
`clang -march=rv64gcv -target riscv64-unknown-linux-gnu -S -O3 example.c`
`cat example.c`
```
typedef signed char int8_t;
// Signed subtract which is not UB on overflow
inline int8_t sub(int8_t a, int8_t b) {
  return (int8_t)((int)a - (int)b);
}
int sum_of_absolute_diff_8(int8_t* restrict a, int8_t* restrict b, int S) {
  int sum = 0;
  for (unsigned i = 0; i < 8; i++)
    sum += sub(0, b[i*S]);
  return sum;
}
```
```
*** IR Dump After SLPVectorizerPass on sum_of_absolute_diff_8 ***
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read) uwtable vscale_range(2,1024)
define dso_local signext i32 @sum_of_absolute_diff_8(ptr noalias nocapture noundef readnone %a, ptr noalias nocapture noundef readonly %b, i32 noundef signext %S) local_unnamed_addr #0 {
entry:
%idxprom.1 = zext i32 %S to i64
%arrayidx.1 = getelementptr inbounds i8, ptr %b, i64 %idxprom.1
%0 = insertelement <4 x i32> poison, i32 %S, i32 0
%1 = shufflevector <4 x i32> %0, <4 x i32> poison, <4 x i32> zeroinitializer
%2 = shl <4 x i32> %1, <i32 1, i32 3, i32 2, i32 5>
%3 = mul <4 x i32> %1, <i32 1, i32 3, i32 2, i32 5>
%4 = shufflevector <4 x i32> %2, <4 x i32> %3, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
%mul.6 = mul i32 %S, 6
%idxprom.6 = zext i32 %mul.6 to i64
%arrayidx.6 = getelementptr inbounds i8, ptr %b, i64 %idxprom.6
%mul.7 = mul i32 %S, 7
%idxprom.7 = zext i32 %mul.7 to i64
%arrayidx.7 = getelementptr inbounds i8, ptr %b, i64 %idxprom.7
%5 = load i8, ptr %b, align 1, !tbaa !7
%6 = load i8, ptr %arrayidx.1, align 1, !tbaa !7
%7 = zext <4 x i32> %4 to <4 x i64>
%8 = insertelement <4 x ptr> poison, ptr %b, i32 0
%9 = shufflevector <4 x ptr> %8, <4 x ptr> poison, <4 x i32> zeroinitializer
%10 = getelementptr i8, <4 x ptr> %9, <4 x i64> %7
%11 = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %10, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> poison), !tbaa !7
%12 = load i8, ptr %arrayidx.6, align 1, !tbaa !7
%13 = load i8, ptr %arrayidx.7, align 1, !tbaa !7
%14 = insertelement <8 x i8> poison, i8 %6, i32 0
%15 = insertelement <8 x i8> %14, i8 %5, i32 1
%16 = shufflevector <4 x i8> %11, <4 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
%17 = shufflevector <8 x i8> %15, <8 x i8> %16, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 poison, i32 poison>
%18 = insertelement <8 x i8> %17, i8 %12, i32 6
%19 = insertelement <8 x i8> %18, i8 %13, i32 7
%20 = sub <8 x i8> zeroinitializer, %19
%21 = sext <8 x i8> %20 to <8 x i32>
%22 = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %21)
ret i32 %22
}
```
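To make the index arithmetic concrete, here is a minimal C sketch of the equivalence described above. It is only an illustration, not SLP code: the helper names (`strided_indices`, `index_mixed`, `forms_agree`) are made up for this example. It checks that the mixed form (special-cased 0 and S, shl for the power-of-two lanes, mul for the rest) produces the same eight indices as a single multiply of S by the step sequence 0..7. Unsigned arithmetic is used so that both forms wrap identically, matching the flag-free `shl` and `mul` in the dump.

```c
/* Hypothetical helpers for illustration; not part of LLVM. */

/* Uniform form: out[i] = S * i, i.e. splat(S) * step_vector. */
static void strided_indices(unsigned s, unsigned out[8]) {
  for (unsigned i = 0; i < 8; i++)
    out[i] = s * i;
}

/* Mixed form, mirroring the pieces seen in the IR dump:
   lanes 0 and 1 are the bare values 0 and S, power-of-two
   lanes use a shift, and the remaining lanes use a multiply. */
static unsigned index_mixed(unsigned i, unsigned s) {
  switch (i) {
  case 0: return 0;       /* same as s * 0 */
  case 1: return s;       /* same as s * 1 */
  case 2: return s << 1;  /* same as s * 2 */
  case 4: return s << 2;  /* same as s * 4 */
  default: return i * s;
  }
}

/* Returns 1 if both forms agree on all eight lanes for this S. */
static int forms_agree(unsigned s) {
  unsigned idx[8];
  strided_indices(s, idx);
  for (unsigned i = 0; i < 8; i++)
    if (idx[i] != index_mixed(i, s))
      return 0;
  return 1;
}
```

If the two forms agree for every S, then all eight indices could in principle come from one vector mul feeding a single 8-wide gather, rather than the shl/mul/shuffle sequence plus four scalar loads shown in the dump.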
</pre>