<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/134272>134272</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[RISCV] `llvm::is_contained` is suboptimal compared to X86/AArch64
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:RISC-V,
llvm:SLPVectorizer,
missed-optimization,
llvm:transforms
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
wangpc-pp
</td>
</tr>
</table>
<pre>
I found this while looking at https://github.com/llvm/llvm-project/pull/134057.
The code is at https://godbolt.org/z/6Tbqac9jW; I paste it here:
```cpp
#include <initializer_list>

/// Returns true iff \p Element exists in \p Set. This overload takes \p Set as
/// an initializer list and is `constexpr`-friendly.
template <typename T, typename E>
constexpr bool is_contained(std::initializer_list<T> Set, const E &Element) {
  // TODO: Use std::find when we switch to C++20.
  for (const T &V : Set)
    if (V == Element)
      return true;
  return false;
}

bool bar2(unsigned v) {
  return is_contained({1, 3}, v);
}

bool bar4(unsigned v) {
  return is_contained({1, 3, 5, 6}, v);
}

bool bar8(unsigned v) {
  return is_contained({1, 3, 5, 6, 7, 8, 9, 10}, v);
}

bool bar16(unsigned v) {
  return is_contained({1, 3, 5, 6, 7, 8, 9, 10,
                       5, 4, 8, 1, 3, 1, 2, 4}, v);
}
```
For RISC-V, the generated instruction sequences are suboptimal, especially when the initializer list is small.
For example, when the list has 2 elements:
X86
```asm
bar2(unsigned int):
dec edi
test edi, -3
sete al
ret
```
AArch64
```asm
bar2(unsigned int):
cmp w0, #1
ccmp w0, #3, #4, ne
cset w0, eq
ret
```
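For reference, here is a minimal source-level sketch of what the X86 `dec`/`test`/`sete` sequence computes. The function names are hypothetical and only for illustration; this is my reading of the assembly, not something taken from the compiler. Subtracting 1 maps the set {1, 3} to {0, 2}, and masking off bit 1 leaves zero exactly for those two values:

```cpp
#include <cassert>

// Straightforward membership test for the set {1, 3}.
constexpr bool in_1_3_naive(unsigned v) { return v == 1 || v == 3; }

// Branch-free form mirroring `dec edi; test edi, -3; sete al`:
// (v - 1) is 0 or 2 exactly when v is 1 or 3, and -3 == ~2 clears bit 1.
constexpr bool in_1_3_bittrick(unsigned v) { return ((v - 1u) & ~2u) == 0; }

// Compile-time spot checks, including the wrap-around case v == 0.
static_assert(in_1_3_bittrick(1) && in_1_3_bittrick(3), "members");
static_assert(!in_1_3_bittrick(0) && !in_1_3_bittrick(2), "non-members");
static_assert(!in_1_3_bittrick(5), "both bits set besides bit 1");
```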
Both results are short and branch-free.
But on RISC-V, it generates:
```asm
bar2(unsigned int):
addi sp, sp, -16
li a1, 1
li a3, 3
li a2, 4
sw a1, 8(sp)
sw a3, 12(sp)
addi a1, sp, 8
.LBB0_1:
lw a3, 0(a1)
beq a3, a0, .LBB0_3
mv a4, a2
addi a2, a2, -4
addi a1, a1, 4
bnez a4, .LBB0_1
.LBB0_3:
xor a0, a0, a3
seqz a0, a0
addi sp, sp, 16
ret
```
That is far more instructions: the array is stored to the stack and then searched with a loop.
Comparing the compiler logs, I see two points that may cause this divergence:
1. The first point is SLP vectorization. X86/AArch64 convert the array to vectors, while RISC-V doesn't.
2. The second point is SROA. SROA can't see the offset because it comes from a PHI instruction, so these `lifetime` intrinsics can't be removed:
```
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable vscale_range(2,1024)
define dso_local noundef zeroext i1 @_Z3barj(i32 noundef signext %v) local_unnamed_addr #0 {
entry:
%ref.tmp = alloca [2 x i32], align 4
call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %ref.tmp) #2
store i32 1, ptr %ref.tmp, align 4, !tbaa !9
%arrayinit.element = getelementptr inbounds nuw i8, ptr %ref.tmp, i64 4
store i32 3, ptr %arrayinit.element, align 4, !tbaa !9
br label %for.body.i
for.body.i: ; preds = %for.body.i, %entry
%__begin0.013.i.idx = phi i64 [ 0, %entry ], [ %__begin0.013.i.add, %for.body.i ]
%__begin0.013.i.ptr = getelementptr inbounds nuw i8, ptr %ref.tmp, i64 %__begin0.013.i.idx
%0 = load i32, ptr %__begin0.013.i.ptr, align 4, !tbaa !9
%cmp2.not.i = icmp eq i32 %0, %v
%__begin0.013.i.add = add nuw nsw i64 %__begin0.013.i.idx, 4
%cmp.not.not.i = icmp eq i64 %__begin0.013.i.add, 8
%or.cond = select i1 %cmp2.not.i, i1 true, i1 %cmp.not.not.i
br i1 %or.cond, label %_Z12is_containedIijEbSt16initializer_listIT_ERKT0_.exit, label %for.body.i
_Z12is_containedIijEbSt16initializer_listIT_ERKT0_.exit: ; preds = %for.body.i
call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %ref.tmp) #2
ret i1 %cmp2.not.i
}
SROA function: _Z3barj
SROA alloca: %ref.tmp = alloca [2 x i32], align 4
Rewriting FCA loads and stores...
Can't analyze slices for alloca: %ref.tmp = alloca [2 x i32], align 4
A pointer to this alloca escaped by:
%0 = load i32, ptr %__begin0.013.i.ptr, align 4, !tbaa !9
```
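To illustrate the second point at source level, here is a rough sketch with hypothetical function names. In the loop form the element address depends on the induction variable, which corresponds to the PHI-based GEP in the IR above. In the hand-unrolled form every access uses a constant index, which is the shape I would expect the array to need before it can be split into scalars. This is only an analogy for the IR, not a claim about what any pass actually produces:

```cpp
#include <cassert>

// Loop form: set[i] is addressed through the loop variable, like the
// PHI-indexed load in the IR.
constexpr bool contains_loop(unsigned v) {
  const unsigned set[2] = {1, 3};
  for (unsigned i = 0; i < 2; ++i)
    if (set[i] == v)
      return true;
  return false;
}

// Hand-unrolled form: only constant indices remain, so each element can be
// treated as an independent scalar constant.
constexpr bool contains_unrolled(unsigned v) {
  const unsigned set[2] = {1, 3};
  return set[0] == v || set[1] == v;
}

// Both forms agree on members and non-members.
static_assert(contains_loop(1) && contains_unrolled(1), "1 is a member");
static_assert(contains_loop(3) && contains_unrolled(3), "3 is a member");
static_assert(!contains_loop(2) && !contains_unrolled(2), "2 is not a member");
```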
----
I think this code pattern is really common in C++ code, but currently it isn't compiled to the best code for RISC-V. I don't know which part I should focus on:
1. Let SLP kick in earlier?
2. Fix SROA via SCEV? (I don't even know whether the current behavior is by design...)
I would appreciate any suggestions, or a fix for this issue!
</pre>