<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/57210>57210</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[RISC-V] Auto-Vectorization Generates Scalar Epilogue Code Rather Than Strip-Mining
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
lidawei1226
</td>
</tr>
</table>
<pre>
A simple add loop:
```
void add(int n, int* restrict x, int* restrict y)
{
    for (int i = 0; i < n; i++) {
        x[i] = y[i] + 10;
    }
}
```
Compiled with the command:
`clang --gcc-toolchain=/opt/riscv -march=rv64gcv -menable-experimental-extensions -mllvm --riscv-v-vector-bits-min=128 -O3 -S loop.c`
it generates the following assembly:
```
    .globl  add                         # -- Begin function add
    .p2align 1
    .type   add,@function
add:                                    # @add
# %bb.0:
    blez    a0, .LBB0_8
# %bb.1:
    li      a3, 8
    bgeu    a0, a3, .LBB0_3
# %bb.2:
    li      a6, 0
    j       .LBB0_6
.LBB0_3:
    andi    a6, a0, -8
    vsetivli zero, 4, e32, m1, ta, mu
    mv      a4, a6
    mv      a5, a1
    mv      a3, a2
.LBB0_4:                                # =>This Inner Loop Header: Depth=1
    addi    a7, a3, 16
    vle32.v v8, (a3)
    vle32.v v9, (a7)
    addi    a7, a5, 16
    vadd.vi v8, v8, 10
    vadd.vi v9, v9, 10
    vse32.v v8, (a5)
    vse32.v v9, (a7)
    addi    a3, a3, 32
    addi    a4, a4, -8
    addi    a5, a5, 32
    bnez    a4, .LBB0_4
# %bb.5:
    beq     a6, a0, .LBB0_8
.LBB0_6:
    slli    a3, a6, 2
    add     a1, a1, a3
    add     a2, a2, a3
    sub     a0, a0, a6
.LBB0_7:                                # =>This Inner Loop Header: Depth=1
    lw      a3, 0(a2)
    addiw   a3, a3, 10
    sw      a3, 0(a1)
    addi    a1, a1, 4
    addi    a0, a0, -1
    addi    a2, a2, 4
    bnez    a0, .LBB0_7
.LBB0_8:
    ret
.Lfunc_end0:
    .size   add, .Lfunc_end0-add
                                        # -- End function
```
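Here `.LBB0_7` is the scalar epilogue loop, which processes the elements left over by the vector loop one at a time. It both slows down execution and increases code size. Note also that the vector body is interleaved with two separate LMUL=1 (`m1`) operations on `v8` and `v9` instead of using a larger LMUL.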
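To make the structure of the generated code easier to follow, its control flow can be modeled in plain C. This is a hedged sketch, not compiler output: it assumes VLEN = 128 (as implied by `--riscv-v-vector-bits-min=128`), so one `e32`/`m1` register holds 4 ints and the twice-interleaved vector body covers 8 elements per iteration. The function name `add_as_generated` is hypothetical, and the inner `for` merely stands in for the two `vle32.v`/`vadd.vi`/`vse32.v` sequences.

```c
/* Hypothetical C model of the code clang emitted above (not compiler output).
   Assumes VLEN = 128: one e32/m1 register holds 4 ints, and the loop body is
   interleaved twice, so each vector iteration covers 8 elements. */
void add_as_generated(int n, int *restrict x, int *restrict y)
{
    if (n <= 0)                 /* blez a0, .LBB0_8 */
        return;
    int done = 0;               /* a6: elements finished by the vector loop */
    if (n >= 8) {               /* bgeu a0, a3, .LBB0_3 */
        done = n & -8;          /* andi a6, a0, -8 */
        for (int i = 0; i != done; i += 8)     /* .LBB0_4 */
            for (int j = 0; j != 8; j++)       /* two 4-lane vadd.vi ops */
                x[i + j] = y[i + j] + 10;
    }
    /* .LBB0_7: scalar epilogue, one element per iteration for the rest */
    for (int i = done; i != n; i++)
        x[i] = y[i] + 10;
}
```

For n = 13, this model runs one 8-element vector iteration followed by five scalar iterations; for any n below 8, every element goes through the scalar loop.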
In my view, the optimal code using a strip-mining approach would look something like this:
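```
.LBB0_4:
    # a3: pointer into y (source)
    # a4: n, the number of elements still to process
    # a5: pointer into x (destination)
    # a7: vl, the number of elements handled in this iteration
    # m8: LMUL=8
    # ta: tail agnostic
    # mu: mask undisturbed
    vsetvli a7, a4, e32, m8, ta, mu
    vle32.v v8, (a3)
    sub     a4, a4, a7
    slli    a7, a7, 2           # convert vl to a byte offset
    add     a3, a3, a7
    vadd.vi v8, v8, 10
    vse32.v v8, (a5)
    add     a5, a5, a7
    bnez    a4, .LBB0_4
```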
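The strip-mining idea can also be modeled in C: each trip asks for vl = min(remaining, VLMAX) elements, the way the register-AVL form `vsetvli` does, so no separate scalar tail is needed. This is a sketch under stated assumptions: VLMAX = VLEN / SEW * LMUL = 128 / 32 * 8 = 32 for VLEN = 128, e32 and m8; the names `add_stripmined`, `setvl` and `VLMAX` are hypothetical, and the inner loop stands in for one `vle32.v`/`vadd.vi`/`vse32.v` group. The RVV specification lets `vsetvli` return somewhat fewer than VLMAX elements when the remaining count is below 2 * VLMAX; the loop structure stays correct for any nonzero vl it returns.

```c
/* Hypothetical C model of a strip-mined loop (a sketch, not compiler output).
   Assumes VLEN = 128, SEW = 32, LMUL = 8, so VLMAX = 128 / 32 * 8 = 32. */
#define VLMAX 32

/* Stand-in for vsetvli: grant min(avl, VLMAX) elements this trip. */
static int setvl(int avl)
{
    return avl > VLMAX ? VLMAX : avl;
}

void add_stripmined(int n, int *restrict x, int *restrict y)
{
    while (n > 0) {                 /* bnez a4, .LBB0_4 (plus an n > 0 guard) */
        int vl = setvl(n);          /* vsetvli a7, a4, e32, m8, ta, mu */
        for (int j = 0; j != vl; j++)
            x[j] = y[j] + 10;       /* vle32.v / vadd.vi / vse32.v on vl lanes */
        n -= vl;                    /* sub a4, a4, a7 */
        x += vl;                    /* add a5, a5, a7 (byte offset vl * 4) */
        y += vl;                    /* add a3, a3, a7 */
    }
}
```

With n = 77 this model runs three trips (32, 32 and 13 elements), whereas the code clang generated above runs nine 8-element vector iterations followed by five scalar iterations.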
What is missing in the vectorizer?
</pre>