<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/56518>56518</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[RISCV] Missed opportunity in memory overlap check idiom
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:RISC-V,
performance
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
preames
</td>
</tr>
</table>
<pre>
The LoopVectorizer will emit a memory overlap check of the form:
```
define i1 @reduced(ptr %c, ptr %a, ptr %b) {
entry:
%b14 = ptrtoint ptr %b to i64
%a13 = ptrtoint ptr %a to i64
%c12 = ptrtoint ptr %c to i64
%vscale = call i64 @llvm.vscale.i64()
%sub2 = sub i64 %c12, %a13
%diff.check = icmp ult i64 %sub2, %vscale
%sub3 = sub i64 %c12, %b14
%diff.check15 = icmp ult i64 %sub3, %vscale
%conflict.rdx = or i1 %diff.check, %diff.check15
ret i1 %conflict.rdx
}
declare i64 @llvm.vscale.i64()
declare void @foo()
declare void @bar()
```
`./llc -march=riscv64 -mattr=+v,+m,+zba,+zbb < vector_overlap.ll -O3` currently results in:
```
csrr a3, vlenb
srli a3, a3, 3
sub a1, a0, a1
sltu a1, a1, a3
sub a0, a0, a2
sltu a0, a0, a3
or a0, a1, a0
ret
```
Unless I'm missing something, we should be able to rewrite this as:
```
csrr a3, vlenb
srli a3, a3, 3
sub a1, a0, a1
sub a0, a0, a2
minu a0, a0, a1
sltu a0, a0, a3
ret
```
This does require zbb, but the command line above explicitly includes that.
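The rewrite relies on the fact that, for unsigned values, `(x < v) | (y < v)` is true exactly when `min(x, y) < v`. A minimal sketch in C that checks this exhaustively over a small range follows; the helper names are hypothetical and only for illustration, and the 8-bit range stands in for the i64 values used above:
```c
/* Hypothetical helper: the check as emitted today (two compares, then an or). */
static int overlap_or(unsigned x, unsigned y, unsigned v) {
    return (x < v) | (y < v);
}

/* Hypothetical helper: the proposed form (unsigned min, then one compare). */
static int overlap_min(unsigned x, unsigned y, unsigned v) {
    unsigned m = x < y ? x : y;
    return m < v;
}

/* Counts the (x, y, v) triples in [0, 256)^3 where the two forms disagree. */
static unsigned count_minu_mismatches(void) {
    unsigned bad = 0;
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
            for (unsigned v = 0; v < 256; ++v)
                if (overlap_or(x, y, v) != overlap_min(x, y, v))
                    ++bad;
    return bad;
}
```
Calling `count_minu_mismatches()` from a small driver should report no disagreements, which is consistent with replacing the two `sltu` plus `or` with `minu` plus a single `sltu`.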
Separately, there appears to be something weird going on with block placement and branch inversion. Compare the outputs for the following two inputs, which differ only in the order of the `taken` and `untaken` blocks:
```
define void @test(ptr %c, ptr %a, ptr %b) {
entry:
%b14 = ptrtoint ptr %b to i64
%a13 = ptrtoint ptr %a to i64
%c12 = ptrtoint ptr %c to i64
%vscale = call i64 @llvm.vscale.i64()
%sub2 = sub i64 %c12, %a13
%diff.check = icmp ult i64 %sub2, %vscale
%sub3 = sub i64 %c12, %b14
%diff.check15 = icmp ult i64 %sub3, %vscale
%conflict.rdx = or i1 %diff.check, %diff.check15
br i1 %conflict.rdx, label %taken, label %untaken
taken:
call void @foo()
ret void
untaken:
call void @bar()
ret void
}
define void @test2(ptr %c, ptr %a, ptr %b) {
entry:
%b14 = ptrtoint ptr %b to i64
%a13 = ptrtoint ptr %a to i64
%c12 = ptrtoint ptr %c to i64
%vscale = call i64 @llvm.vscale.i64()
%sub2 = sub i64 %c12, %a13
%diff.check = icmp ult i64 %sub2, %vscale
%sub3 = sub i64 %c12, %b14
%diff.check15 = icmp ult i64 %sub3, %vscale
%conflict.rdx = or i1 %diff.check, %diff.check15
br i1 %conflict.rdx, label %taken, label %untaken
untaken:
call void @bar()
ret void
taken:
call void @foo()
ret void
}
declare i64 @llvm.vscale.i64()
declare void @foo()
declare void @bar()
```
Produces:
```
test: # @test
.cfi_startproc
# %bb.0: # %entry
addi sp, sp, -16
.cfi_def_cfa_offset 16
sd ra, 8(sp) # 8-byte Folded Spill
.cfi_offset ra, -8
csrr a3, vlenb
srli a3, a3, 3
sub a1, a0, a1
sltu a1, a1, a3
xori a1, a1, 1 # <-- huh?
sub a0, a0, a2
sltu a0, a0, a3
xori a0, a0, 1 # <-- huh?
and a0, a1, a0 # <-- huh?
bnez a0, .LBB0_2
# %bb.1: # %taken
call foo@plt
ld ra, 8(sp) # 8-byte Folded Reload
addi sp, sp, 16
ret
.LBB0_2: # %untaken
call bar@plt
ld ra, 8(sp) # 8-byte Folded Reload
addi sp, sp, 16
ret
.Lfunc_end0:
.size test, .Lfunc_end0-test
.cfi_endproc
# -- End function
.globl test2 # -- Begin function test2
.p2align 2
.type test2,@function
test2: # @test2
.cfi_startproc
# %bb.0: # %entry
addi sp, sp, -16
.cfi_def_cfa_offset 16
sd ra, 8(sp) # 8-byte Folded Spill
.cfi_offset ra, -8
csrr a3, vlenb
srli a3, a3, 3
sub a1, a0, a1
sltu a1, a1, a3
sub a0, a0, a2
sltu a0, a0, a3
or a0, a1, a0
beqz a0, .LBB1_2
# %bb.1: # %taken
call foo@plt
ld ra, 8(sp) # 8-byte Folded Reload
addi sp, sp, 16
ret
.LBB1_2: # %untaken
call bar@plt
ld ra, 8(sp) # 8-byte Folded Reload
addi sp, sp, 16
ret
```
I believe the second is a separate issue. I'm filing them together only because I'm not sure whether it is related to the first one. We can split it into its own bug if it turns out to be unrelated.
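For reference, the `xori`/`xori`/`and`/`bnez` sequence in `test` is the De Morgan dual of the `or`/`beqz` sequence in `test2`: both branch to `untaken` exactly when neither difference is below the threshold. A minimal sketch in C that models the two branch conditions (hypothetical helper names, small exhaustive range rather than i64):
```c
/* Hypothetical model of `test`: invert both compares, and them, branch if nonzero. */
static int untaken_inverted(unsigned x, unsigned y, unsigned v) {
    int a1 = (x < v) ^ 1;   /* sltu + xori */
    int a0 = (y < v) ^ 1;   /* sltu + xori */
    return (a1 & a0) != 0;  /* and + bnez  */
}

/* Hypothetical model of `test2`: or the compares, branch if zero. */
static int untaken_direct(unsigned x, unsigned y, unsigned v) {
    return ((x < v) | (y < v)) == 0;  /* or + beqz */
}

/* Counts the (x, y, v) triples in [0, 256)^3 where the two models disagree. */
static unsigned count_branch_mismatches(void) {
    unsigned bad = 0;
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
            for (unsigned v = 0; v < 256; ++v)
                if (untaken_inverted(x, y, v) != untaken_direct(x, y, v))
                    ++bad;
    return bad;
}
```
Since the two conditions agree on every input, the `test` output spends two extra `xori` instructions to reach the same branch decision as `test2`.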
</pre>