<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/62278">62278</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
SLP horizontal reduction misses potential identity operands
</td>
</tr>
<tr>
<th>Labels</th>
<td>
llvm:SLPVectorizer
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
bubba
</td>
</tr>
</table>
<pre>
Consider a horizontal sum of four shifts:
```llvm
define i32 @f(ptr %p) {
%p.1 = getelementptr i32, ptr %p, i32 1
%p.2 = getelementptr i32, ptr %p, i32 2
%p.3 = getelementptr i32, ptr %p, i32 3
%x.0 = load i32, ptr %p
%x.1 = load i32, ptr %p.1
%x.2 = load i32, ptr %p.2
%x.3 = load i32, ptr %p.3
%a.0 = shl i32 %x.0, 0
%a.1 = shl i32 %x.1, 4
%a.2 = shl i32 %x.2, 8
%a.3 = shl i32 %x.3, 12
%r.0 = add i32 %a.0, %a.1
%r.1 = add i32 %r.0, %a.2
%r.2 = add i32 %r.1, %a.3
ret i32 %r.2
}
```
SLP correctly identifies this as a horizontal reduction across the adds and vectorizes it:
```
opt -passes=slp-vectorizer -mtriple=riscv64 -mattr=+v -riscv-v-slp-max-vf=0 scratch/shl_id_slp.ll -S -o -
```
```llvm
define i32 @f(ptr %p) #0 {
%1 = load <4 x i32>, ptr %p, align 4
%2 = shl <4 x i32> %1, <i32 0, i32 4, i32 8, i32 12>
%3 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %2)
ret i32 %3
}
```
However, if we remove the redundant `shl i32 %x.0, 0` and use `%x.0` directly in the first add, SLP ends up vectorizing it differently:
```llvm
define i32 @g(ptr %p) {
%p.1 = getelementptr i32, ptr %p, i32 1
%p.2 = getelementptr i32, ptr %p, i32 2
%p.3 = getelementptr i32, ptr %p, i32 3
%x.0 = load i32, ptr %p
%x.1 = load i32, ptr %p.1
%x.2 = load i32, ptr %p.2
%x.3 = load i32, ptr %p.3
%a.1 = shl i32 %x.1, 4
%a.2 = shl i32 %x.2, 8
%a.3 = shl i32 %x.3, 12
%r.0 = add i32 %x.0, %a.1
%r.1 = add i32 %r.0, %a.2
%r.2 = add i32 %r.1, %a.3
ret i32 %r.2
}
```
```
opt -passes=slp-vectorizer -mtriple=riscv64 -mattr=+v -riscv-v-slp-max-vf=0 scratch/shl_id_slp.ll -S -o - -slp-threshold=-99
```
```llvm
define i32 @g(ptr %p) #0 {
%p.1 = getelementptr i32, ptr %p, i32 1
%p.2 = getelementptr i32, ptr %p, i32 2
%x.0 = load i32, ptr %p, align 4
%x.1 = load i32, ptr %p.1, align 4
%a.1 = shl i32 %x.1, 4
%1 = load <2 x i32>, ptr %p.2, align 4
%2 = shl <2 x i32> %1, <i32 8, i32 12>
%r.0 = add i32 %x.0, %a.1
%3 = extractelement <2 x i32> %2, i32 0
%r.1 = add i32 %r.0, %3
%4 = extractelement <2 x i32> %2, i32 1
%r.2 = add i32 %r.1, %4
ret i32 %r.2
}
```
It would be nice if SLP could treat `%x.0` in the second example as an implicit `shl i32 %x.0, 0` (a shift by zero is the identity), so that the whole reduction vectorizes as in the first example and no extracting or shuffling is needed.
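To make the request concrete, here is a hand-written sketch of the output one might hope for from `@g`. It is not actual `opt` output: it assumes SLP pads the missing shift with a zero shift amount, and it simply mirrors the result shown for `@f` above.
```llvm
; Hypothetical desired output for @g (hand-written, not produced by opt).
; Assumes %x.0 is treated as `shl i32 %x.0, 0`, so lane 0 gets shift amount 0.
define i32 @g(ptr %p) {
%1 = load <4 x i32>, ptr %p, align 4
%2 = shl <4 x i32> %1, <i32 0, i32 4, i32 8, i32 12>
%3 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %2)
ret i32 %3
}

declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)
```
Since `shl` by 0 leaves a value unchanged, lane 0 still contributes `%x.0` to the sum, so this computes the same result as the scalar `@g`.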
</pre>