<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/58358>58358</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
SLP vectorizer can't peek through additions narrowed by instcombine
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dotdash
</td>
</tr>
</table>
<pre>
When instcombine sees an addition of two sign-extended values and can prove that the addition won't overflow if done at the original width, it transforms the code to perform the addition first and sign-extend only the result. In an unrolled loop, this can trip up the SLP vectorizer, because it no longer recognizes the first two elements.
This IR function adds four `i8` values that each lie between -1 and 32 (inclusive), so adding two of them won't overflow, but adding all four might, so the addition is performed on `i16` values.
```llvm
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
define i16 @fun(i8* noundef %0) {
%2 = load i8, i8* %0, align 4, !range !0
%3 = sext i8 %2 to i16
%4 = getelementptr inbounds i8, i8* %0, i16 1
%5 = load i8, i8* %4, align 4, !range !0
%6 = sext i8 %5 to i16
%7 = add nsw i16 %3, %6
%8 = getelementptr inbounds i8, i8* %0, i16 2
%9 = load i8, i8* %8, align 4, !range !0
%10 = sext i8 %9 to i16
%11 = add nsw i16 %7, %10
%12 = getelementptr inbounds i8, i8* %0, i16 3
%13 = load i8, i8* %12, align 4, !range !0
%14 = sext i8 %13 to i16
%15 = add nsw i16 %11, %14
ret i16 %15
}
!0 = !{i8 -1, i8 33}
```
Using `opt -slp-vectorizer -S -mcpu=tigerlake` we get the desired result:
```llvm
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
define i16 @fun(i8* noundef %0) #0 {
%2 = getelementptr inbounds i8, i8* %0, i16 1
%3 = getelementptr inbounds i8, i8* %0, i16 2
%4 = getelementptr inbounds i8, i8* %0, i16 3
%5 = bitcast i8* %0 to <4 x i8>*
%6 = load <4 x i8>, <4 x i8>* %5, align 4
%7 = sext <4 x i8> %6 to <4 x i16>
%8 = call i16 @llvm.vector.reduce.add.v4i16(<4 x i16> %7)
ret i16 %8
}
```
but with `opt -instcombine -slp-vectorizer -S -mcpu=tigerlake` we get:
```llvm
define i16 @fun(i8* noundef %0) #0 {
%2 = load i8, i8* %0, align 4, !range !0
%3 = getelementptr inbounds i8, i8* %0, i64 1
%4 = load i8, i8* %3, align 4, !range !0
%narrow = add nsw i8 %2, %4
%5 = sext i8 %narrow to i16
%6 = getelementptr inbounds i8, i8* %0, i64 2
%7 = load i8, i8* %6, align 4, !range !0
%8 = sext i8 %7 to i16
%9 = add nsw i16 %5, %8
%10 = getelementptr inbounds i8, i8* %0, i64 3
%11 = load i8, i8* %10, align 4, !range !0
%12 = sext i8 %11 to i16
%13 = add nsw i16 %9, %12
ret i16 %13
}
```
Here, `instcombine` has narrowed the first addition and the SLP vectorizer no longer considers the first two elements for vectorization.
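For reference, the range arithmetic behind the narrowing can be checked with a small Python sketch. This is only an illustration of the bounds implied by the `!range` metadata, not how instcombine implements its analysis; the helper names `sum_bounds` and `fits_in_i8` are made up for this example.

```python
# `!0 = !{i8 -1, i8 33}` is a half-open range: each loaded value is in [-1, 33).
LO, HI_EXCLUSIVE = -1, 33
I8_MIN, I8_MAX = -128, 127


def sum_bounds(n):
    """Inclusive (min, max) of the sum of n values, each in [LO, HI_EXCLUSIVE)."""
    return n * LO, n * (HI_EXCLUSIVE - 1)


def fits_in_i8(bounds):
    """True if every value in the inclusive bounds is representable as a signed i8."""
    lo, hi = bounds
    return I8_MIN <= lo and hi <= I8_MAX


# Hypothetical check for the two cases discussed in this report.
for n in (2, 4):
    print(n, sum_bounds(n), fits_in_i8(sum_bounds(n)))
```

The sum of two values stays within `i8`, which is what makes `add nsw i8` valid for the first addition, while the sum of all four can reach 128 and so needs the wider type.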
This has been derived from https://www.reddit.com/r/rust/comments/y2qah5/why_does_the_compiler_partially_vectorize_my_code/ where the addition works on values from Rust enums which provide the `!range` metadata on the loads.
</pre>