<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/110740>110740</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
SLP Vectorizer could vectorize and/shr with constant on 3 out of 4 lanes
</td>
</tr>
<tr>
<th>Labels</th>
<td>
vectorization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
MatzeB
</td>
</tr>
</table>
<pre>
We have a library (fbgemm) with kernels that break apart an 8-bit integer into four 2-bit integer values. Ideally we would like to vectorize this, which effectively means vectorizing this pattern/vector:
{x & 3, (x >> 2) & 3, (x >> 4) & 3, x >> 6 }
Unfortunately the pattern is not identical on every lane. The `>> 0` on the first lane and the `& 3` on the last lane were optimized away because they are unnecessary. However, when vectorizing, the resulting code would be best if we chose `({x, x, x, x} >> {0, 2, 4, 6}) & {3, 3, 3, 3}`.
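The following sketch shows why the uniform form is legal. It is written in Rust purely for illustration, and the names `scalar_lanes`, `uniform_lanes` and `forms_agree` are hypothetical (they do not come from fbgemm or LLVM).

The sketch computes the four lanes in two ways:
- as the optimized scalar IR does, with the `>> 0` and the last `& 3` folded away;
- as a vectorized form would, with one shift amount and one mask per lane.

It then checks that both agree for every 8-bit input. The folds are valid because `x >> 0` is `x`, and because `x` fits in 8 bits, `x >> 6` is at most 3, so masking it with 3 changes nothing.

```rust
/// Lanes as the optimized scalar code computes them: the `>> 0` on lane 0
/// and the `& 3` on lane 3 have been folded away.
fn scalar_lanes(byte: u8) -> [u32; 4] {
    let x = byte as u32; // zext i8 -> i32
    [x & 3, (x >> 2) & 3, (x >> 4) & 3, x >> 6]
}

/// Lanes in the uniform form ({x, x, x, x} >> {0, 2, 4, 6}) & {3, 3, 3, 3}:
/// every lane performs the same two operations; only the constants differ.
fn uniform_lanes(byte: u8) -> [u32; 4] {
    let x = [byte as u32; 4]; // splat of x
    let shift = [0u32, 2, 4, 6]; // per-lane shift amounts
    let mask = [3u32; 4]; // same mask on every lane
    let mut out = [0u32; 4];
    for lane in 0..4 {
        out[lane] = (x[lane] >> shift[lane]) & mask[lane];
    }
    out
}

/// Exhaustively checks that both forms agree on all 256 possible bytes.
fn forms_agree() -> bool {
    (0..=255u8).all(|b| scalar_lanes(b) == uniform_lanes(b))
}
```

Only the shift amounts and the mask vary between lanes, and they are constants. That is the shape that would let all four lanes map onto a single `lshr` of a splat by a constant vector, followed by a single `and`.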
Reproducer:
```
target triple = "x86_64--"
define dso_local void @_Z3fooPfPKhff(ptr %0, ptr %1, float %2, float %3) local_unnamed_addr #0 {
%5 = getelementptr inbounds float, ptr %0, i64 1
%6 = getelementptr inbounds float, ptr %0, i64 2
%7 = getelementptr inbounds float, ptr %0, i64 3
%8 = load i8, ptr %1, align 1
%9 = zext i8 %8 to i32
%10 = and i32 %9, 3
%11 = sitofp i32 %10 to float
%12 = lshr i32 %9, 2
%13 = and i32 %12, 3
%14 = sitofp i32 %13 to float
%15 = lshr i32 %9, 4
%16 = and i32 %15, 3
%17 = sitofp i32 %16 to float
%18 = lshr i32 %9, 6
%19 = sitofp i32 %18 to float
%20 = load float, ptr %0, align 4
%21 = fadd float %20, %3
%22 = tail call noundef float @llvm.fma.f32(float %2, float %11, float %21)
store float %22, ptr %0, align 4
%23 = load float, ptr %5, align 4
%24 = fadd float %23, %3
%25 = tail call noundef float @llvm.fma.f32(float %2, float %14, float %24)
store float %25, ptr %5, align 4
%26 = load float, ptr %6, align 4
%27 = fadd float %26, %3
%28 = tail call noundef float @llvm.fma.f32(float %2, float %17, float %27)
store float %28, ptr %6, align 4
%29 = load float, ptr %7, align 4
%30 = fadd float %29, %3
%31 = tail call noundef float @llvm.fma.f32(float %2, float %19, float %30)
store float %31, ptr %7, align 4
ret void
}
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare float @llvm.fma.f32(float, float, float) #1
declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>) #1
attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="haswell" "target-features"="+avx,+avx2,+bmi,+bmi2,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" }
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"uwtable", i32 2}
!2 = !{!"clang version 20.0.0git"}
```
(tested with github/main 32ffc9fdc2cd422c88c926b862adb3de726e3888 from 2024-10-01)
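For readers who prefer source code to IR, here is a Rust sketch of what the reproducer function computes. It is an illustration only, not the original fbgemm kernel (which is C++):
- the name `decode_accumulate` and its parameter names are hypothetical;
- the byte is passed by value rather than through the `%1` pointer;
- `f32::mul_add` stands in for `llvm.fma.f32`.

Each 2-bit field is converted to float and combined as `fma(%2, field, out[i] + %3)`.

```rust
/// Hypothetical rendering of the reproducer: for each 2-bit field of
/// `packed` (low bits first), out[i] = fma(scale, field_i as f32, out[i] + bias).
fn decode_accumulate(out: &mut [f32; 4], packed: u8, scale: f32, bias: f32) {
    let x = packed as i32; // zext i8 -> i32
    // Same per-lane expressions as %10, %13, %16 and %18 in the IR.
    let fields = [x & 3, (x >> 2) & 3, (x >> 4) & 3, x >> 6];
    for (slot, field) in out.iter_mut().zip(fields) {
        // sitofp, fadd, then a fused multiply-add (one rounding step).
        *slot = scale.mul_add(field as f32, *slot + bias);
    }
}
```

For example, `packed = 0xE4` holds the fields 0, 1, 2, 3 from low to high bits. Starting from `out = [1.0; 4]` with `scale = 2.0` and `bias = 0.5`, the call leaves `out` as `[1.5, 3.5, 5.5, 7.5]`.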
I am documenting this problem here while I look at the SLP vectorizer code to see what can be done about it.
</pre>