<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/79779>79779</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Checking whether any byte in a vector is greater than 0 should be optimized
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Validark
</td>
</tr>
</table>
<pre>
I wrote three Zig functions that each check whether any byte in a vector is greater than 0. ([Godbolt link](https://zig.godbolt.org/z/WWocv1Mf6))
```zig
const std = @import("std");
const VEC = @Vector(16, u8);
export fn foo(y: VEC) bool {
    return 0 != @reduce(.Or, y);
}
export fn bar(y: VEC) bool {
    return 0 != @reduce(.Max, y);
}
export fn baz(x: VEC) bool {
    return @as(std.meta.Int(.unsigned, @sizeOf(VEC)), @bitCast(x > @as(VEC, @splat(0)))) != 0;
}
```
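Since all three functions are meant to compute the same predicate, here is a small sketch (not part of the original measurements) that checks they agree. It copies `foo`, `bar` and `baz` as plain non-`export` functions and tests every byte value in every single lane. The test names are illustrative, and it assumes a Zig version with the single-argument `@bitCast`/`@splat`/`@intCast` builtins used above (0.11 or later). Run it with `zig test`.

```zig
const std = @import("std");

// 16 lanes of u8, as in the report.
const VEC = @Vector(16, u8);

// OR-reduce all lanes; nonzero iff some lane is nonzero.
fn foo(y: VEC) bool {
    return 0 != @reduce(.Or, y);
}

// Unsigned max-reduce; nonzero iff some lane is nonzero.
fn bar(y: VEC) bool {
    return 0 != @reduce(.Max, y);
}

// Compare each lane against 0, pack the 16 bools into a u16 mask,
// and test whether the mask is nonzero.
fn baz(x: VEC) bool {
    return @as(std.meta.Int(.unsigned, @sizeOf(VEC)), @bitCast(x > @as(VEC, @splat(0)))) != 0;
}

test "foo, bar and baz agree for every single-lane byte value" {
    for (0..16) |lane| {
        for (0..256) |val| {
            // All lanes zero except `lane`, which holds `val`.
            var arr = [_]u8{0} ** 16;
            arr[lane] = @intCast(val);
            const y: VEC = arr;

            const expected = val != 0;
            try std.testing.expectEqual(expected, foo(y));
            try std.testing.expectEqual(expected, bar(y));
            try std.testing.expectEqual(expected, baz(y));
        }
    }
}
```

This only checks semantics, not codegen: it confirms the three spellings are interchangeable, so each target is free to lower all of them to its cheapest sequence.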
x86 (Zen2) assembly:
```asm
foo:
vptest xmm0, xmm0
setne al
ret
bar:
vpcmpeqd xmm1, xmm1, xmm1
vpxor xmm0, xmm0, xmm1
vpsrlw xmm1, xmm0, 8
vpminub xmm0, xmm0, xmm1
vphminposuw xmm0, xmm0
vmovd eax, xmm0
cmp al, -1
setne al
ret
baz:
vptest xmm0, xmm0
setne al
ret
```
On x86, we missed the optimization on `bar`: it should lower to the same `vptest`/`setne` sequence as `foo` and `baz`.
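For reference, here is the identity that lets `bar` reuse `foo`'s lowering (a short sketch, not from the original report). Because `u8` lanes satisfy $y_i \ge 0$, the maximum is zero exactly when every lane is zero, which is also exactly when their bitwise OR is zero:

```math
\max_{i=0}^{15} y_i \ne 0 \iff \exists\, i \in \{0, \dots, 15\} : y_i \ne 0 \iff \bigvee_{i=0}^{15} y_i \ne 0
```

Hence `0 != @reduce(.Max, y)` and `0 != @reduce(.Or, y)` are the same predicate for unsigned lanes. The first equivalence needs unsignedness: with signed lanes such as {-1, 0, ..., 0} the maximum is 0 while the OR is not.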
aarch64 (apple_latest) assembly:
```asm
foo:
ext v1.16b, v0.16b, v0.16b, #8
orr v0.8b, v0.8b, v1.8b
fmov x8, d0
orr x8, x8, x8, lsr #32
lsr x9, x8, #16
orr w8, w8, w9
orr w8, w8, w8, lsr #8
tst w8, #0xff
cset w0, ne
ret
bar:
umaxv b0, v0.16b
fmov w8, s0
tst w8, #0xff
cset w0, ne
ret
baz:
cmtst v0.16b, v0.16b, v0.16b
umaxv b0, v0.16b
fmov w8, s0
and w0, w8, #0x1
ret
```
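Here we missed the optimization on `foo`, which reduces through a chain of `ext`/`orr`/shift instructions, whereas `bar` and `baz` each use a single `umaxv`. I'm not sure whether the `bar` or the `baz` sequence is preferable.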
PowerPC (pwr10) assembly:
```asm
foo:
stwu 1, -32(1)
xxswapd 35, 34
xxlor 0, 34, 35
xxspltw 1, 0, 1
xxeval 34, 34, 35, 1, 127
vsplth 3, 2, 1
xxeval 36, 0, 1, 35, 127
vspltb 4, 4, 1
xxeval 0, 34, 35, 36, 127
stxv 0, 16(1)
lbz 3, 16(1)
cntlzw 3, 3
not 3, 3
rlwinm 3, 3, 27, 31, 31
addi 1, 1, 32
blr
bar:
stwu 1, -32(1)
xxswapd 35, 34
vmaxub 2, 2, 3
xxspltw 35, 34, 1
vmaxub 2, 2, 3
vsplth 3, 2, 1
vmaxub 2, 2, 3
vspltb 3, 2, 1
vmaxub 2, 2, 3
stxv 34, 16(1)
lbz 3, 16(1)
cntlzw 3, 3
not 3, 3
rlwinm 3, 3, 27, 31, 31
addi 1, 1, 32
blr
baz:
stwu 1, -48(1)
xxlxor 35, 35, 35
vcmpequb 2, 2, 3
xxlnor 34, 34, 34
stxv 34, 16(1)
lbz 3, 31(1)
lbz 4, 30(1)
clrlwi 3, 3, 31
rlwimi 3, 4, 1, 30, 30
lbz 4, 29(1)
rlwimi 3, 4, 2, 29, 29
lbz 4, 28(1)
rlwimi 3, 4, 3, 28, 28
lbz 4, 27(1)
rlwimi 3, 4, 4, 27, 27
lbz 4, 26(1)
rlwimi 3, 4, 5, 26, 26
lbz 4, 25(1)
rlwimi 3, 4, 6, 25, 25
lbz 4, 24(1)
rlwimi 3, 4, 7, 24, 24
lbz 4, 23(1)
rlwimi 3, 4, 8, 23, 23
lbz 4, 22(1)
rlwimi 3, 4, 9, 22, 22
lbz 4, 21(1)
rlwimi 3, 4, 10, 21, 21
lbz 4, 20(1)
rlwimi 3, 4, 11, 20, 20
lbz 4, 19(1)
rlwimi 3, 4, 12, 19, 19
lbz 4, 18(1)
rlwimi 3, 4, 13, 18, 18
lbz 4, 17(1)
rlwimi 3, 4, 14, 17, 17
lbz 4, 16(1)
rlwimi 3, 4, 15, 16, 16
cntlzw 3, 3
not 3, 3
rlwinm 3, 3, 27, 31, 31
addi 1, 1, 48
blr
```
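On PowerPC, it looks like we missed the optimization on `baz`, which spills the compare mask to the stack and rebuilds it one byte at a time with `lbz` and `rlwimi`. I'm not sure which of the three sequences is best, though.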
([Godbolt link again](https://zig.godbolt.org/z/WWocv1Mf6))
</pre>