<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/88230>88230</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[arm/aarch64] 64-byte matching should use `ld4` when possible
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Validark
</td>
</tr>
</table>
<pre>
The same applies to 32-byte matching on 32-bit Arm machines. [Godbolt link](https://zig.godbolt.org/z/Pfs51hbh6)
```zig
export fn maskForUnderscores(src: [*]const u8) usize {
return @bitCast(src[0..64].* == @as(@Vector(64, u8), @splat('_')));
}
```
On `apple_latest` this gives us:
```asm
.LCPI0_0:
.byte 1
.byte 2
.byte 4
.byte 8
.byte 16
.byte 32
.byte 64
.byte 128
.byte 1
.byte 2
.byte 4
.byte 8
.byte 16
.byte 32
.byte 64
.byte 128
maskForUnderscores:
ldp q1, q0, [x0]
ldp q3, q2, [x0, #32]
movi v4.16b, #95
cmeq v3.16b, v3.16b, v4.16b
adrp x8, .LCPI0_0
ldr q5, [x8, :lo12:.LCPI0_0]
and v3.16b, v3.16b, v5.16b
addv b6, v3.8b
fmov w8, s6
ext v3.16b, v3.16b, v3.16b, #8
addv b3, v3.8b
fmov w9, s3
orr w8, w8, w9, lsl #8
cmeq v2.16b, v2.16b, v4.16b
and v2.16b, v2.16b, v5.16b
addv b3, v2.8b
fmov w9, s3
ext v2.16b, v2.16b, v2.16b, #8
addv b2, v2.8b
fmov w10, s2
lsl w10, w10, #24
orr w9, w10, w9, lsl #16
bfxil w9, w8, #0, #16
cmeq v1.16b, v1.16b, v4.16b
and v1.16b, v1.16b, v5.16b
addv b2, v1.8b
fmov w8, s2
ext v1.16b, v1.16b, v1.16b, #8
addv b1, v1.8b
fmov w10, s1
orr w8, w8, w10, lsl #8
cmeq v0.16b, v0.16b, v4.16b
and v0.16b, v0.16b, v5.16b
addv b1, v0.8b
fmov w10, s1
ext v0.16b, v0.16b, v0.16b, #8
addv b0, v0.8b
fmov w11, s0
lsl w11, w11, #24
orr w10, w11, w10, lsl #16
bfxil w10, w8, #0, #16
orr x0, x10, x9, lsl #32
ret
```
With hand-optimized code that uses `ld4`, I get:
```asm
maskForUnderscoresBetter:
movi v0.16b, #95
ld4 { v1.16b, v2.16b, v3.16b, v4.16b }, [x0]
cmeq v5.16b, v1.16b, v0.16b
cmeq v6.16b, v2.16b, v0.16b
cmeq v7.16b, v3.16b, v0.16b
cmeq v0.16b, v4.16b, v0.16b
sri v6.16b, v5.16b, #1
sri v0.16b, v7.16b, #1
sri v0.16b, v6.16b, #2
sri v0.16b, v0.16b, #4
shrn v0.8b, v0.8h, #4
fmov x0, d0
ret
```
[Same Godbolt link](https://zig.godbolt.org/z/Pfs51hbh6)
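
To make the `sri`/`shrn` trick easier to check, here is a minimal scalar C sketch of what the hand-written sequence computes, compared against a plain bit-per-byte reference. It models the instructions one byte at a time and does not use NEON intrinsics. The names `mask_reference`, `sri8`, and `mask_ld4_model` are hypothetical helpers for illustration, and the model assumes the little-endian lane order that AArch64 normally uses.

```c
#include <stdint.h>

/* Reference: bit n of the result is set when src[n] == '_'. */
uint64_t mask_reference(const uint8_t src[64]) {
    uint64_t mask = 0;
    for (int n = 0; n < 64; n++)
        if (src[n] == '_')
            mask |= (uint64_t)1 << n;
    return mask;
}

/* Per-byte model of SRI (shift right and insert): keep the top `shift`
   bits of dst and fill the remaining low bits with src >> shift. */
uint8_t sri8(uint8_t dst, uint8_t src, unsigned shift) {
    uint8_t keep = (uint8_t)(0xFFu << (8 - shift));
    return (uint8_t)((dst & keep) | (src >> shift));
}

/* Scalar model of the ld4 / cmeq / sri / shrn sequence. */
uint64_t mask_ld4_model(const uint8_t src[64]) {
    uint8_t t[16];
    for (int i = 0; i < 16; i++) {
        /* ld4 de-interleaves: lane i of register k holds src[4*i + k].
           cmeq then yields 0xFF on a match and 0x00 otherwise. */
        uint8_t c0 = src[4 * i + 0] == '_' ? 0xFF : 0x00;
        uint8_t c1 = src[4 * i + 1] == '_' ? 0xFF : 0x00;
        uint8_t c2 = src[4 * i + 2] == '_' ? 0xFF : 0x00;
        uint8_t c3 = src[4 * i + 3] == '_' ? 0xFF : 0x00;

        uint8_t a = sri8(c1, c0, 1); /* bit 7 = c1, bits 6..0 = c0 */
        uint8_t b = sri8(c3, c2, 1); /* bit 7 = c3, bits 6..0 = c2 */
        b = sri8(b, a, 2);           /* bits 7,6,5 = c3,c2,c1; bits 4..0 = c0 */
        b = sri8(b, b, 4);           /* both nibbles = c3 c2 c1 c0 */
        t[i] = b;
    }

    /* shrn #4: treat each byte pair as a 16-bit lane, shift right by 4,
       and keep the low 8 bits, which packs two 4-bit groups per byte. */
    uint64_t mask = 0;
    for (int j = 0; j < 8; j++) {
        uint16_t h = (uint16_t)(t[2 * j] | (t[2 * j + 1] << 8));
        uint8_t narrowed = (uint8_t)(h >> 4);
        mask |= (uint64_t)narrowed << (8 * j);
    }
    return mask;
}
```

For any 64-byte input, `mask_ld4_model(buf)` should equal `mask_reference(buf)`, which is the property the `ld4` lowering relies on: after de-interleaving by 4, three rounds of `sri` gather the four match bits of each group into a nibble, and `shrn` packs those nibbles in source order.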
</pre>