<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/109122">109122</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] `clz` on a vector of 2 x u64 should be better optimized
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Validark
</td>
</tr>
</table>
<pre>
This code ([Godbolt link](https://zig.godbolt.org/z/4j538eG1P)):
```zig
export fn clz(x: @Vector(2, u64)) @Vector(2, u64) {
return @clz(x);
}
```
gives me this codegen when targeting the Apple M3:
```asm
clz:
ushr v1.2d, v0.2d, #1
orr v0.16b, v0.16b, v1.16b
ushr v1.2d, v0.2d, #2
orr v0.16b, v0.16b, v1.16b
ushr v1.2d, v0.2d, #4
orr v0.16b, v0.16b, v1.16b
ushr v1.2d, v0.2d, #8
orr v0.16b, v0.16b, v1.16b
ushr v1.2d, v0.2d, #16
orr v0.16b, v0.16b, v1.16b
ushr v1.2d, v0.2d, #32
orr v0.16b, v0.16b, v1.16b
mvn v0.16b, v0.16b
cnt v0.16b, v0.16b
uaddlp v0.8h, v0.16b
uaddlp v0.4s, v0.8h
uaddlp v0.2d, v0.4s
ret
```
It seems to me we could combine `bitReverse` + `ctz` to get better codegen for `clz` on vectors whose elements are u64.
It's also conceivable that we could use `clz` with u32 granularity and combine adjacent elements.
I think the backend should emit something equivalent to this:
```zig
export fn clz2(x: @Vector(2, u64)) @Vector(2, u64) {
const clz_with_u32_granularity: @Vector(4, u32) = @clz(@as(@Vector(4, u32), @bitCast(x)));
const base = @as(@Vector(2, u64), @bitCast(clz_with_u32_granularity)) >> @splat(32);
const mask = @select(u32, @as(@Vector(4, u32), @bitCast(base)) == @as(@Vector(4, u32), @splat(32)),
clz_with_u32_granularity,
@as(@Vector(4, u32), @splat(0)),
);
return base + @as(@Vector(2, u64), @bitCast(mask));
}
```
That gives us this assembly:
```asm
clz2:
clz v1.4s, v0.4s
ushr v0.2d, v1.2d, #32
movi v2.4s, #32
cmeq v0.4s, v0.4s, v2.4s
and v0.16b, v1.16b, v0.16b
usra v0.2d, v1.2d, #32
ret
```
The final `usra` could probably also have been a plain `add`, if the result of the earlier `ushr` were kept in its own register.
Assuming I didn't mess anything up, Alive2 (via Z3) seems to prove that this is a correct transformation: https://alive2.llvm.org/ce/z/878QXU
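For anyone who wants to sanity-check the idea without Alive2, here is a minimal scalar sketch of what a single u64 lane of `clz2` computes. The name `clz64ViaHalves` and the sample values are made up for illustration, and the test only spot-checks a few inputs, so it is not a proof:
```zig
const std = @import("std");

// Scalar model of one u64 lane of `clz2` (hypothetical helper, illustration only).
fn clz64ViaHalves(x: u64) u64 {
    const hi: u32 = @truncate(x >> 32); // upper u32 half of the lane
    const lo: u32 = @truncate(x); // lower u32 half of the lane
    const base: u64 = @clz(hi); // the value `ushr #32` leaves in the lane
    // The lower half only contributes when the upper half is all zeros.
    const extra: u64 = if (base == 32) @clz(lo) else 0;
    return base + extra;
}

test "clz64ViaHalves agrees with @clz on a few sample values" {
    const samples = [_]u64{ 0, 1, 0xFFFF_FFFF, 0x1_0000_0000, 0x8000_0000_0000_0000, std.math.maxInt(u64) };
    for (samples) |x| {
        try std.testing.expectEqual(@as(u64, @clz(x)), clz64ViaHalves(x));
    }
}
```
The `if` plays the role of the `cmeq` + `and` pair in the assembly above: it keeps the count from the lower half only when the upper half's count is exactly 32.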
</pre>