<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/110426>110426</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[Zen 4] Prefer to do work in k-registers instead of always moving over to general purpose registers
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Validark
</td>
</tr>
</table>
<pre>
Some code ([Godbolt link](https://zig.godbolt.org/z/cxE9xsTr1)):
```zig
const std = @import("std");
const Chunk = @Vector(64, u8);
export fn foo(chunk: Chunk, a: Chunk, b: Chunk) u64 {
    const zeroes = @as(Chunk, @splat(0));
    const bits_a: u64 = @bitCast(zeroes != (a & chunk));
    const bits_b: u64 = @bitCast(zeroes != (b & chunk));
    return bits_a +% bits_b;
}
```
LLVM IR (optimized):
```llvm
define dso_local i64 @foo(<64 x i8> %0, <64 x i8> %1, <64 x i8> %2) local_unnamed_addr {
Entry:
  %3 = and <64 x i8> %1, %0
  %4 = icmp ne <64 x i8> %3, zeroinitializer
  %5 = bitcast <64 x i1> %4 to i64
  %6 = and <64 x i8> %2, %0
  %7 = icmp ne <64 x i8> %6, zeroinitializer
  %8 = bitcast <64 x i1> %7 to i64
  %9 = add i64 %8, %5
  ret i64 %9
}
```
Compiled for Zen 4:
```asm
foo:
        vptestmb k0, zmm1, zmm0
        vptestmb k1, zmm2, zmm0
        kmovq rcx, k0
        kmovq rax, k1
        add rax, rcx
        vzeroupper
        ret
```
Suggested:
```diff
 foo:
         vptestmb k0, zmm1, zmm0
         vptestmb k1, zmm2, zmm0
-        kmovq rcx, k0
-        kmovq rax, k1
-        add rax, rcx
+        kaddq k0, k0, k1
+        kmovq rax, k0
         vzeroupper
         ret
```
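The suggested rewrite preserves semantics because `kaddq` performs the same 64-bit addition as `add rax, rcx` (the carry out of bit 63 is discarded in both cases); only the place where the sum is computed changes, and the tail shrinks from three instructions (two `kmovq` and an `add`) to two (`kaddq` and one `kmovq`). A minimal Python sketch of that equivalence, modeling each k-register and GPR as a 64-bit integer; the function names `current_tail` and `suggested_tail` are hypothetical and only mirror the two listings:

```python
import random

MASK64 = (1 << 64) - 1  # k0/k1 and rax/rcx are all 64 bits wide


def current_tail(k0, k1):
    rcx = k0                      # kmovq rcx, k0
    rax = k1                      # kmovq rax, k1
    return (rax + rcx) & MASK64   # add rax, rcx (wraps modulo 2**64)


def suggested_tail(k0, k1):
    k0 = (k0 + k1) & MASK64       # kaddq k0, k0, k1 (also wraps modulo 2**64)
    return k0                     # kmovq rax, k0


# Both tails agree for arbitrary mask values, including sums that overflow.
for _ in range(10_000):
    m0 = random.getrandbits(64)
    m1 = random.getrandbits(64)
    assert current_tail(m0, m1) == suggested_tail(m0, m1)
```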
On Zen 4, almost all instructions that operate on k-registers have a latency of 1 cycle (the exception is [KSHIFTRQ](https://uops.info/html-instr/KSHIFTRQ_K_K_I8.html)). LLVM should therefore be more aggressive than it currently is about keeping values in k-registers and doing the computation there.
Hopefully, though, the cost model also accounts for the fact that general-purpose registers have access to a far more powerful instruction set. For example, `lea` can fuse an add, a scaled index, and a constant displacement into a single instruction, and k-registers have nothing comparable.
</pre>