<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/110626>110626</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AVX-512] Moving from a GPR to a k-register just to spill to memory should just spill via `mov`, or just stay in a k-register
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Validark
</td>
</tr>
</table>
<pre>
My real code compiles to the following on Zen 4 and Zen 5 ([Godbolt link](https://zig.godbolt.org/z/fEzPnWjcE); line 3158 of the source code, line 5165 of the assembly):
```asm
kmovq k1, rcx
mov rcx, rdi
and rcx, qword ptr [rsp + 56]
kmovq qword ptr [rsp + 16], k1
kmovq k2, qword ptr [rsp + 16]
kmovq k1, rcx
mov rcx, rdi
and rcx, qword ptr [rsp + 72]
kmovq qword ptr [rsp + 96], k1
vmovdqu8 zmm19 {k2} {z}, zmmword ptr [rip + .LCPI5_37]
kmovq k2, qword ptr [rsp + 96]
kmovq k3, rcx
```
As you can see, we move `rcx` to `k1`, spill `k1` to `qword ptr [rsp + 16]`, and then immediately read it back into `k2`. Obviously `kmovq k1, rcx` + `kmovq qword ptr [rsp + 16], k1` could simply be `mov qword ptr [rsp + 16], rcx`. Even better, the whole round trip `kmovq k1, rcx` + `kmovq qword ptr [rsp + 16], k1` + `kmovq k2, qword ptr [rsp + 16]` could be a single `kmovq k2, rcx`, issued before `rcx` is overwritten.
Then we do the same thing again with the new value of `rcx` produced by the `and`. :facepalm:
It should just be:
```asm
kmovq k2, rcx
mov rcx, rdi
and rcx, qword ptr [rsp + 56]
vmovdqu8 zmm19 {k2} {z}, zmmword ptr [rip + .LCPI5_37]
kmovq k2, rcx
mov rcx, rdi
and rcx, qword ptr [rsp + 72]
kmovq k3, rcx
```
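To illustrate the spill/reload round trip described above, here is a toy sketch of a weaker rewrite than the one I'm asking for: forward each k-register spill straight to its reload (`kmovq kB, kA`) and delete the store once nothing else reads the slot. The tuple-based instruction model and the `forward_k_spills` function are hypothetical. This is not how LLVM's register allocator or any existing LLVM pass works, and it assumes `rsp` does not change inside the window.

```python
# Toy model (hypothetical, not LLVM's MachineInstr API): each instruction is a
# tuple (opcode, dest, *sources); stack slots are strings such as "[rsp+16]".
Instr = tuple[str, ...]


def forward_k_spills(code: list[Instr]) -> list[Instr]:
    """Rewrite `kmovq [slot], kA ... kmovq kB, [slot]` into `kmovq kB, kA`.

    The reload is forwarded only if nothing between the store and the reload
    overwrites kA or touches the slot. The store itself is deleted when no
    later instruction mentions the slot. Assumes rsp is not adjusted inside
    the window and that the listing covers every use of each slot.
    """
    out = list(code)
    i = 0
    while i < len(out):
        op, dst, *srcs = out[i]
        is_k_spill = (op == "kmovq" and dst.startswith("[")
                      and len(srcs) == 1 and srcs[0].startswith("k"))
        if not is_k_spill:
            i += 1
            continue
        slot, ka = dst, srcs[0]
        store_deleted = False
        for j in range(i + 1, len(out)):
            op2, dst2, *srcs2 = out[j]
            if op2 == "kmovq" and dst2.startswith("k") and srcs2 == [slot]:
                out[j] = ("kmovq", dst2, ka)  # k-to-k copy, no memory round trip
                if not any(slot in ins[1:] for ins in out[j + 1:]):
                    del out[i]  # the spill store is now dead
                    store_deleted = True
                break
            if dst2 == ka or slot in (dst2, *srcs2):
                break  # kA clobbered or slot touched: forwarding is unsafe
        if not store_deleted:
            i += 1
    return out


# The sequence from the report, transcribed into the toy model.
example: list[Instr] = [
    ("kmovq", "k1", "rcx"),
    ("mov", "rcx", "rdi"),
    ("and", "rcx", "[rsp+56]"),
    ("kmovq", "[rsp+16]", "k1"),
    ("kmovq", "k2", "[rsp+16]"),
    ("kmovq", "k1", "rcx"),
    ("mov", "rcx", "rdi"),
    ("and", "rcx", "[rsp+72]"),
    ("kmovq", "[rsp+96]", "k1"),
    ("vmovdqu8", "zmm19{k2}{z}", "[rip+.LCPI5_37]"),
    ("kmovq", "k2", "[rsp+96]"),
    ("kmovq", "k3", "rcx"),
]

if __name__ == "__main__":
    for ins in forward_k_spills(example):
        print(ins[0], ", ".join(ins[1:]))
```

On the reported sequence, this removes both memory round trips, leaving `kmovq k2, k1` in place of each spill/reload pair (10 instructions instead of 12). Getting all the way to the `kmovq k2, rcx` form shown above also requires moving the `vmovdqu8` earlier. Renaming `k1` to `k2` in the second `kmovq k1, rcx` would otherwise clobber `k2` while the `vmovdqu8` still needs it.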
I also think the GPRs that were spilled to `qword ptr [rsp + 56]` and `qword ptr [rsp + 72]` could probably have been spilled to k-registers instead:
```asm
mov qword ptr [rsp + 72], r11
; ...
mov qword ptr [rsp + 56], rax
; ...
```
Could be:
```asm
kmovq k4, r11 ; formerly [rsp + 72]
; ...
kmovq k3, rax ; formerly [rsp + 56]
; ...
```
Then we could do:
```asm
; Move `rdi` to a k-register as well, since it is used so often.
kmovq k7, rdi
; The code from above, transformed:
kmovq k2, rcx
vmovdqu8 zmm19 {k2} {z}, zmmword ptr [rip + .LCPI5_37]
kandq k2, k3, k7 ; register assignments differ from the earlier listings, since we are free to re-allocate
kandq k3, k4, k7
```
Unfortunately, I don't think I can make a small reproducer, because spilling only happens under enough register pressure, and trivial code doesn't create that.
Here is the unoptimized LLVM IR dump: https://gist.github.com/Validark/a19d2babb7955a54a456d0683e95f7d4
Here is the optimized LLVM IR dump: https://gist.github.com/Validark/fd231af0b28cf1bea193d07a18b6d52c
Thank you to whoever helps fix the register allocator!
‒ Validark
</pre>