<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/72530>72530</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[x86] Work around slow compress store instruction on znver4
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
farnoy
</td>
</tr>
</table>
<pre>
Zen 4 has a slow, microcoded implementation of `V{,P}COMPRESS*` when the destination is a memory operand. Register-to-register compress is not affected, and neither are any of the expand load instructions.
I'd like to contribute this optimization, but I don't have experience with LLVM and could use some pointers.
For the repro case, I've created an example on [Compiler Explorer](https://godbolt.org/z/9nTvrj6Tq).
After the fix, I would expect the output for `-mcpu=znver4` to use this instruction sequence instead of a compress store:
1. compress into a vector register
2. `kmov` to move the mask into a GPR
3. `pext` to compress the mask
4. `kmov` to move the mask to the vector domain
5. masked store
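
To illustrate why this sequence should be equivalent to a compress store, here is a rough scalar model in Python. It is only a sketch of the semantics, not LLVM code: the 8-lane width, the function names, and the choice of `pext(mask, mask)` as the way to "compress the mask" are my assumptions.

```python
LANES = 8  # assumed vector width for this model, e.g. 8 x 64-bit lanes


def pext(src, mask):
    """Scalar model of BMI2 PEXT: pack the bits of src selected by mask into the low bits."""
    out, weight = 0, 1
    for bit in range(LANES):
        if (mask >> bit) & 1:
            if (src >> bit) & 1:
                out |= weight
            weight *= 2
    return out


def compress(values, mask):
    """Model of register-to-register compress (zeroing form): selected lanes move to the front."""
    kept = [v for i, v in enumerate(values) if (mask >> i) & 1]
    return kept + [0] * (LANES - len(kept))


def masked_store(memory, values, mask):
    """Model of a masked vector store: only lanes whose mask bit is set are written."""
    for i in range(LANES):
        if (mask >> i) & 1:
            memory[i] = values[i]


def compress_store_reference(memory, values, mask):
    """Model of the slow instruction: selected lanes are written contiguously to memory."""
    kept = [v for i, v in enumerate(values) if (mask >> i) & 1]
    memory[:len(kept)] = kept


def compress_store_workaround(memory, values, mask):
    """Model of the proposed replacement sequence."""
    packed = compress(values, mask)   # compress into a vector register
    gpr = mask                        # kmov: mask register -> GPR
    store_mask = pext(gpr, gpr)       # assumption: pext(mask, mask) sets the low popcount(mask) bits
    # kmov: GPR -> mask register (a no-op in this model)
    masked_store(memory, packed, store_mask)


values = [10, 11, 12, 13, 14, 15, 16, 17]
expected, actual = [-1] * LANES, [-1] * LANES
compress_store_reference(expected, values, 0b10110010)
compress_store_workaround(actual, values, 0b10110010)
assert expected == actual
```

The key point is that the masked store only touches the first popcount(mask) lanes, so memory past the compressed elements is left alone, just as with the original compress store.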
References:
- https://www.uops.info/html-instr/VCOMPRESSPD_M512_ZMM.html#ZEN
- https://www.mersenneforum.org/showthread.php?p=614191
</pre>