<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/137808>137808</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[SLP] -march=znver4 is slower than -march=znver3 on the znver4 series CPUs.
</td>
</tr>
<tr>
<th>Labels</th>
<td>
llvm:SLPVectorizer
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dianqk
</td>
</tr>
</table>
<pre>
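I tested the following code with `-O3 -march=znver4` and `-O3 -march=znver3` on an AMD 7950X. The `-march=znver4` build is approximately 40% slower (576 ms vs. 409 ms per run). It executes about 20% fewer instructions (4.06 G vs. 5.11 G), but its IPC drops from 2.21 to 1.25.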
<details><summary>Details</summary>
<p>
```c
#include <stddef.h>
#include <stdint.h>
#define rotate_left(val, shift) ((val << shift) | (val >> (64 - shift)))
const uint32_t RHO[24] = {
    1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14,
    27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44,
};
const uint64_t PI[24] = {
    10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4,
    15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1,
};
const uint64_t RC[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
    0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
    0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL,
};
void keccak_p(uint64_t state[25]) {
    for (int i = 0; i < 24; ++i) {
        uint64_t current_rc = RC[i];
        uint64_t array[5] = {0};
        // Theta
        for (int x = 0; x < 5; ++x) {
            for (int y = 0; y < 5; ++y) {
                array[x] ^= state[5 * y + x];
            }
        }
        for (int x = 0; x < 5; ++x) {
            uint64_t t1 = array[(x + 4) % 5];
            uint64_t t2 = rotate_left(array[(x + 1) % 5], 1);
            for (int y = 0; y < 5; ++y) {
                state[5 * y + x] ^= t1 ^ t2;
            }
        }
        // Rho and pi
        uint64_t last = state[1];
        for (int x = 0; x < 24; ++x) {
            array[0] = state[PI[x]];
            state[PI[x]] = rotate_left(last, RHO[x]);
            last = array[0];
        }
        // Chi
        for (int y_step = 0; y_step < 5; ++y_step) {
            int y = 5 * y_step;
            for (int x = 0; x < 5; ++x) {
                array[x] = state[y + x];
            }
            for (int x = 0; x < 5; ++x) {
                uint64_t t1 = ~array[(x + 1) % 5];
                uint64_t t2 = array[(x + 2) % 5];
                state[y + x] = array[x] ^ (t1 & t2);
            }
        }
        // Iota
        state[0] ^= current_rc;
    }
}
int main() {
    uint64_t state[25] = {0};
    for (int i = 0; i < 1000000; ++i) {
        keccak_p(state);
    }
    return 0;
}
```
</p>
</details>
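Side note on the reproducer: the `rotate_left` macro shifts by `64 - shift`, which is undefined behavior when `shift` is 0. That never happens here (every `RHO` entry is between 1 and 62, and the theta step rotates by 1), so the reproducer is valid as written. If the kernel is reused elsewhere, a masked form such as the hypothetical `rotl64` below is well defined for any shift count. This is only a sketch, assuming Clang still recognizes the masked pattern as a single rotate (it is the usual rotate idiom); it is not meant to change the behavior reported in this issue.

```c
#include <stdint.h>

/* Hypothetical, UB-free alternative to the rotate_left macro.
   Masking both shift amounts with 63 keeps each shift in 0..63, so
   shift == 0 (or 64) no longer triggers an out-of-range 64-bit shift. */
static inline uint64_t rotl64(uint64_t val, unsigned shift) {
    return (val << (shift & 63)) | (val >> (-shift & 63));
}
```

With this helper, `rotl64(x, 0)` and `rotl64(x, 64)` both return `x` unchanged, while `rotl64(x, 1)` matches `rotate_left(x, 1)`.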
```
$ perf stat -r 5 ./znver3
Performance counter stats for './znver3' (5 runs):
408.52 msec task-clock:u # 0.999 CPUs utilized ( +- 0.36% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
49 page-faults:u # 119.946 /sec ( +- 0.41% )
5,107,180,232 instructions:u # 2.21 insn per cycle
# 0.00 stalled cycles per insn ( +- 0.00% )
2,310,598,646 cycles:u # 5.656 GHz ( +- 0.36% )
481,736 stalled-cycles-frontend:u # 0.02% frontend cycles idle ( +- 2.01% )
27,039,388 branches:u # 66.189 M/sec ( +- 0.00% )
5,051 branch-misses:u # 0.02% of all branches ( +- 0.59% )
0.40895 +- 0.00147 seconds time elapsed ( +- 0.36% )
$ perf stat -r 5 ./znver4
Performance counter stats for './znver4' (5 runs):
576.39 msec task-clock:u # 0.999 CPUs utilized ( +- 0.20% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
50 page-faults:u # 86.747 /sec ( +- 0.89% )
4,062,180,550 instructions:u # 1.25 insn per cycle
# 0.00 stalled cycles per insn ( +- 0.00% )
3,261,654,058 cycles:u # 5.659 GHz ( +- 0.20% )
630,199 stalled-cycles-frontend:u # 0.02% frontend cycles idle ( +- 0.44% )
27,039,720 branches:u # 46.912 M/sec ( +- 0.00% )
5,970 branch-misses:u # 0.02% of all branches ( +- 0.36% )
0.57676 +- 0.00113 seconds time elapsed ( +- 0.20% )
```
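In `main`, `state` is never read after the loop, so in principle the optimizer is free to drop some of the work, and nothing confirms that the znver3 and znver4 binaries compute the same result. One way to make the result observable is to fold the state into a single value and print it at the end of `main`. The `state_checksum` helper below is a hypothetical sketch for that purpose, not part of the original reproducer:

```c
#include <stdint.h>

/* Hypothetical helper: XOR-fold the 25 lanes of the Keccak state into one
   64-bit value. Printing it after the benchmark loop keeps the loop's
   result observable and lets the two builds be compared for identical
   output. */
static uint64_t state_checksum(const uint64_t state[25]) {
    uint64_t sum = 0;
    for (int i = 0; i < 25; ++i) {
        sum ^= state[i];
    }
    return sum;
}
```

For example, add `printf("%016llx\n", (unsigned long long)state_checksum(state));` (with `<stdio.h>`) before `return 0;` and check that both binaries print the same value. As an independent correctness check, a single `keccak_p` call on an all-zero state should leave `0xF1258F7940E1DDE7` in `state[0]`, which is lane 0 of the standard Keccak-f[1600] zero-state test vector.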
cc @RKSimon @alexey-bataev (as it relates to AMD and SLP)
</pre>