<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/96701">96701</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
-march=native causes performance regression
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
jabraham17
</td>
</tr>
</table>
<pre>
I am noticing a performance regression when comparing `-march=none` to `-march=native` with the following kernel.
```c
// compiled with -ffast-math
int64_t nums[N];
float kernel() {
float sum = 0;
for (long i = 0; i < N; i ++) {
const float x = nums[i] % 3 == 0 ?
(nums[i] % 5 == 0 ? A : B) :
(nums[i] % 4 == 0 ? C : D);
const float rx = 1.0f / x;
sum += rx;
}
return sum;
}
```
Compiling with `-march=native` results in about a 1.3x slowdown on my system (AMD `znver3`).
See [this link](https://godbolt.org/z/3h8bo36Wq) for an assembly comparison. I compiled everything with `-mavx2` to make it more of an apples-to-apples comparison: this way both versions use VEX instructions (`vdivss`/`vaddss`). I suspect that with `-march=native`/`-march=znver3` the loop is unrolled more aggressively, and that this is what hurts performance.
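One way to check the unrolling hypothesis would be to turn unrolling off for just this loop and re-time it with `-march=znver3`. The following is only a sketch of that experiment, not something measured for this report: the function name `kernel_no_unroll` is made up for illustration, `N` is shrunk so the snippet is cheap to run, and the only real change from the kernel above is the Clang loop hint `#pragma clang loop unroll(disable)`.

```c
#include <stdint.h>

#define A 1.0f
#define B 20.0f
#define C 25.6f
#define D 24.0f
#define N 1000 /* assumption: small size for illustration, not the benchmark size */

int64_t nums[N];

/* Hypothetical variant of the kernel: same computation, but Clang is asked
   not to unroll the loop. Other compilers may ignore the pragma. */
float kernel_no_unroll(void) {
  float sum = 0;
#pragma clang loop unroll(disable)
  for (long i = 0; i < N; i++) {
    const float x = nums[i] % 3 == 0 ?
                      (nums[i] % 5 == 0 ? A : B) :
                      (nums[i] % 4 == 0 ? C : D);
    const float rx = 1.0f / x;
    sum += rx;
  }
  return sum;
}
```

If the slowdown disappears with this variant (or when building the original kernel with `-fno-unroll-loops`), that would support the theory that the unrolling choice for `znver3` is the cause. If it does not, the difference is more likely somewhere else in the generated code.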
<details>
<summary>Full kernel that runs timing</summary>
```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <stdint.h>
#define A 1.0f
#define B 20.0f
#define C 25.6f
#define D 24.0f
#ifndef N
#define N 100000000
#endif
#ifndef iters
#define iters 100
#endif
#ifndef ARRAY_TYPE
#define ARRAY_TYPE int64_t
#endif
ARRAY_TYPE nums[N];
float kernel() {
float sum = 0;
for (long i = 0; i < N; i ++) {
const float x = nums[i] % 3 == 0 ?
(nums[i] % 5 == 0 ? A : B) :
(nums[i] % 4 == 0 ? C : D);
const float rx = 1.0f / x;
sum += rx;
}
return sum;
}
void c_version(int initArray, int printTime, int printCorrectness) {
if (initArray) {
#ifdef seed
srand(seed);
#endif
for (long i = 0; i < N; i++) {
nums[i] = rand();
}
}
float dest[iters];
struct timespec start_time, end_time;
clock_gettime(CLOCK_MONOTONIC_RAW, &start_time);
for (int i = 0; i < iters; i++) {
dest[i] = kernel();
}
clock_gettime(CLOCK_MONOTONIC_RAW, &end_time);
float elapsed = (end_time.tv_sec - start_time.tv_sec)+ (end_time.tv_nsec - start_time.tv_nsec) / 1000000000.0;
if (printTime) printf("c Time: %f\n", elapsed);
float sum = 0.0;
for (int i = 0; i < iters; i++) {
sum += dest[i];
}
if (printCorrectness) printf("%f\n", sum);
}
int main() {
c_version(1, 1, 1);
return 0;
}
```
</details>
</pre>