<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/139448>139448</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Forcing Clang (via LLVM) to Maximize AVX512 (zmm) Register Usage
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            clang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          TheBlackPlague
      </td>
    </tr>
</table>

<pre>
I have a case where, when a function is inlined into its caller, the assembly LLVM generates uses all 32 AVX512 SIMD registers, from `zmm0` to `zmm31`. When the same function is not inlined, it uses only `zmm0` to `zmm4`.

The issue is that, when all registers are available, LLVM automatically splits some of the code into a more cache-friendly version. This is a big win: it halves the time taken by the function.

However, it emits the cache-unfriendly version when it doesn't use all the registers.
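
For illustration only, the two cases have roughly the following shape. This is a minimal sketch with hypothetical names (`add_into`, `caller_inlined`, `kernel_outlined`, `caller_outlined`) and a placeholder loop body; it is not the actual code and is not claimed to reproduce the register-usage difference.

```cpp
// Hypothetical stand-in for the real kernel; the names and the loop body
// are illustrative assumptions, not taken from the actual code.
[[gnu::always_inline]] inline void add_into(int* acc, const int* in, int n) {
    for (int i = 0; i < n; ++i) {
        acc[i] += in[i];
    }
}

// Case 1: the kernel is inlined into its caller, so the optimizer sees
// both calls in a single function body.
void caller_inlined(int* acc, const int* in, int n) {
    add_into(acc, in, n);
    add_into(acc, in, n);
}

// Case 2: the same kernel sits behind a noinline boundary, so each call
// is compiled as a separate, standalone function.
[[gnu::noinline]] void kernel_outlined(int* acc, const int* in, int n) {
    add_into(acc, in, n);
}

void caller_outlined(int* acc, const int* in, int n) {
    kernel_outlined(acc, in, n);
    kernel_outlined(acc, in, n);
}
```

Compiling this with `-O3` and an AVX512 target flag such as `-mavx512f` (an assumption; no target flag is stated in this report) and comparing the assembly of the two callers is one way to check whether the register usage differs.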

I've tried reproducing the cache-friendly version using arrays of `__m512i`, but to no avail; the result is always the cache-unfriendly version. I'm wondering if there's a way to force Clang to use all `zmm` registers (so that it can see that better assembly can be generated).

I'm using LLVM 20.1.3 via `clang++` and compiling with `-O3 -DNDEBUG`.
</pre>