<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/59766">59766</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Very inefficient SIMD for Loop nest optimization in x86-64
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
kbuyukakyuz
</td>
</tr>
</table>
<pre>
Hello,
Here is a simple example of loop tiling for L1 cache optimization:
```
// Loop tiling
void tiled_loop() {
    for (int i = 0; i < N; i += blockSize) {
        for (int j = 0; j < blockSize; j++) {
            sum += array[i + j];
        }
    }
}
```
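The snippet above does not show how `N`, `blockSize`, `array`, or `sum` are declared. A minimal self-contained sketch, assuming a global `int` array and a `long long` accumulator (both assumptions, as are the helpers `fill_array` and `original_loop`), could look like the following. It only checks that the tiled and untiled loops compute the same sum; it makes no claim about timing.

```cpp
// Hypothetical declarations: the report does not show these, so the types
// and the fill pattern below are assumptions made for illustration only.
constexpr int N = 1000000;     // value from the benchmark description
constexpr int blockSize = 16;  // value from the benchmark description
static_assert(N % blockSize == 0,
              "the tiled loop assumes N is a multiple of blockSize");

int array[N];
long long sum = 0;

// Fill the array with a simple repeating pattern (hypothetical helper).
void fill_array() {
    for (int i = 0; i < N; i++) {
        array[i] = i % 7;
    }
}

// Untiled reference loop (hypothetical; the report only names it
// "Original loop" in the benchmark output).
long long original_loop() {
    sum = 0;
    for (int i = 0; i < N; i++) {
        sum += array[i];
    }
    return sum;
}

// Tiled loop with the same structure as in the report, returning the sum
// so the two variants can be compared.
long long tiled_loop() {
    sum = 0;
    for (int i = 0; i < N; i += blockSize) {
        for (int j = 0; j < blockSize; j++) {
            sum += array[i + j];
        }
    }
    return sum;
}
```

After calling `fill_array()`, `original_loop()` and `tiled_loop()` should return the same value, since both visit every element exactly once and in the same order.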
Here is the generated [assembly code](https://godbolt.org/z/P71M67hxs) with `x86-64 clang 15.0.0 -O3 -march=tigerlake` versus the corresponding GCC-generated assembly code.
As the assembly shows, GCC generates significantly better code than clang.
Interestingly, clang is up to 10x slower with the tiled loop than with the original, untiled loop. Here are my benchmarks for `N = 1000000` and `blockSize = 16`:
```
gcc Original loop: 0.0422719 seconds
gcc Tiled loop: 0.0250076 seconds
clang Original loop: 0.0292406 seconds
clang Tiled loop: 0.173324 seconds
```
The specific values that I get on my local machine are not very important, but clang's assembly shouldn't be this bloated.
</pre>