<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/119506>119506</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
LLVM Loop Interchange memory dependency issue in the Legality phase
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
SubhamHaldar
</td>
</tr>
</table>
<pre>
@meghakrgowda
We are experimenting with `matrix multiplication` application, here's the code
```
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 3000
#define M 3000
#define K 3000
void matrixMultiply(int **A, int **B, int **C) {
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
for (int k = 0; k < N; k++) {
C[i][j] += A[i][k] * B[k][j];
}
}
}
}
int main() {
int **A, **B, **C;
A = (int **)malloc(N * sizeof(int *));
B = (int **)malloc(N * sizeof(int *));
C = (int **)malloc(N * sizeof(int *));
for (int i = 0; i < N; i++) {
A[i] = (int *)malloc(N * sizeof(int));
B[i] = (int *)malloc(N * sizeof(int));
C[i] = (int *)malloc(N * sizeof(int));
}
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
A[i][j] = i + j;
B[i][j] = i - j;
C[i][j] = 0;
}
}
clock_t start_time = clock();
matrixMultiply(A, B, C);
clock_t end_time = clock();
double execution_time = ((double) (end_time - start_time)) / CLOCKS_PER_SEC;
int allEleSum=0;
printf("Result Matrix:\n");
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
allEleSum=(allEleSum+C[i][j])%1024;
}
}
printf("Execution Time: %f seconds\n", execution_time);
printf("Sum of all elements of resultant matrix is: %d\n", allEleSum);
for (int i = 0; i < N; i++) {
free(A[i]);
free(B[i]);
free(C[i]);
}
free(A);
free(B);
free(C);
return 0;
}
```
While compiling this above with O3 optimization level with debug information we found this
```
Processing LoopList of size = 3
Found 7 Loads and Stores to analyze
Found anti dependency between Src and Dst
Src: %2 = load i32, ptr %arrayidx10, align 4, !tbaa !10
Dst: store i32 %add, ptr %arrayidx18, align 4, !tbaa !10
Found anti dependency between Src and Dst
Src: %4 = load i32, ptr %arrayidx14, align 4, !tbaa !10
Dst: store i32 %add, ptr %arrayidx18, align 4, !tbaa !10
Found output dependency between Src and Dst
Src: store i32 %add, ptr %arrayidx18, align 4, !tbaa !10
Dst: store i32 %add, ptr %arrayidx18, align 4, !tbaa !10
Found flow dependency between Src and Dst
Src: store i32 %add, ptr %arrayidx18, align 4, !tbaa !10
Dst: %arrayidx18.promoted = load i32, ptr %arrayidx18, align 4, !tbaa !10
Processing InnerLoopId = 2 and OuterLoopId = 1
Failed interchange InnerLoopId = 2 and OuterLoopId = 1 due to dependence
Not interchanging loops. Cannot prove legality.
Processing InnerLoopId = 1 and OuterLoopId = 0
Failed interchange InnerLoopId = 1 and OuterLoopId = 0 due to dependence
Not interchanging loops. Cannot prove legality.
Loop doesn't contain minimum nesting level.
Processing LoopList of size = 2
Found 6 Loads and Stores to analyze
Found output dependency between Src and Dst
Src: store i32 %9, ptr %arrayidx22, align 4, !tbaa !12
Dst: store i32 %9, ptr %arrayidx22, align 4, !tbaa !12
Found output dependency between Src and Dst
Src: store i32 %9, ptr %arrayidx22, align 4, !tbaa !12
Dst: store i32 %11, ptr %arrayidx26, align 4, !tbaa !12
Found output dependency between Src and Dst
Src: store i32 %9, ptr %arrayidx22, align 4, !tbaa !12
Dst: store i32 0, ptr %arrayidx30, align 4, !tbaa !12
Found output dependency between Src and Dst
Src: store i32 %11, ptr %arrayidx26, align 4, !tbaa !12
Dst: store i32 %11, ptr %arrayidx26, align 4, !tbaa !12
Found output dependency between Src and Dst
Src: store i32 %11, ptr %arrayidx26, align 4, !tbaa !12
Dst: store i32 0, ptr %arrayidx30, align 4, !tbaa !12
Found output dependency between Src and Dst
Src: store i32 0, ptr %arrayidx30, align 4, !tbaa !12
Dst: store i32 0, ptr %arrayidx30, align 4, !tbaa !12
Processing InnerLoopId = 1 and OuterLoopId = 0
Failed interchange InnerLoopId = 1 and OuterLoopId = 0 due to dependence
Not interchanging loops. Cannot prove legality.
Processing LoopList of size = 3
Found 7 Loads and Stores to analyze
Found anti dependency between Src and Dst
Src: %arrayidx18.promoted.i = load i32, ptr %arrayidx18.i, align 4, !tbaa !12
Dst: store i32 %add.i, ptr %arrayidx18.i, align 4, !tbaa !12
Found anti dependency between Src and Dst
Src: %5 = load i32, ptr %arrayidx10.i, align 4, !tbaa !12
Dst: store i32 %add.i, ptr %arrayidx18.i, align 4, !tbaa !12
Found anti dependency between Src and Dst
Src: %7 = load i32, ptr %arrayidx14.i, align 4, !tbaa !12
Dst: store i32 %add.i, ptr %arrayidx18.i, align 4, !tbaa !12
Found output dependency between Src and Dst
Src: store i32 %add.i, ptr %arrayidx18.i, align 4, !tbaa !12
Dst: store i32 %add.i, ptr %arrayidx18.i, align 4, !tbaa !12
Processing InnerLoopId = 2 and OuterLoopId = 1
Failed interchange InnerLoopId = 2 and OuterLoopId = 1 due to dependence
Not interchanging loops. Cannot prove legality.
Processing InnerLoopId = 1 and OuterLoopId = 0
Failed interchange InnerLoopId = 1 and OuterLoopId = 0 due to dependence
Not interchanging loops. Cannot prove legality.
Processing LoopList of size = 2
Found 2 Loads and Stores to analyze
Processing InnerLoopId = 1 and OuterLoopId = 0
Failed to recognize PHI as an induction or reduction.
Only outer loops with induction or reduction PHI nodes are supported currently.
Not legal because of current transform limitation
Not interchanging loops. Cannot prove legality.
Loop doesn't contain minimum nesting level.
```
The legality check is being failed at each of nested loop levels due to dependency as shown in the above output. The dependencies encountered are also including out and flow dependencies. The same issue is also raised in LLVM discourse (https://discourse.llvm.org/t/issues-with-the-loop-interchange-pass/81334).
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzsWVuP4rgS_jXmpQQKDtcHHrg02tH2XLS9Ovs4MnFB3O3Yke10D_Prj-wEEmim6aE5Z3qkRRHgS31V9VWV7STMWrFRiBPSn5H-osUKl2ozuStWKcv-YJIz01ppvp2QXpThJmUPZqOfOAMSTf9BYAYBv-VoRIbKCbWBJ-FSIIMoY86Ib5AV0olcioQ5oRUZRMDyuknnkKJBQocWXIqQaI4kmlbXIKquaEpoLFQiC45A4rl1XOhOSuKbU0NSrE6POZHhfiQMclwLhfAJ4iiKDro-Pu_6c98VTR-14FC6-LH0cEvoSCgHhE4JnU69a3VzdticEzoGMpyRaAprbaASFUDiBUQknoW_c_gU_hI6C9depPo0JO9ryfta8v6ZZEPkoRZ5qEUefqRs95mT_kyQ_oL0Z_ekvwA_OV7AtO5-KLunMKsa1VQSH-OR4aLRU7eqf-UPiabe3IwJReioNqyae0R5g-4d1XFj9jQ4fRApQscZk1InhI4-Bbut-I563Zw09tfe_Nk1QOZvBrk8cXbBembCS_qPHQhMXAtofg2gfb4cFcdr2bmgnhqf6XFhxAuvjnqx0sTZyRnt_fhLVVZZ80K5VI1E6uThqwPrmHFf_YoXhEN3WT8HBfFsCQtVFOpnfjx3B42KvwDsZ3JdrKTfGDAp_DpfTw8zR-V44JGO9nDthtFlbIHQJcxvP8__vPv65eavr3c381qJjxOT8kbiXZGReNHgJzdCuXXQRf9CW0gHH4OjJJ6S_lwRSq9RRz-ZMHAU5KbxhI7qJp0dxT-w0e9GtHc-B5q-3-wCAH97TuMpENpfg8VEK273TMyPInUQyybeXZGBXnvDASX6Hd_6tgkUs7BKh01f2EoXb-ho-NeAfwP7BjEkbEXV8ZpSjc-ejVcD81OCNaN7-ANrd5gnOucHnQZdYdSuaBvbWfNQs7v-SYX0h58sF9KfoVwqLLCVfsTyOPU5Bp07kYnv4dgEEh9RlkMcV8UGhFprk5WDTwhrXSheojw_Sn0xOkFrvZ5brfNbYZ0Pol9ZQwBiEk2XAWAIt5pxC0xxuHPaoAWngSkmt99xP4spJ4BjjoqjSrawQveEqODOJEFyYZ13_84kPil8VtCgR2rGQcQhN3Lns6DPjGFbwb91ozJhxEZBr9zLu27FmP_teh8CqEez3iyPEsQ5PwU2OgN2qRu9s270XunG9bzQhcsL9xN-XEP1_yAaa6mffpUXB0Kd3OhMO-Rng30Ov1F3H5RC44vvQwlLg0-fC3fQ2_VsMCGR-70OTZIytcHXCgMv0JfrnkRfsZ-0a2B5U6TWue3AnCmlHeTGrzkSN0wKt-28bHX3pOLodVb_QPhKVntI4BqtInToINHKMaEgE0pkRQYKbbhXDeto5xVrIt0n5uBVa-LbynB8Ir12e-jJ9KI_LMKLoN6ZD93uKazBe3eiRopOwMQvbXHXsv8y5t5bFK7mxa-Jw0Var2X977l8_9oj6un9vyPOnwA64qLSYpyXkj8PeOnZtX_-CP77ODM8fxD_Vc68-Tx-qfof3Vpcivfv6fX9LH91etEzy99b_HcaDCZ6o7zaL398AOb1gFC8SMKDBm3AYNXwZn9WcusTHk3pYfmA4vT8AKg0RxteIdkiz7Xxd1lJYQwqJwMRnrpADKwwYYVFT0M1A5xhyq61yUCKTLjyhdL_42bh5COcv9MaFJIUkwcQFlboJdclocwBsiT1PnhM5MGyEtc-S52t59um-slTHl6OlQ-EyhWlA17hfrJAC6gSXXjHvSqDwKTVUL798kbowoWIH95ZC7QllGUZgrC28N-lrGHChjqA29v_fAQubKILYxEIHaXO5ZbEU0KXhC73Qx0pH7OONhtCl47QZQC0bZ8IbZdi2zvcbhRWO2fWErocdeO4R-i4Ay0-ifk4HrMWTrrDOB6Ne-M-baWTwXC16uF4HfNBj44GvMd7jCfxkEa9VTfBYUtMaER7XX-33afjLu0MsD8a9ZBFuGbdFV2TXoQZE3JvZCuYN-l2x_1o0JJshdKG96CUKnwq2SCUkv6iZSZeqL0qNpb0IimsszWME07iJJAUculDY-XIMNNm2wxrRXIZ09tdyuQps9gqjJwcUrsRLi1WnURnhC69xuqnnRt9j0lNMqHLypHHCf1vAAAA__-NxOj8">