[llvm] [LoopInterchange] Consider forward/backward dependency in vectorize heuristic (PR #133672)

Wed Jul 30 09:38:37 PDT 2025

================
@@ -1334,21 +1405,34 @@ LoopInterchangeProfitability::isProfitablePerInstrOrderCost() {
 static bool canVectorize(const CharMatrix &DepMatrix, unsigned LoopId) {
   for (const auto &Dep : DepMatrix) {
     char Dir = Dep[LoopId];
-    if (Dir != 'I' && Dir != '=')
-      return false;
+    char DepType = Dep.back();
+    assert((DepType == '<' || DepType == '*') &&
+           "Unexpected element in dependency vector");
+
+    // There are no loop-carried dependencies.
+    if (Dir == '=' || Dir == 'I')
+      continue;
+
+    // DepType being '<' means that this direction vector represents a forward
+    // dependency. In principle, a loop with '<' direction can be vectorized in
+    // this case.
+    if (Dir == '<' && DepType == '<')
+      continue;
+
+    // We cannot prove that the loop is vectorizable.
+    return false;
   }
   return true;
 }
 
 std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
     unsigned InnerLoopId, unsigned OuterLoopId, CharMatrix &DepMatrix) {
-  // If the outer loop is not loop independent it is not profitable to move
-  // this to inner position, since doing so would not enable inner loop
-  // parallelism.
+  // If the outer loop cannot be vectorized, it is not profitable to move this
+  // to inner position.
   if (!canVectorize(DepMatrix, OuterLoopId))
     return false;
 
-  // If inner loop has dependence and outer loop is loop independent then it is
+  // If inner loop cannot be vectorized and outer loop can be then it is
----------------
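For reference, the patched `canVectorize` logic can be tried outside LLVM. This is a minimal sketch. It assumes a dependency matrix encoded as one `std::string` per dependency: one direction character per loop, followed by a trailing `'<'` (forward dependency) or `'*'` (unknown), which matches the `assert` in the hunk. The `DepRows` alias and the name `canVectorizeSketch` are hypothetical and are not LoopInterchange's actual types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for LoopInterchange's CharMatrix: one string per
// dependency, one direction character per loop, plus a trailing type
// character ('<' = known forward dependency, '*' = unknown).
using DepRows = std::vector<std::string>;

// Mirrors the patched canVectorize(): a loop is treated as vectorizable if
// every dependency is either not carried by it ('=' or 'I') or is a forward
// dependency carried in the '<' direction.
bool canVectorizeSketch(const DepRows &Deps, unsigned LoopId) {
  for (const std::string &Dep : Deps) {
    assert(LoopId + 1 < Dep.size() && "LoopId must index a direction");
    char Dir = Dep[LoopId];
    char DepType = Dep.back();
    assert((DepType == '<' || DepType == '*') &&
           "Unexpected element in dependency vector");

    // Not carried by this loop.
    if (Dir == '=' || Dir == 'I')
      continue;

    // Forward dependency carried by this loop.
    if (Dir == '<' && DepType == '<')
      continue;

    // Anything else: vectorizability cannot be proven.
    return false;
  }
  return true;
}
```

For example, with the single row `"<=*"` the outer loop (index 0) carries a dependency of unknown type and is rejected, while the inner loop (index 1) is accepted. With `"=<<"` both loops pass: the outer loop carries nothing and the inner loop's carried dependency is forward.
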
kasuga-fj wrote:

> Whether it is for fusion is not yet decided when calling depends, but `FullDependence` stores the analysis for both.

IIUC, `FullDependence` objects are not cached anywhere, and `DependenceInfo` is nearly stateless. Furthermore, `DependenceInfo::depends` returns a `unique_ptr`, so the result cannot be cached as-is.
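
To make the ownership point concrete: a function that returns a fresh `std::unique_ptr` hands sole ownership to the caller, so a cache cannot both keep the object and return it. A cache would have to own the results itself and hand out non-owning pointers, or switch to `std::shared_ptr`. The sketch below uses hypothetical `Dep`, `computeDep`, and `DepCache` types; it only illustrates the ownership issue and is not the `DependenceInfo` API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for a dependence result.
struct Dep {
  int Src, Dst;
};

// Hypothetical analysis entry point that, like DependenceInfo::depends,
// returns a freshly allocated result owned by the caller.
std::unique_ptr<Dep> computeDep(int Src, int Dst) {
  return std::make_unique<Dep>(Dep{Src, Dst});
}

// The cache keeps ownership and hands out non-owning pointers; returning the
// stored unique_ptr itself would move it out of the cache.
class DepCache {
  std::map<std::pair<int, int>, std::unique_ptr<Dep>> Results;
  unsigned Misses = 0;

public:
  const Dep *get(int Src, int Dst) {
    auto Key = std::make_pair(Src, Dst);
    auto It = Results.find(Key);
    if (It == Results.end()) {
      ++Misses;
      It = Results.emplace(Key, computeDep(Src, Dst)).first;
    }
    return It->second.get();
  }
  unsigned misses() const { return Misses; }
};
```

A cache like this would also need invalidation whenever the IR changes, which a (nearly) stateless analysis does not need.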

> I am not sure refactoring helps. A big part of why it is difficult to understand is the math. The `Pair` also makes it look complex, but it is just matching the access subscript dimensions after delinearization. But I am also not very happy about adding special cases to an already complex analysis. If you do loop fusion, you may want to support more cases than loops that have exactly the same trip count.

I agree that we can't do much about the mathematical complexity, but I believe the code could be made simpler. There seems to be a fair amount of code duplication, especially where the same steps are performed for both `SrcXXX` and `DstXXX` (e.g., [here](https://github.com/llvm/llvm-project/blob/eb9e526af050611ceafa6a53b73d72a7d3ea065c/llvm/lib/Analysis/DependenceAnalysis.cpp#L2522-L2554)). I'm not sure whether this duplication makes the code harder to understand, but I do think it hurts maintainability. "Don't Repeat Yourself" is not always the right principle, but I think some parts of this logic would benefit from it.
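
As a generic illustration of the deduplication point (not the code at the linked lines), the sketch below applies one helper to both the source and the destination subscripts instead of writing the computation twice. It pairs subscripts dimension by dimension, as the quoted comment describes for delinearized accesses, and uses a simple range-overlap test. The `Affine`, `rangeOf`, and `mayDepend` names are hypothetical, and the sketch deliberately ignores integer overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical affine subscript Coeff * i + Const, with 0 <= i < TripCount.
struct Affine {
  long Coeff, Const;
};

struct Range {
  long Lo, Hi; // inclusive bounds
};

// One helper serves both the source and the destination side, instead of
// repeating the same computation for SrcXXX and DstXXX.
static Range rangeOf(const Affine &S, long TripCount) {
  assert(TripCount > 0 && "loop must execute at least once");
  long First = S.Const;
  long Last = S.Coeff * (TripCount - 1) + S.Const; // overflow ignored
  return {std::min(First, Last), std::max(First, Last)};
}

// Pairs Src and Dst subscripts dimension by dimension and reports "no
// dependence" if the value ranges in any dimension are disjoint.
bool mayDepend(const std::vector<Affine> &Src, const std::vector<Affine> &Dst,
               long TripCount) {
  assert(Src.size() == Dst.size() && "subscript dimensions must match");
  for (std::size_t D = 0; D < Src.size(); ++D) {
    Range S = rangeOf(Src[D], TripCount);
    Range T = rangeOf(Dst[D], TripCount);
    if (S.Hi < T.Lo || T.Hi < S.Lo)
      return false; // disjoint in this dimension: provably independent
  }
  return true; // otherwise, conservatively assume a dependence
}
```

For instance, `a[i][0]` and `a[i][1]` are independent because their second dimensions never overlap, while `a[i]` and `a[i + 1]` may depend.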

However, I think the most significant problem is that the analysis doesn't take wrapping into account. The approach in https://github.com/llvm/llvm-project/pull/116632 seems incorrect to me; we probably need to be more pessimistic about wrapping. I think it makes sense to insert checks for wrap flags where necessary, although that would complicate the code. In fact, there is a case where DependenceAnalysis misses a dependency (e.g., iterations 0 and 2 both store to `a[0][0]`), probably because it ignores wrapping, as shown below (godbolt: https://godbolt.org/z/hsxWve8s6).

```llvm
; for (i = 0; i < 8; i++)
;   a[i & 1][i & 1] = 0;
define void @f(ptr %a) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %and = and i64 %i, 1
  %idx = getelementptr [4 x [4 x i8]], ptr %a, i64 0, i64 %and, i64 %and
  store i8 0, ptr %idx
  %i.next = add i64 %i, 1
  %exitcond.not = icmp slt i64 %i.next, 8
  br i1 %exitcond.not, label %loop, label %exit

exit:
  ret void
}
```

```
Printing analysis 'Dependence Analysis' for function 'f':
Src:  store i8 0, ptr %idx, align 1 --> Dst:  store i8 0, ptr %idx, align 1
  da analyze - none!
```
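
The "none!" result can be checked by brute force. The sketch below enumerates the eight iterations of the IR loop (the exit test keeps looping while `i + 1 < 8`), computes the address of `a[i & 1][i & 1]` as a byte offset into the `[4 x [4 x i8]]` array, and collects pairs of distinct iterations that store to the same byte. Each such pair is an output dependence. It also includes a hypothetical `mayWrapUnsigned` helper that illustrates the kind of pessimistic check suggested above. Assuming SCEV models `i & 1` as a zero-extended 1-bit recurrence `{0,+,1}` (our assumption, not verified here), that recurrence leaves the 1-bit range after two iterations, so a wrap-aware analysis would have to give up rather than answer "none".

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Byte offset of a[i & 1][i & 1] in the [4 x [4 x i8]] array from the IR:
// row stride 4 bytes, element size 1 byte.
static std::int64_t storeOffset(std::int64_t I) {
  std::int64_t Idx = I & 1;
  return Idx * 4 + Idx;
}

// Ground truth by enumeration: every pair of iterations (I < J) whose stores
// hit the same byte is a loop-carried output dependence.
std::vector<std::pair<std::int64_t, std::int64_t>>
bruteForceDeps(std::int64_t TripCount) {
  std::vector<std::pair<std::int64_t, std::int64_t>> Deps;
  for (std::int64_t I = 0; I < TripCount; ++I)
    for (std::int64_t J = I + 1; J < TripCount; ++J)
      if (storeOffset(I) == storeOffset(J))
        Deps.emplace_back(I, J);
  return Deps;
}

// Hypothetical pessimistic check: does the unsigned recurrence
// {Start,+,Step} of width Bits leave [0, 2^Bits) within TripCount iterations?
// Assumes inputs small enough that Start + Step * (TripCount - 1) does not
// overflow uint64_t.
bool mayWrapUnsigned(std::uint64_t Start, std::uint64_t Step,
                     std::uint64_t TripCount, unsigned Bits) {
  assert(Bits > 0 && Bits < 32 && TripCount > 0);
  std::uint64_t Limit = std::uint64_t{1} << Bits;
  std::uint64_t Last = Start + Step * (TripCount - 1);
  return Start >= Limit || Last >= Limit;
}
```

With a trip count of 8, the even iterations all write `a[0][0]` and the odd iterations all write `a[1][1]`, so the enumeration finds 12 dependent pairs, the first being iterations 0 and 2.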



https://github.com/llvm/llvm-project/pull/133672