[llvm] [LoopInterchange] Consider forward/backward dependency in vectorize heuristic (PR #133672)

Wed Jul 30 08:07:58 PDT 2025

================
@@ -1334,21 +1405,34 @@ LoopInterchangeProfitability::isProfitablePerInstrOrderCost() {
 static bool canVectorize(const CharMatrix &DepMatrix, unsigned LoopId) {
   for (const auto &Dep : DepMatrix) {
     char Dir = Dep[LoopId];
-    if (Dir != 'I' && Dir != '=')
-      return false;
+    char DepType = Dep.back();
+    assert((DepType == '<' || DepType == '*') &&
+           "Unexpected element in dependency vector");
+
+    // There are no loop-carried dependencies.
+    if (Dir == '=' || Dir == 'I')
+      continue;
+
+    // DepType being '<' means that this direction vector represents a forward
+    // dependency. In principle, a loop with '<' direction can be vectorized in
+    // this case.
+    if (Dir == '<' && DepType == '<')
+      continue;
+
+    // We cannot prove that the loop is vectorizable.
+    return false;
   }
   return true;
 }
 
 std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
     unsigned InnerLoopId, unsigned OuterLoopId, CharMatrix &DepMatrix) {
-  // If the outer loop is not loop independent it is not profitable to move
-  // this to inner position, since doing so would not enable inner loop
-  // parallelism.
+  // If the outer loop cannot be vectorized, it is not profitable to move this
+  // to inner position.
   if (!canVectorize(DepMatrix, OuterLoopId))
     return false;
 
-  // If inner loop has dependence and outer loop is loop independent then it is
+  // If inner loop cannot be vectorized and outer loop can be then it is
----------------
Meinersbur wrote:

> After reading this comment, I noticed that the patch introduces additional analysis for loop fusion even though the client doesn't require it. I initially expected an argument to be added (such as `depends(Src, Dst, /*ForFusion=*/true)`), but that doesn't seem to be the case. Tough, controlling the analysis behavior via flags could complicate caching and reusing results across different passes.

Whether it is for fusion is not yet decided when calling depends, but `FullDependence` stores the analysis for both.


> By the way, I've recently been reading DependenceAnalysis.cpp, and noticed that; it's already quite complex and potentially buggy. I'm fairly certain it should be refactored before adding any new features.

The principle is straightforward; when processing one of the two fused loops, process them as the same. Since an expression can only be in one of the loops, no ambiguity arises. Only when processing the relationship between two statements you need to decide whether you want to treat them as the same or sequential loops.

I am not sure refactoring helps. Big part of why it is difficult to understand is the math. The `Pair` also makes it look complex, but it is just matching the access subscript dimensions after delinearization. But I also am also not very happy about adding special cases to an already complex analysis. If you do loop fusion, you may want to support more cases than loops that have excactly the same trip count.

https://github.com/llvm/llvm-project/pull/133672