[llvm] [LoopInterchange] Consider forward/backward dependency in vectorize heuristic (PR #133672)

Mon Jul 28 03:24:39 PDT 2025

================
@@ -1334,21 +1405,34 @@ LoopInterchangeProfitability::isProfitablePerInstrOrderCost() {
 static bool canVectorize(const CharMatrix &DepMatrix, unsigned LoopId) {
   for (const auto &Dep : DepMatrix) {
     char Dir = Dep[LoopId];
-    if (Dir != 'I' && Dir != '=')
-      return false;
+    char DepType = Dep.back();
+    assert((DepType == '<' || DepType == '*') &&
+           "Unexpected element in dependency vector");
+
+    // There are no loop-carried dependencies.
+    if (Dir == '=' || Dir == 'I')
+      continue;
+
+    // DepType being '<' means that this direction vector represents a forward
+    // dependency. In principle, a loop with '<' direction can be vectorized in
+    // this case.
+    if (Dir == '<' && DepType == '<')
+      continue;
+
+    // We cannot prove that the loop is vectorizable.
+    return false;
   }
   return true;
 }
 
 std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
     unsigned InnerLoopId, unsigned OuterLoopId, CharMatrix &DepMatrix) {
-  // If the outer loop is not loop independent it is not profitable to move
-  // this to inner position, since doing so would not enable inner loop
-  // parallelism.
+  // If the outer loop cannot be vectorized, it is not profitable to move this
+  // to inner position.
   if (!canVectorize(DepMatrix, OuterLoopId))
     return false;
 
-  // If inner loop has dependence and outer loop is loop independent then it is
+  // If inner loop cannot be vectorized and outer loop can be then it is
----------------
Meinersbur wrote:

What is "sufficiently complex"? If DA returns "confused", then `canVectorize` has to return false. If it returns `[< = *]`, the dependency is carried by the outermost loop; it does not matter what the inner loop does.
I actually don't know/understand why `canVectorize` does not look at the dependencies of the parent loops. Possibly because which loops are the outer loops changes with interchange. At least the loops that surround both the outer and the inner loop could be considered.
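To make the `[< = *]` remark concrete, here is a standalone C++ model (not the LLVM source) of the patched `canVectorize`. It assumes `CharMatrix` is a vector of `char` rows with one column per loop, outermost first, followed by the DepType marker this patch appends. `canVectorizeIgnoringEnclosing` is a hypothetical helper illustrating the suggestion above: before applying the same check, it drops dependences whose first non-`=` entry is a definite `<` at a loop enclosing both interchange candidates.

```cpp
#include <cassert>
#include <vector>

// Assumption: one row per dependence, one column per loop (outermost first),
// plus a trailing DepType marker ('<' = forward, '*' = unknown).
using CharMatrix = std::vector<std::vector<char>>;

// Standalone restatement of the patched check, for illustration only.
static bool canVectorize(const CharMatrix &DepMatrix, unsigned LoopId) {
  for (const auto &Dep : DepMatrix) {
    char Dir = Dep[LoopId];
    char DepType = Dep.back();
    assert((DepType == '<' || DepType == '*') &&
           "Unexpected element in dependency vector");
    if (Dir == '=' || Dir == 'I')
      continue; // Not carried by LoopId.
    if (Dir == '<' && DepType == '<')
      continue; // Forward dependence: vectorizable in principle.
    return false; // Cannot prove LoopId is vectorizable.
  }
  return true;
}

// Hypothetical: ignore dependences carried by a loop that encloses both
// candidates (indices < OuterLoopId), then apply the same check.
static bool canVectorizeIgnoringEnclosing(const CharMatrix &DepMatrix,
                                          unsigned OuterLoopId,
                                          unsigned LoopId) {
  CharMatrix Remaining;
  for (const auto &Dep : DepMatrix) {
    bool CarriedOutside = false;
    for (unsigned L = 0; L < OuterLoopId; ++L) {
      if (Dep[L] == '=' || Dep[L] == 'I')
        continue;
      // The first non-'=' level carries the dependence; only a definite '<'
      // proves that an enclosing loop carries it ('*' or '>' stay).
      CarriedOutside = (Dep[L] == '<');
      break;
    }
    if (!CarriedOutside)
      Remaining.push_back(Dep);
  }
  return canVectorize(Remaining, LoopId);
}
```

For the single row `{'<', '=', '*', '*'}`, the patched check rejects loop 2 even though loop 0 carries the dependence, while the hypothetical variant with `OuterLoopId = 1` accepts it. A row of all `*` (used here only as a stand-in for an unanalyzable dependence) is still rejected by both.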

The case you mention is interesting because it is a counterexample to the assumption that if `canVectorize` is pessimistic (never says a loop can be vectorized even though LoopVectorize will not for some reason), it will not cause loop interchanges that would not happen if it were not pessimistic. Anyway, in this case the j-loop looks more likely to be vectorized profitably because the `f(k)`/`g(k)` indices would require more complex memory accesses. LoopVectorize can better handle `i` as a "strided access pattern".

I think the comment itself is correct: if the outer loop could be vectorized (once moved to the inner position) but the current inner one cannot, swap the outer one into the vectorizable position. For "vectorizable" it simply assumes the definition of `canVectorize`. In general, even if a loop is vectorizable in terms of dependencies, LoopVectorize may still consider it unprofitable to vectorize because of the instructions it contains, or the code may actually run slower after vectorization, so "profitable" was never meant in absolute terms and is hopefully understood as such by the reader. "can" does not add new information here unless we mention such concrete situations.
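The rule the comment describes can be sketched as a three-way decision. This is a minimal model under assumptions: `vectorizationVerdict` is a hypothetical free function, not the LLVM member; it reuses the same standalone restatement of the patched `canVectorize`; and returning `std::nullopt` when both loops pass is an assumption about the truncated remainder of the hunk, meaning "this heuristic has no opinion".

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Assumption: rows are direction vectors plus a trailing DepType marker.
using CharMatrix = std::vector<std::vector<char>>;

// Standalone restatement of the patched check, for illustration only.
static bool canVectorize(const CharMatrix &DepMatrix, unsigned LoopId) {
  for (const auto &Dep : DepMatrix) {
    char Dir = Dep[LoopId];
    char DepType = Dep.back();
    assert((DepType == '<' || DepType == '*') &&
           "Unexpected element in dependency vector");
    if (Dir == '=' || Dir == 'I')
      continue;
    if (Dir == '<' && DepType == '<')
      continue;
    return false;
  }
  return true;
}

// Hypothetical: true = interchange, false = keep the order,
// nullopt = this heuristic cannot decide.
static std::optional<bool> vectorizationVerdict(const CharMatrix &DepMatrix,
                                                unsigned InnerLoopId,
                                                unsigned OuterLoopId) {
  // The outer loop would not be vectorizable in the inner position.
  if (!canVectorize(DepMatrix, OuterLoopId))
    return false;
  // The outer loop would be vectorizable but the current inner one is not.
  if (!canVectorize(DepMatrix, InnerLoopId))
    return true;
  // Both pass the dependence check; without a cost model, no decision.
  return std::nullopt;
}
```

With `{'=', '<', '<'}` both loops pass and the sketch returns `std::nullopt`; changing the marker to `*` makes only the inner loop fail, so the verdict becomes `true`. That flip from "no opinion" to "interchange" under a more pessimistic `canVectorize` is the kind of effect discussed in the review.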

https://github.com/llvm/llvm-project/pull/133672