[llvm] [LoopInterchange] Consider forward/backward dependency in vectorize heuristic (PR #133672)

Mon Jul 28 05:09:04 PDT 2025

================
@@ -1334,21 +1405,34 @@ LoopInterchangeProfitability::isProfitablePerInstrOrderCost() {
 static bool canVectorize(const CharMatrix &DepMatrix, unsigned LoopId) {
   for (const auto &Dep : DepMatrix) {
     char Dir = Dep[LoopId];
-    if (Dir != 'I' && Dir != '=')
-      return false;
+    char DepType = Dep.back();
+    assert((DepType == '<' || DepType == '*') &&
+           "Unexpected element in dependency vector");
+
+    // There are no loop-carried dependencies.
+    if (Dir == '=' || Dir == 'I')
+      continue;
+
+    // DepType being '<' means that this direction vector represents a forward
+    // dependency. In principle, a loop with '<' direction can be vectorized in
+    // this case.
+    if (Dir == '<' && DepType == '<')
+      continue;
+
+    // We cannot prove that the loop is vectorizable.
+    return false;
   }
   return true;
 }
 
 std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
     unsigned InnerLoopId, unsigned OuterLoopId, CharMatrix &DepMatrix) {
-  // If the outer loop is not loop independent it is not profitable to move
-  // this to inner position, since doing so would not enable inner loop
-  // parallelism.
+  // If the outer loop cannot be vectorized, it is not profitable to move this
+  // to inner position.
   if (!canVectorize(DepMatrix, OuterLoopId))
     return false;
 
-  // If inner loop has dependence and outer loop is loop independent then it is
+  // If inner loop cannot be vectorized and outer loop can be then it is
----------------
kasuga-fj wrote:

> What is "sufficiently complex"? If DA returns "confused" then `canVectorize` has to return false. If it returns `[< = *]` the dependency is carried by the outermost loop, it does not matter what the inner loop does.

I meant the latter. As you mentioned, I was assuming a case where DA returns `[< = *]`.

I hadn't really been conscious of it, but as you pointed out, this is a case where a pessimistic heuristic leads to an interchange that wouldn't have happened if it hadn't been pessimistic (and in this specific case, moving the j-loop would be profitable for vectorization because the memory access pattern is simpler). I personally think that the interchange should not happen in this case, since we currently don't take the vectorization cost into account. Checking the dependencies of the surrounding loops seems like a good idea in principle, but I'm not confident that it wouldn't lead to other unintended transformations. Using the same cost model as LoopVectorize seems like the ideal solution, but it feels challenging.
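
To make the `[< = *]` case concrete, here is a minimal standalone sketch of the per-column check from the patch. It is not the in-tree code: `CharMatrix` is modeled as a plain `std::vector<std::vector<char>>`, `exampleMatrix` is a hypothetical helper, and I'm assuming each row holds one direction character per loop followed by the trailing type marker that the patch reads via `Dep.back()`.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for LoopInterchange's CharMatrix (assumption): one row
// per dependency, one direction character per loop level, followed by a
// trailing dependency-type marker ('<' for a forward dependency, '*'
// otherwise), as read by Dep.back() in the patch.
using CharMatrix = std::vector<std::vector<char>>;

// Mirrors the check in the patch: only the column for LoopId and the trailing
// type marker are inspected; the surrounding loops are not considered.
static bool canVectorize(const CharMatrix &DepMatrix, unsigned LoopId) {
  for (const auto &Dep : DepMatrix) {
    char Dir = Dep[LoopId];
    char DepType = Dep.back();
    assert((DepType == '<' || DepType == '*') &&
           "Unexpected element in dependency vector");

    // No loop-carried dependency at this level.
    if (Dir == '=' || Dir == 'I')
      continue;

    // Forward dependency with '<' direction: treated as vectorizable.
    if (Dir == '<' && DepType == '<')
      continue;

    // Anything else: we cannot prove that the loop is vectorizable.
    return false;
  }
  return true;
}

// Hypothetical example: a three-loop nest with direction [< = *] and a
// forward type marker appended.
static CharMatrix exampleMatrix() { return {{'<', '=', '*', '<'}}; }
```

With this matrix, `canVectorize` accepts loops 0 and 1 but rejects loop 2 because of the `*` in its column, even though the dependency is already carried by the outermost loop. That is the pessimism discussed above.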

> For "vectorizable" it just assumes the definition of `canVectorize`.

As for the comment here, this explanation made the most sense to me. Thanks for clarifying!

https://github.com/llvm/llvm-project/pull/133672