[Mlir-commits] [llvm] [mlir] andrzej/disable flat transpose scalable (PR #115338)

Thu Nov 7 08:29:53 PST 2024

https://github.com/banach-space created https://github.com/llvm/llvm-project/pull/115338

- **[ValueTracking] Don't special case depth for phi of select (#114996)**
- **[mlir][vector] Disable `vector.flat_transpose` for scalable vectors (#102573)**


From 71328476021c8898df07f65a20880b7ef7c4dbda Mon Sep 17 00:00:00 2001
From: Nikita Popov <npopov at redhat.com>
Date: Thu, 7 Nov 2024 10:14:28 +0100
Subject: [PATCH 1/2] [ValueTracking] Don't special case depth for phi of
 select (#114996)

As discussed on
https://github.com/llvm/llvm-project/pull/114689#pullrequestreview-2411822612
and following, there is no principled reason why the phi of select case
should have a different recursion limit than the general case. There may
still be fan-out, and there may still be indirect recursion. Revert that
part of #113707.
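
To illustrate the fan-out and indirect-recursion concern, here is a toy
model of a depth-limited walk. It is a sketch only, not LLVM code: `Node`,
`countVisits` and `kMaxDepth` are hypothetical names, and `kMaxDepth` merely
plays the role of `MaxAnalysisRecursionDepth`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MaxAnalysisRecursionDepth (assumption: a small
// fixed constant).
constexpr unsigned kMaxDepth = 6;

// Toy value graph: each node only lists the indices of its operands.
struct Node {
  std::vector<std::size_t> operands;
};

// Counts how many nodes a depth-limited walk visits when started at `n` with
// the given initial depth. Cycles are allowed; only the depth cap stops the
// walk.
std::size_t countVisits(const std::vector<Node> &graph, std::size_t n,
                        unsigned depth) {
  assert(n < graph.size() && "operand index out of range");
  std::size_t visits = 1;
  // At the limit, the node itself is inspected but its operands are not.
  if (depth >= kMaxDepth)
    return visits;
  for (std::size_t op : graph[n].operands)
    visits += countVisits(graph, op, depth + 1);
  return visits;
}
```

In a two-node cycle where each node uses the other twice, a walk started at
`kMaxDepth - 1` visits 3 nodes, whereas one started at depth 1 visits 63. This
is the kind of growth that a uniform cap avoids in this model.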
---
 llvm/lib/Analysis/ValueTracking.cpp | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 37cd4caaca71df..ed3fa35c5b8610 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -1565,20 +1565,12 @@ static void computeKnownBitsFromOperator(const Operator *I,
         // Skip direct self references.
         if (IncValue == P) continue;
 
-        // Recurse, but cap the recursion to one level, because we don't
-        // want to waste time spinning around in loops.
-        // TODO: See if we can base recursion limiter on number of incoming phi
-        // edges so we don't overly clamp analysis.
-        unsigned IncDepth = MaxAnalysisRecursionDepth - 1;
-
         // If the Use is a select of this phi, use the knownbit of the other
         // operand to break the recursion.
         if (auto *SI = dyn_cast<SelectInst>(IncValue)) {
-          if (SI->getTrueValue() == P || SI->getFalseValue() == P) {
+          if (SI->getTrueValue() == P || SI->getFalseValue() == P)
             IncValue = SI->getTrueValue() == P ? SI->getFalseValue()
                                                : SI->getTrueValue();
-            IncDepth = Depth + 1;
-          }
         }
 
         // Change the context instruction to the "edge" that flows into the
@@ -1589,7 +1581,13 @@ static void computeKnownBitsFromOperator(const Operator *I,
         RecQ.CxtI = P->getIncomingBlock(u)->getTerminator();
 
         Known2 = KnownBits(BitWidth);
-        computeKnownBits(IncValue, DemandedElts, Known2, IncDepth, RecQ);
+
+        // Recurse, but cap the recursion to one level, because we don't
+        // want to waste time spinning around in loops.
+        // TODO: See if we can base recursion limiter on number of incoming phi
+        // edges so we don't overly clamp analysis.
+        computeKnownBits(IncValue, DemandedElts, Known2,
+                         MaxAnalysisRecursionDepth - 1, RecQ);
 
         // See if we can further use a conditional branch into the phi
         // to help us determine the range of the value.

From 2dc8ee0ab6ec0efe33112703ceae4ea34c1d2867 Mon Sep 17 00:00:00 2001
From: Andrzej Warzynski <andrzej.warzynski at arm.com>
Date: Thu, 7 Nov 2024 16:15:51 +0000
Subject: [PATCH 2/2] [mlir][vector] Disable `vector.flat_transpose` for
 scalable vectors (#102573)

Disables `vector.flat_transpose` for scalable vectors. As per the docs:

>  This is the counterpart of llvm.matrix.transpose in MLIR

I'm not aware of any use of any matrix-multiply intrinsics in the
context of scalable vectors, hence disabling.

Note, this is a follow-on for #102573 in which I disabled
`vector.matrix_multiply`.
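
For context, here is a minimal sketch of what a flattened transpose computes.
It is not the MLIR or LLVM implementation: `flatTranspose` is a hypothetical
helper, and column-major storage is an assumption made for the example. The
point is that the element count must equal `rows * columns`, both of which are
compile-time attributes on the Op. A scalable vector such as
`vector<[16]xf32>` holds `16 * vscale` elements, a count only known at
runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an MLIR/LLVM API): transposes a matrix stored in a
// flat 1-D buffer. Assumes column-major storage, i.e. element (r, c) of the
// input lives at index c * rows + r.
std::vector<float> flatTranspose(const std::vector<float> &matrix,
                                 std::size_t rows, std::size_t columns) {
  // The flat buffer must hold exactly rows * columns elements.
  assert(matrix.size() == rows * columns && "shape does not match buffer");
  std::vector<float> res(matrix.size());
  for (std::size_t c = 0; c < columns; ++c)
    for (std::size_t r = 0; r < rows; ++r)
      // Input (r, c) becomes result (c, r); the result has `columns` rows.
      res[r * columns + c] = matrix[c * rows + r];
  return res;
}
```

With a 2x3 input stored as `{1, 2, 3, 4, 5, 6}`, this sketch returns
`{1, 3, 5, 2, 4, 6}`.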
---
 mlir/include/mlir/Dialect/Vector/IR/VectorOps.td | 8 ++++++--
 mlir/test/Dialect/Vector/invalid.mlir            | 9 +++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
index 3f45d0804e0450..6fe897cca79926 100644
--- a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+++ b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
@@ -2770,11 +2770,11 @@ def Vector_FlatTransposeOp : Vector_Op<"flat_transpose", [Pure,
                  TCresVTEtIsSameAsOpBase<0, 0>>]>,
     Arguments<(
       // TODO: tighten vector element types that make sense.
-      ins VectorOfRankAndType<[1],
+      ins FixedVectorOfRankAndType<[1],
             [AnySignlessInteger, AnySignedInteger, Index, AnyFloat]>:$matrix,
           I32Attr:$rows, I32Attr:$columns)>,
     Results<(
-      outs VectorOfRankAndType<[1],
+      outs FixedVectorOfRankAndType<[1],
              [AnySignlessInteger, AnySignedInteger, Index, AnyFloat]>:$res)> {
   let summary = "Vector matrix transposition on flattened 1-D MLIR vectors";
   let description = [{
@@ -2789,6 +2789,10 @@ def Vector_FlatTransposeOp : Vector_Op<"flat_transpose", [Pure,
     a 2-D matrix with <rows> rows and <columns> columns, and returns the
     transposed matrix in flattened form in 'res'.
 
+    Note, the corresponding LLVM intrinsic, `@llvm.matrix.transpose.*`, does not
+    support scalable vectors. Hence, this Op is only available for fixed-width
+    vectors.
+
     Also see:
 
     http://llvm.org/docs/LangRef.html#llvm-matrix-transpose-intrinsic
diff --git a/mlir/test/Dialect/Vector/invalid.mlir b/mlir/test/Dialect/Vector/invalid.mlir
index 56039d04549aa5..d591c60acb64e7 100644
--- a/mlir/test/Dialect/Vector/invalid.mlir
+++ b/mlir/test/Dialect/Vector/invalid.mlir
@@ -1900,3 +1900,12 @@ func.func @matrix_multiply_scalable(%a: vector<[4]xf64>, %b: vector<4xf64>) {
 
   return
 }
+
+// -----
+
+func.func @flat_transpose_scalable(%arg0: vector<[16]xf32>) -> vector<[16]xf32> {
+  // expected-error @+1 {{'vector.flat_transpose' op operand #0 must be fixed-length vector of signless integer or signed integer or index or floating-point values of ranks 1, but got 'vector<[16]xf32>'}}
+  %0 = vector.flat_transpose %arg0 { rows = 4: i32, columns = 4: i32 }
+     : vector<[16]xf32> -> vector<[16]xf32>
+  return %0 : vector<[16]xf32>
+}