[Mlir-commits] [mlir] 2affcd6 - [MLIR] Fix affine fusion bug/efficiency issue / enable more fusion

Wed May 6 22:31:15 PDT 2020

Author: Uday Bondhugula
Date: 2020-05-07T10:51:34+05:30
New Revision: 2affcd664e6a1cab81dac284110bcf9a010d30b2

URL: https://github.com/llvm/llvm-project/commit/2affcd664e6a1cab81dac284110bcf9a010d30b2
DIFF: https://github.com/llvm/llvm-project/commit/2affcd664e6a1cab81dac284110bcf9a010d30b2.diff

LOG: [MLIR] Fix affine fusion bug/efficiency issue / enable more fusion

The list of destination load ops built while evaluating producer-consumer
fusion wasn't being maintained as a set, so duplicate load ops were being
added to it. Although this is harmless correctness-wise, it's a killer
efficiency-wise, and it prevents interesting/useful fusions (e.g.,
reshapes into a matmul). The latter fusions would be missed because the
duplicate load ops on a memref in the 'dst loads' list forced a slice
union that wasn't actually needed. Since slice union is unimplemented for
the local var case, a single destination load op that leads to local vars
(like a floordiv / mod producing fusion), a common case, would not get
fused because an unnecessary union was being tried with itself. (The
union would actually be the same thing, but we would bail out.)

Besides the above, this change also significantly speeds up fusion, since
all the unnecessary slice computations, unions, checks, etc. caused by the
duplicates go away.
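
The deduplication described above can be sketched in isolation. This is a
minimal illustration, not the code from LoopFusion.cpp: `Op`,
`appendIfAbsent`, and `collectLoads` are hypothetical names, and `std::find`
stands in for the `llvm::is_contained` check used in the patch. It shows that
re-collecting the same destination loads over several rounds leaves one entry
per op once a membership test guards the `push_back`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an operation; only its address (identity)
// matters for the duplicate check.
struct Op {
  int memrefId;
};

// Append 'op' to 'loads' only if that exact pointer is not already present.
// A linear scan is used on the assumption that the list stays small.
// Returns true if the op was appended.
bool appendIfAbsent(std::vector<Op *> &loads, Op *op) {
  assert(op != nullptr && "expected a valid op");
  if (std::find(loads.begin(), loads.end(), op) != loads.end())
    return false;
  loads.push_back(op);
  return true;
}

// Simulate re-collecting the destination nest's loads 'rounds' times, as
// would happen when the collection step is repeated after each fusion.
std::vector<Op *> collectLoads(const std::vector<Op *> &dstLoads, int rounds) {
  std::vector<Op *> loads;
  for (int r = 0; r < rounds; ++r)
    for (Op *op : dstLoads)
      appendIfAbsent(loads, op);
  return loads;
}
```

With two distinct load ops and three collection rounds, `collectLoads`
returns a list of two entries in first-seen order; without the membership
test it would hold six, and each duplicate would trigger a redundant slice
computation and union.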

Differential Revision: https://reviews.llvm.org/D79547

Added: 
    

Modified: 
    mlir/lib/Transforms/LoopFusion.cpp
    mlir/test/Transforms/loop-fusion.mlir

Removed: 
    


################################################################################
diff  --git a/mlir/lib/Transforms/LoopFusion.cpp b/mlir/lib/Transforms/LoopFusion.cpp
index c8c33f345496..72dfc1d62faf 100644
--- a/mlir/lib/Transforms/LoopFusion.cpp
+++ b/mlir/lib/Transforms/LoopFusion.cpp
@@ -1625,7 +1625,10 @@ struct GreedyFusion {
             // continue fusing based on new operands.
             for (auto *loadOpInst : dstLoopCollector.loadOpInsts) {
               auto loadMemRef = cast<AffineLoadOp>(loadOpInst).getMemRef();
-              if (visitedMemrefs.count(loadMemRef) == 0)
+              // NOTE: Change 'loads' to a hash set in case efficiency is an
+              // issue. We still use a vector since it's expected to be small.
+              if (visitedMemrefs.count(loadMemRef) == 0 &&
+                  !llvm::is_contained(loads, loadOpInst))
                 loads.push_back(loadOpInst);
             }
 

diff  --git a/mlir/test/Transforms/loop-fusion.mlir b/mlir/test/Transforms/loop-fusion.mlir
index d19aa5e5558b..866925b4a9f7 100644
--- a/mlir/test/Transforms/loop-fusion.mlir
+++ b/mlir/test/Transforms/loop-fusion.mlir
@@ -2422,5 +2422,45 @@ func @should_fuse_producer_with_multi_outgoing_edges(%a : memref<1xf32>, %b : me
   // CHECK-NEXT: affine.store %{{.*}}, %[[A]]
   // CHECK-NEXT: affine.load %[[B]]
   // CHECK-NOT: affine.for %{{.*}}
+  // CHECK: return
   return
 }
+
+// -----
+
+// MAXIMAL-LABEL: func @reshape_into_matmul
+func @reshape_into_matmul(%lhs : memref<1024x1024xf32>,
+              %R: memref<16x64x1024xf32>, %out: memref<1024x1024xf32>) {
+  %rhs = alloc() :  memref<1024x1024xf32>
+
+  // Reshape from 3-d to 2-d.
+  affine.for %i0 = 0 to 16 {
+    affine.for %i1 = 0 to 64 {
+      affine.for %k = 0 to 1024 {
+        %v = affine.load %R[%i0, %i1, %k] : memref<16x64x1024xf32>
+        affine.store %v, %rhs[64*%i0 + %i1, %k] : memref<1024x1024xf32>
+      }
+    }
+  }
+
+  // Matmul.
+  affine.for %i = 0 to 1024 {
+    affine.for %j = 0 to 1024 {
+      affine.for %k = 0 to 1024 {
+        %0 = affine.load %rhs[%k, %j] : memref<1024x1024xf32>
+        %1 = affine.load %lhs[%i, %k] : memref<1024x1024xf32>
+        %2 = mulf %1, %0 : f32
+        %3 = affine.load %out[%i, %j] : memref<1024x1024xf32>
+        %4 = addf %3, %2 : f32
+        affine.store %4, %out[%i, %j] : memref<1024x1024xf32>
+      }
+    }
+  }
+  return
+}
+// MAXIMAL-NEXT: alloc
+// MAXIMAL-NEXT: affine.for
+// MAXIMAL-NEXT:   affine.for
+// MAXIMAL-NEXT:     affine.for
+// MAXIMAL-NOT:      affine.for
+// MAXIMAL:      return
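
The reshape in the test above stores to row `64*%i0 + %i1` of `%rhs`.
Recovering the producer indices from a row of `%rhs` therefore takes a
floordiv and a mod, which is presumably where the local variables mentioned
in the log come from. The following is a small sketch of that index
arithmetic only; the function names are hypothetical, and C++ `/` and `%`
match floordiv and mod here only because all values are non-negative.

```cpp
#include <cassert>

// Forward map used by the reshape store: (i0, i1) -> row of %rhs.
int reshapeRow(int i0, int i1) { return 64 * i0 + i1; }

// Inverse map: the producer iteration that wrote a given row.
// For non-negative 'row', '/' is floordiv and '%' is mod.
int sourceI0(int row) { return row / 64; }
int sourceI1(int row) { return row % 64; }

// Check that the inverse recovers every (i0, i1) in the test's
// iteration space: 0 <= i0 < 16, 0 <= i1 < 64.
bool roundTrips() {
  for (int i0 = 0; i0 < 16; ++i0)
    for (int i1 = 0; i1 < 64; ++i1) {
      int row = reshapeRow(i0, i1);
      if (sourceI0(row) != i0 || sourceI1(row) != i1)
        return false;
    }
  return true;
}
```

For example, `reshapeRow(3, 5)` is 197, and `sourceI0(197)` and
`sourceI1(197)` give back 3 and 5.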