[Mlir-commits] [mlir] [mlir][Transforms][NFC] Turn op/block arg replacements into `IRRewrite`s (PR #81757)

Mon Feb 26 06:49:07 PST 2024

================
@@ -1462,7 +1490,12 @@ void ConversionPatternRewriterImpl::notifyOperationInserted(
 void ConversionPatternRewriterImpl::notifyOpReplaced(Operation *op,
                                                      ValueRange newValues) {
   assert(newValues.size() == op->getNumResults());
-  assert(!replacements.count(op) && "operation was already replaced");
+#ifndef NDEBUG
----------------
jeanPerier wrote:

I can confirm that this commit slows down the above FIR to MLIR LLVM IR dialect conversion pass for the Fortran program that Slava gave (*) by 3.5x in release mode too, and that https://github.com/llvm/llvm-project/pull/81759 adds another 1.5x slowdown on that (from 1.8s to 8.8s on my machine in release mode in with the two patch).

Using perf, it seems the slowdown is caused by changes in `mlir::detail::ConversionPatternRewriterImpl::isOpIgnored` (probably related to the data structure changes).

After the two patches in release mode:
````
Samples: 158K of event 'cycles', Event count (approx.): 21633186643
  Children      Self  Command  Shared Object         Symbol
+   65.37%    64.59%  fir-opt  fir-opt               [.] mlir::detail::ConversionPatternRewriterImpl::isOpIgnored
+   65.22%     0.00%  fir-opt  [unknown]             [.] 0x0000000000000001
+   65.18%     0.00%  fir-opt  [unknown]             [.] 0x480029bc01058d48
+   65.18%     0.00%  fir-opt  fir-opt               [.] mlir::detail::ConversionPatternRewriterImpl::~ConversionPatternRewriterImpl
+   24.25%     0.00%  fir-opt  [unknown]             [.] 0x0000000100000001                                                
+   24.19%    23.65%  fir-opt  fir-opt               [.] mlir::detail::ConversionPatternRewriterImpl::notifyOpReplaced
+   24.05%     0.00%  fir-opt  [unknown]             [.] 0x26ee058d48fb8948                                
+   24.05%     0.00%  fir-opt  fir-opt               [.] mlir::RegisteredOperationName::Model<fir::InsertValueOp>::~Model              
+   24.05%     0.00%  fir-opt  [unknown]             [.] 0x000055a74703d488
+    0.82%     0.00%  fir-opt  [unknown]             [k] 0000000000000000                                                          
     0.78%     0.47%  fir-opt  fir-opt               [.] mlir::Lexer::lexBareIdentifierOrKeyword
....
````

Before the patches, `mlir::detail::ConversionPatternRewriterImpl::isOpIgnored` was nowhere to see in the perf report (MLIR IO dominated the run). 

You can reproduce if you have flang builds enabled, and with repro.f90 that is the Fortran source from Slava above with:

```
# First phase of compilation not impacted by patch
bin/bbc -emit-fir repro.f90 -o - | bin/fir-opt --cg-rewrite -o -input.fir
# FIR to LLVM dialect conversion pass impacted by pass
time bin/fir-opt --fir-to-llvm-ir -input.fir -o output.mlir
```

I will try to see if I can come with a pure MLIR reproducer.

(*) about the Fortran program: this program generates a global that is a 7d array of 40320 chars with an initial value.

So far, flang generates a chain of insert_value for character types, so the operation that is impacted by the slow-down is a fir.global where the body contains a chain of 40320 fir.insert_value + 3 other ops (the value being inserted, and the terminator). We are planning to move away from this and use attribute for global initializer as much as possible. However, the slow down will likely kick in for every functions with more than a few thousands ops, and this is quite easily reached with big Fortran programs. Global constants are just an easy way to create a lot of IR with a few lines of Fortran to reproduce the issue.

````
 fir.global internal @_QFECc717 constant : !fir.array<2x3x4x5x6x7x8x!fir.char<1>> {
    %0 = fir.undefined !fir.array<2x3x4x5x6x7x8x!fir.char<1>>
    %1 = fir.string_lit "a"(1) : !fir.char<1>
    %2 = fir.insert_value %0, %1, [0 : index, 0 : index, 0 : index, 0 : index, 0 : index, 0 : index, 0 : index] : (!fir.array<2x3x4x5x6x7x8x!fir.char<1>>, !fir.char<1>) -> !fir.array<2x3x4x5x6x7x8x!fir.char<1>>
   // ....
   %40321 = fir.insert_value %40320, %1, [1 : index, 2 : index, 3 : index, 4 : index, 5 : index, 6 : index, 7 : index] : (!fir.array<2x3x4x5x6x7x8x!fir.char<1>>, !fir.char<1>) -> !fir.array<2x3x4x5x6x7x8x!fir.char<1>>
    fir.has_value %40321 : !fir.array<2x3x4x5x6x7x8x!fir.char<1>>
  }
````

This is being rewritten to very similar LLVM dialect IR:
````
 llvm.mlir.global internal constant @_QFECc717() {addr_space = 0 : i32} : !llvm.array<8 x array<7 x array<6 x array<5 x array<4 x array<3 x array<2 x array<1 x i8>>>>>>>> {
    %0 = llvm.mlir.undef : !llvm.array<8 x array<7 x array<6 x array<5 x array<4 x array<3 x array<2 x array<1 x i8>>>>>>>>
    %1 = llvm.mlir.constant("a") : !llvm.array<1 x i8>
    %2 = llvm.insertvalue %1, %0[0, 0, 0, 0, 0, 0, 0] : !llvm.array<8 x array<7 x array<6 x array<5 x array<4 x array<3 x array<2 x array<1 x i8>>>>>>>>
    // ...
    %40321 = llvm.insertvalue %1, %40320[7, 6, 5, 4, 3, 2, 1] : !llvm.array<8 x array<7 x array<6 x array<5 x array<4 x array<3 x array<2 x array<1 x i8>>>>>>>>
    llvm.return %40321 : !llvm.array<8 x array<7 x array<6 x array<5 x array<4 x array<3 x array<2 x array<1 x i8>>>>>>>>
  }
````

https://github.com/llvm/llvm-project/pull/81757