[flang-commits] [flang] [mlir] [mlir][Transforms] Support 1:N mappings in `ConversionValueMapping` (PR #116524)
Matthias Springer via flang-commits
flang-commits at lists.llvm.org
Wed Dec 25 04:43:54 PST 2024
================
@@ -1478,34 +1497,12 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue(
}
Value castValue = buildUnresolvedMaterialization(
MaterializationKind::Source, computeInsertPoint(repl), value.getLoc(),
----------------
matthias-springer wrote:
Dominance problems can already appear within the same basic block.
> > The conversion driver generally processes ops top-to-bottom, but that is not guaranteed if the user inserts a new op in a pattern; that op can be inserted anywhere and the driver will immediately try to legalize it. (Admittedly, this is probably quite a rare edge case...)
>
> If the user were to insert new ops wherever they want, then there doesn't really exist an insertion point that works for all use cases either way. I think it is reasonable to expect that users may only assume dominance relations that are already present in the input IR (i.e., that all operands dominate the user operation, and all results of an operation dominate all of their uses). This is commonly already the case with all kinds of patterns in MLIR and is also what would make creating target materializations for operands valid when they are inserted right before the operation.
Here is an example:
```mlir
%orig = "original_op"()
%0 = "producer1"()
%1 = "producer2"()
"user1"(%orig)
"user2"(%orig)
```
Let's assume that `%orig` was replaced with `[%0, %1]`. Now we convert `user1` and `user2`:
- `user2` is converted first. A materialization from `[%0, %1]` to type `X` is inserted right before `user2`, i.e., between `user1` and `user2`.
- `user1` is converted next. The driver finds the already existing materialization to type `X` and reuses it, even though that materialization is defined after `user1`. Now we have a dominance error, as sketched below.
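To make this concrete, here is roughly what the IR would look like at that point (a sketch; `!test.X` stands in for the original type of `%orig`, and `i32`/`i64` for the replacement types):
```mlir
%0 = "producer1"() : () -> i32
%1 = "producer2"() : () -> i64
// `%cast` is used before it is defined -> dominance error.
"user1"(%cast) : (!test.X) -> ()
%cast = builtin.unrealized_conversion_cast %0, %1 : i32, i64 to !test.X
"user2"(%cast) : (!test.X) -> ()
```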
> I'd expect compile time to still be fast, due to not needing to compute any kind of insertion point logic
There is also the overhead of creating a new `unrealized_conversion_cast` op, which involves a heap allocation. I have not measured this overhead myself, but that is what sparked the discussion about "pooling". (Although it may not help here...)
The overhead could be particularly large for the MemRef -> LLVM conversion. E.g.:
```mlir
func.func @foo(%m: memref<?x?xf32>) {
  memref.store ..., %m[%i, %j]
  memref.store ..., %m[%k, %l]
  ...
}
```
As part of target materializations, for every user of `%m` we would build an `llvm.mlir.undef` plus one `llvm.insertvalue` per descriptor field. (Unless we CSE the `unrealized_conversion_cast` ops before they are materialized; that could be another option, but we would need a filter for CSE.)
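For reference, a sketch of what each such per-use materialization would expand to for a rank-2 descriptor (value names are placeholders; the struct layout follows the usual MemRef descriptor: allocated pointer, aligned pointer, offset, sizes, strides):
```mlir
!desc = !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>

%d0 = llvm.mlir.undef : !desc
%d1 = llvm.insertvalue %allocated, %d0[0] : !desc   // allocated pointer
%d2 = llvm.insertvalue %aligned, %d1[1] : !desc     // aligned pointer
%d3 = llvm.insertvalue %offset, %d2[2] : !desc      // offset
%d4 = llvm.insertvalue %size0, %d3[3, 0] : !desc    // size, dim 0
%d5 = llvm.insertvalue %size1, %d4[3, 1] : !desc    // size, dim 1
%d6 = llvm.insertvalue %stride0, %d5[4, 0] : !desc  // stride, dim 0
%d7 = llvm.insertvalue %stride1, %d6[4, 1] : !desc  // stride, dim 1
```
Repeating this chain at every use site is what makes the per-user cost add up.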
https://github.com/llvm/llvm-project/pull/116524