[flang-commits] [flang] [mlir] [mlir][Transforms] Support 1:N mappings in `ConversionValueMapping` (PR #116524)

Markus Böck via flang-commits flang-commits at lists.llvm.org
Tue Dec 24 07:35:17 PST 2024


================
@@ -1478,34 +1497,12 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue(
   }
   Value castValue = buildUnresolvedMaterialization(
       MaterializationKind::Source, computeInsertPoint(repl), value.getLoc(),
----------------
zero9178 wrote:

(also continuing from the `remapValues` issue above https://github.com/llvm/llvm-project/pull/116524#discussion_r1896576239)

> The conversion driver generally processes ops top-to-bottom, but that is not guaranteed if the user inserts a new op in a pattern; that op can be inserted anywhere and the driver will immediately try to legalize it. (Admittedly, this is probably a quite rare edge case...)

If the user can insert new ops wherever they want, then there doesn't really exist an insertion point that works for all use cases either way. I think it is reasonable to expect that users may only assume dominance relations that are already present in the input IR, i.e. that all operands dominate their user operation and that all results of an operation dominate all their uses. This is commonly already the case with all kinds of patterns in MLIR, and it is also what would make creating target materializations for operands valid if they are inserted right before the operation.

> One problem with this is that there could be multiple ConversionPattern::matchAndRewrite. The insertion point could be valid for the first matchAndRewrite (which creates the materialization), but invalid for the second matchAndRewrite (which reuses the existing materialization).

Just to ensure I understand. Assuming we have a CFG such as:
```
             ┌────────────┐             
             │            │             
             │   bb0      │             
      ┌──────┼            ┼──────┐      
      │      │ defs: v_ns │      │      
      │      │            │      │      
      │      └────────────┘      │      
      │                          │      
┌─────▼─────┐             ┌──────▼─────┐
│           │             │            │
│    bb1    │             │    bb2     │
│           │             │            │
│  op(v_ns) │             │   op(v_ns) │
└───────────┘             └────────────┘
```
and `bb1` is processed first, then it could create a target materialization for an operand using the `v_n`s, right? If the insertion point were then placed in `bb1`, `bb2` would nevertheless try to reuse the materialization (since it was stored in the mapping), creating a dominance violation.
I see how this could currently go wrong with a naive insertion point, yeah.
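
To make the diagram concrete, a rough MLIR-style sketch of the failure mode (op names, types, and the `%v0`/`%v1` replacement values are all invented for illustration; `%c` is some condition):

```mlir
^bb0(%c: i1):
  // Original %v : !old.ty was replaced by %v0, %v1 (the "v_n"s).
  %v0, %v1 = "test.def"() : () -> (!new.a, !new.b)
  cf.cond_br %c, ^bb1, ^bb2
^bb1:
  // Processing bb1 first creates a target materialization here and
  // stores it in the mapping:
  %m = builtin.unrealized_conversion_cast %v0, %v1 : !new.a, !new.b to !old.ty
  "test.op"(%m) : (!old.ty) -> ()
^bb2:
  // Reusing %m from the mapping here is a dominance violation:
  // %m (defined in bb1) does not dominate this use in bb2.
  "test.op"(%m) : (!old.ty) -> ()
```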

> Both in this case, and in remapValues above, we could choose to not store the materialization value in the mapping. That would mean that we create a new materialization each time, which would mean there's some compile time overhead.

The way I see it, there are three metrics we can optimize for: simplicity, compile time, and the number of materializations.

If a target materialization is always created, then:
* Simplicity is optimized
* Number of materializations is unoptimized
* Compile time I'd expect to still be fast, since no insertion point logic needs to be computed
* Additional negative: This is a regression for the 1:1 case, where the current implementation has an optimal number of materializations, compile time, and simplicity.

If we always compute the optimal insertion point for a given set of materialization operands, then:
* Simplicity decreases due to the extra insertion point logic + reuse logic
* Number of materializations is optimal
* Compile time I'd expect to be higher than in the first solution due to the insertion point optimization. Maybe this can be improved by keeping up-to-date dominance info, but at the cost of simplicity.
* Additional positive: Uniformity between the 1:1 and 1:N cases, and no regressions for 1:1

Alternatively, we could use a heuristic for when all values are within the same block, and disable the optimization (including storing in the mapping) when they aren't:
* Simplicity decreases due to both the conditional optimization logic and the insertion point logic. Feels more like a heuristic.
* Number of materializations is non-optimal, but better than always creating them. Optimal for the 1:1 case!
* Compile time I'd expect to be the best here
* Additional positive: No regressions in the 1:1 case (guarantees optimal materializations)
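
As a sketch of what that heuristic's reuse condition could look like (plain Python pseudocode with invented names, not the actual `ConversionValueMapping` API; assume each value knows its containing block):

```python
class SameBlockMaterializer:
    """Sketch of the heuristic option: cache a materialization only when
    all source values live in the same block, so a single insertion point
    (after the last defining op) dominates every later use that the
    originals dominate. All names here are invented for illustration."""

    def __init__(self):
        self._cache = {}  # (values, target_type) -> materialized value

    def materialize(self, values, target_type, create_cast):
        blocks = {v.block for v in values}
        if len(blocks) == 1:
            # Same block: reuse is safe, so store in the mapping.
            key = (tuple(values), target_type)
            if key not in self._cache:
                self._cache[key] = create_cast(values, target_type)
            return self._cache[key]
        # Values span multiple blocks: finding a dominating insertion
        # point would need dominance info, so create a fresh cast and
        # do not store it in the mapping.
        return create_cast(values, target_type)
```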

Or lastly, as you suggested, reuse for 1:1 and do not for 1:N. Then:
* Simplicity is about as high as in the "always materialize" option (just a few conditionals, right?), but conceptually lower due to the non-uniformity between 1:1 and 1:N
* Number of materializations is unoptimized, but optimal for the 1:1 case
* Compile time I'd expect to be perfect for 1:1 and fast for 1:N
* Additional positive: No regression for 1:1
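
A minimal sketch of that last option (again with invented names; the real driver's mapping is of course more involved):

```python
def find_or_create_materialization(mapping, values, target_type, create_cast):
    """Sketch of the "reuse for 1:1, not for 1:N" option: only the 1:1
    case stores into and reuses from the mapping; 1:N always creates a
    fresh materialization, so no insertion point or dominance reasoning
    is needed. All names here are invented for illustration."""
    if len(values) == 1:
        key = (values[0], target_type)
        if key not in mapping:
            mapping[key] = create_cast(values, target_type)
        return mapping[key]
    # 1:N: never cached, so stale, dominance-violating reuse is impossible.
    return create_cast(values, target_type)
```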

All of this to me seems like a pick-your-poison situation. I am unconvinced that there is a large correlation between compile time and the number of materializations *in the current dialect conversion*. I am also not actually sure how many target materializations we create and how often we can reuse them.
Unless we have more data, I'd therefore opt for simpler changes to avoid premature optimization.

I like your last suggestion a lot in that sense, as it avoids regressions in the 1:1 case, should have good compile time (even for 1:N, *probably*; more data required), and is still rather simple. It could in the future, once we do have data, be supplemented with heuristics, getting closer to the third option I mentioned.

re: the pooling on the RFC, to me it sounds rather unrelated to the optimizations here.

> re: Using value as an insertion point. I just gave this a try, but it doesn't work. One problem is with block signature conversions. E.g.: a memref is replaced with the elements that make up a memref descriptor (during MemRef -> LLVM). In that case, computeInsertionPoint(value) would return the old block (that is going to be erased). The materialization must be inserted in the new block.

Is there some other logic we can use to pass a valid insertion point to `findOrBuildReplacementValue`? The `ReplaceBlockArgRewrite` would sadly have the old block as well, right? With no real way to get an insertion point corresponding to what used to be the first operation of the block?

If there isn't really any suitable context we can use for finding insertion points in the case of block argument rewrites (and the resulting source materializations), then your current implementation might actually just be the simplest solution for now, I'm afraid :slightly_frowning_face:

https://github.com/llvm/llvm-project/pull/116524
