[Mlir-commits] [mlir] [MLIR][SCF] Add canonicalization pattern to fold away iter args of scf.forall (PR #90189)

Thu May 2 01:41:13 PDT 2024

================
@@ -1509,6 +1510,203 @@ class ForallOpControlOperandsFolder : public OpRewritePattern<ForallOp> {
   }
 };
 
+/// The following canonicalization pattern folds the iter arguments of
+/// scf.forall op if :-
+/// 1. The corresponding result has zero uses.
+/// 2. The iter argument is NOT being modified within the loop body.
+/// uses.
+///
+/// Example of first case :-
+///  INPUT:
+///   %res:3 = scf.forall ... shared_outs(%arg0 = %a, %arg1 = %b, %arg2 = %c)
+///            {
+///                ...
+///                <SOME USE OF %arg0>
----------------
Abhishek-Varma wrote:

Regarding context:
From the `.td` description:
```
The actions of the in_parallel terminators specify how to combine the partial results
of all parallel invocations into a full value, in some unspecified order.
The “destination” of each such op must be a shared_out block argument of the scf.forall op.
```

So my understanding is that "each such op" always refers to a `shared_out block argument`, and that this block argument must appear as the op's "destination" operand.

> check that the only use of the iter_args is in tensor.insert_in_parallel ops within the scf.forall.in_parallel (from previous comment)

I don't think we should constrain the uses of the iter_args to ONLY that. Ideally, a slice of the iter_arg is extracted, some computation is performed on it, and the result is stored back into the same iter_arg.

Instead, following your comment about defining an API on `scf.forall` that returns a unique `tensor.parallel_insert_slice`, I've added that API and am using it in the pattern.
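
To illustrate the kind of lookup described above, here is a minimal standalone C++17 sketch. It is not the code from this PR and does not use MLIR: `InsertModel`, `ForallModel`, and `findUniqueInsertFor` are hypothetical names that only model the idea that each op in the `in_parallel` terminator names one shared_out block argument as its destination, and that a query for a given block argument either finds exactly one such op or reports that none is unique.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for one tensor.parallel_insert_slice inside the
// in_parallel terminator. It only records which shared_out block argument
// (by index) is its destination.
struct InsertModel {
  std::size_t destArgIndex;
};

// Hypothetical stand-in for an scf.forall op: just the list of inserts
// found in its terminator.
struct ForallModel {
  std::vector<InsertModel> inserts;
};

// Returns the position (within loop.inserts) of the single insert whose
// destination is the block argument `argIndex`. Returns std::nullopt when
// no insert, or more than one insert, targets that argument.
std::optional<std::size_t> findUniqueInsertFor(const ForallModel &loop,
                                               std::size_t argIndex) {
  std::optional<std::size_t> found;
  for (std::size_t i = 0; i < loop.inserts.size(); ++i) {
    if (loop.inserts[i].destArgIndex != argIndex)
      continue;
    if (found)
      return std::nullopt; // a second writer: the insert is not unique
    found = i;
  }
  return found;
}
```

In this model, a loop whose terminator holds inserts targeting arguments 0, 2, and 2 yields a match for argument 0 and no match for arguments 1 (never written) and 2 (written twice). Other uses of the block argument in the loop body, such as extracting a slice to compute on, do not affect the lookup, which is why the check need not require the insert to be the only use.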

https://github.com/llvm/llvm-project/pull/90189