[Mlir-commits] [mlir] [MLIR][Linalg] Add aggregate ops decomposition pass and softmax decom… (PR #97582)
Petr Kurapov
llvmlistbot at llvm.org
Wed Jul 24 03:35:20 PDT 2024
kurapov-peter wrote:
> 2. The change to the decomposition of softmax is, IMO, an inefficient lowering of softmax and will require "something" to get the state back. This should be part of the PR that is changing the decomposition. It moves from a more succinct representation that Linalg allows to something that is (artificially) hamstrung by the current definitions of the named ops. I don't expect the issue with named ops to be fixed as a precursor (though that would be the right thing to do IMO), but for this PR, I don't see how we can land it without an option to choose how to decompose softmax (with the default being what it is today, and an option to lower to a sequence of named ops). On top of that, adding a generalization to convert everything to `linalg.generic`s is a non-starter IMO. You would force all downstream users to rely heavily on "recognizers" to retrieve the information that is lost by generalization, without giving them control over when they want to generalize.
OK, I see. That position is the opposite of what I'm proposing, namely changing the default decomposition to target named ops (note that this has nothing to do with generalization).
Here is a summary of the arguments for and against changing the default.
Cons of changing the default:
1. Additional steps are required to reach the same IR state as today.
2. The IR is less succinct: explicit broadcasts have to be inserted because named ops require operands of the same rank (see the sketch after the lists below).
Pros of changing the default:
1. The current decomposition is really a mix of three things: an actual decomposition into the exp, sum, sub, max, and div named ops, a partial generalization, and some fusion (I'll call the existing behavior a _mixed decomposition_ to distinguish it from the proposed approach). The proposed approach limits decomposition to a single responsibility.
2. Separating these three stages gives better control over the behavior. For example, after decomposing softmax into a sequence of named ops, one can tile and fuse them together with another named op that was not part of the softmax. With the current approach you'd still need to run the fusion pass after the mixed decomposition to reach the same state, so pipeline complexity stays the same. Moreover, new possibilities open up for pipelines that don't want to generalize the result of the decomposition.
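To make the trade-off concrete, here is a minimal sketch of what the proposed named-op decomposition of a 2-D `linalg.softmax` (reducing along dimension 1) could look like. Shapes, SSA names, and the exact body ops (e.g. `arith.maxnumf`) are illustrative assumptions, not the verbatim output of the pass:

```mlir
// %input : tensor<16x32xf32>, softmax along dimension 1.
%red_init = tensor.empty() : tensor<16xf32>
%neg_inf  = arith.constant 0xFF800000 : f32
%max_acc  = linalg.fill ins(%neg_inf : f32)
              outs(%red_init : tensor<16xf32>) -> tensor<16xf32>
// Row-wise maximum for numerical stability.
%max = linalg.reduce { arith.maxnumf }
         ins(%input : tensor<16x32xf32>)
         outs(%max_acc : tensor<16xf32>) dimensions = [1]
// Named ops require operands of matching rank, hence the explicit
// broadcast (con 2).
%init   = tensor.empty() : tensor<16x32xf32>
%max_bc = linalg.broadcast ins(%max : tensor<16xf32>)
            outs(%init : tensor<16x32xf32>) dimensions = [1]
%sub = linalg.sub ins(%input, %max_bc : tensor<16x32xf32>, tensor<16x32xf32>)
         outs(%init : tensor<16x32xf32>) -> tensor<16x32xf32>
%exp = linalg.exp ins(%sub : tensor<16x32xf32>)
         outs(%init : tensor<16x32xf32>) -> tensor<16x32xf32>
// Row-wise sum, second broadcast, and the final division.
%zero    = arith.constant 0.0 : f32
%sum_acc = linalg.fill ins(%zero : f32)
             outs(%red_init : tensor<16xf32>) -> tensor<16xf32>
%sum = linalg.reduce { arith.addf }
         ins(%exp : tensor<16x32xf32>)
         outs(%sum_acc : tensor<16xf32>) dimensions = [1]
%sum_bc = linalg.broadcast ins(%sum : tensor<16xf32>)
            outs(%init : tensor<16x32xf32>) dimensions = [1]
%res = linalg.div ins(%exp, %sum_bc : tensor<16x32xf32>, tensor<16x32xf32>)
         outs(%init : tensor<16x32xf32>) -> tensor<16x32xf32>
```

The two `linalg.broadcast` ops are the verbosity cost from con 2; everything else is a straight one-named-op-per-step decomposition that existing tiling and fusion transformations can consume directly.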
_Mitigating con n.1_: Even though reaching the result of the mixed decomposition requires additional steps, those steps are existing upstream transformations (generalization and elementwise fusion). Hence, the downstream changes won't be a significant burden.
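For example, a downstream pipeline that wants today's mixed result back could chain the new pass with the existing ones, roughly like this (the first flag is a hypothetical name for the pass this PR adds; the other two are existing upstream passes):

```sh
mlir-opt softmax.mlir \
  --linalg-decompose-named-ops \
  --linalg-generalize-named-ops \
  --linalg-fuse-elementwise-ops
# --linalg-decompose-named-ops is a placeholder name for the pass added here;
# generalization and elementwise fusion are existing upstream passes.
```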
_Mitigating con n.2_: As for broadcasting, as I mentioned earlier, adding implicit casting to named ops is an option. Although I still don't see an actual problem with the "same rank" requirement other than it being "unnecessary", I'm willing to work on it if it proves valuable.
I suggest we make a decision on the direction here, @ftynse, @nicolasvasilache, @dcaballe, @rengolin.
https://github.com/llvm/llvm-project/pull/97582