[Mlir-commits] [mlir] [mlir][vector] Add vector.transpose with unit-dim to vector.shape_cast pattern (PR #72105)

Quinn Dawkins llvmlistbot at llvm.org
Wed Nov 22 16:46:53 PST 2023


qedawkins wrote:

> Regarding #3, I think the problem is bigger than `vector.shape_cast` as we have too many ways of representing the same thing and no specific canonical form. For the particular case discussed here I can think of `vector.shape_cast`, `vector.broadcast`, `vector.extract/insert`, `vector.extract_element/insert_element` and `vector.transpose`. All of these alternatives have different canonicalization patterns and gaps, so depending on which one we choose and how lucky we are, we may end up with more or less efficient code at the end. We should think about how to improve all of these and unify both the representation of some of these special cases (e.g., unit dim collapse/expansion) and their respective canonicalization patterns/lowerings.

Big +1 to re-evaluating how we handle unit dims in general. That seems to be the root of a large portion of this discussion and also where a lot of these different ops start to converge. Part of the problem here might be that at the moment, there isn't a great way (that I know of) to reliably get rid of the inner unit dimension on the result of the shape_cast.
```
%1 = vector.shape_cast %0 : vector<1x4xf32> to vector<4x1xf32>
```
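
The same unit-dim manipulation can be spelled with several of the other ops mentioned above. A few hand-written (non-exhaustive) examples, assuming placeholder values `%v : vector<4xf32>` and `%cst : vector<1x4xf32>`:
```
// Trailing unit dim via shape_cast:
%a = vector.shape_cast %v : vector<4xf32> to vector<4x1xf32>
// Leading unit dim via broadcast:
%b = vector.broadcast %v : vector<4xf32> to vector<1x4xf32>
// Trailing unit dim again, this time by transposing the broadcast:
%c = vector.transpose %b, [1, 0] : vector<1x4xf32> to vector<4x1xf32>
// Leading unit dim via insert into an existing vector:
%d = vector.insert %v, %cst [0] : vector<4xf32> into vector<1x4xf32>
```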
There are patterns for dropping leading unit dimensions (https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/Vector/Transforms/VectorDropLeadUnitDim.cpp), but nothing for other unit dims. IIUC the problem in SPIR-V isn't shape_cast directly, but the fact that all vectors must be 1-D. Vector unrolling of transpose lets us handle the inner unit dim of the result because the unrolling pattern can easily associate the unit dim in the input to the one in the result using the permutation. https://github.com/llvm/llvm-project/blob/48f5855e4c48709d71d7aae4595232a3296c735e/mlir/lib/Dialect/Vector/Transforms/VectorUnroll.cpp#L518
But there's no equivalent for shape_cast, and as a result the unit dim on the result of the shape_cast is sticky. The outcome of this shouldn't be to just leave shape_cast support as is, say it can't be supported, and fork canonical representations across targets. At the same time, just as you're saying, I'm a bit worried about having to rely on throwing more canonicalization patterns at the IR and hoping that we get lucky and the unit dimension eventually goes away. I agree that in most of the cases I've seen, additional unit dimensions like this just complicate later analyses/patterns, and it would be really nice to find ways to reliably simplify them. In that respect, making this a canonicalization might be a band-aid on a larger problem.
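
To make the permutation point concrete, here is a hand-written sketch (not the verbatim output of the unrolling pattern linked above) of unrolling the transpose form to 1x1 slices, assuming `%0 : vector<1x4xf32>` and an accumulator `%acc : vector<4x1xf32>`:
```
// One of the four unrolled slices: the result offsets [2, 0] are obtained
// from the source offsets [0, 2] simply by applying the permutation [1, 0].
%s = vector.extract_strided_slice %0 {offsets = [0, 2], sizes = [1, 1], strides = [1, 1]} : vector<1x4xf32> to vector<1x1xf32>
%t = vector.transpose %s, [1, 0] : vector<1x1xf32> to vector<1x1xf32>
%r = vector.insert_strided_slice %t, %acc {offsets = [2, 0], strides = [1, 1]} : vector<1x1xf32> into vector<4x1xf32>
```
An unrolling of the equivalent shape_cast has no permutation to consult, so it would have to linearize/delinearize the offsets to find the same slice.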

> Seems to me like the end result is exactly what I'd expect: this is the most canonical representation of the map (and is consistent with the shape_cast canonicalization).
> If the particular analysis you mention intends to support "the vector dialect" well, then it ought to do a better job on shape_cast...
> Otherwise it feels like what you're presenting is a situation of: "we implemented our analysis for a subset of the dialect, so when you're canonicalizing toward an operation we haven't spent time on, it regresses our analysis".

No argument with whether that is the canonical map. We could just as easily have chosen that map as the initial layout, and handling of the transpose would produce the same thing. Part of what I wanted to illustrate is that for certain patterns, the transpose really is easier to implement/reason about (the same might be true for unrolling of transpose vs. shape_cast). I might have misunderstood this when it was said, but my understanding of canonicalizations was that they are supposed to make the IR easier for the compiler to reason about. That does not mean any particular analysis/pattern should fail on, or perform excessively poorly on, a particular choice of equivalent representation. In my example, to recover what happened for transpose, we need to apply non-trivial simplification to the layout map, and I don't see that as easier for the compiler to reason about. Similarly, IIUC general unrolling of shape_cast would have to do the same kind of linearization/delinearization of the vector slices (which we should still do, just that it's harder than transpose; I sketched that index math at the end of this comment). Maybe what I'm moving toward is
```
Here you seem to want a "subset of the vector dialect" and that is just not reasonable as-is (that is: if this is what you want, that is an RFC for a split of the vector dialect in high-level and lower-level parts).
```
but I'd really rather not have to go to that extreme.
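
For completeness, here is the linearize/delinearize step such a hypothetical shape_cast unrolling would need, worked by hand for the running `vector<1x4xf32>` to `vector<4x1xf32>` example (this is not code from any existing pattern):
```
// Result offset (i, j) in vector<4x1xf32>, row-major strides [1, 1]:
//   linear = i * 1 + j
// Delinearize into the source shape vector<1x4xf32>, strides [4, 1]:
//   source offset = (linear / 4, linear mod 4)
// e.g. result offset (2, 0) -> linear 2 -> source offset (0, 2),
// the same slice the transpose pattern reads directly off the permutation.
```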

https://github.com/llvm/llvm-project/pull/72105


More information about the Mlir-commits mailing list