[PATCH] D81472: [Matrix] Update load/store intrinsics.

Wed Jun 10 17:14:50 PDT 2020

fhahn added a comment.

In D81472#2086283 <https://reviews.llvm.org/D81472#2086283>, @rjmccall wrote:

> In D81472#2086096 <https://reviews.llvm.org/D81472#2086096>, @fhahn wrote:
>
> > In D81472#2086062 <https://reviews.llvm.org/D81472#2086062>, @rjmccall wrote:
> >
> > > [snip]
> > >
> > > > Oh right, the 'special value' to indicate row/column major would be setting either stride to 1. As long as exactly one of those is  1, the layout of the result/operand should be clear. Personally I find including layout included in the name a bit easier to follow, as it is more explicit. But it might be preferable to have a single variant that handles row/column major depending on the strides (as long as we enforce that exactly one stride has to be 1.), once we add those variants.
> > >
> > > Why have the restriction that exactly one stride has to be 1?  If you can optimize that as a constant, great, do it, but otherwise just do the separate loads/stores, and impose an UB restriction that the strides have to make them non-overlapping.
> >
> >
> > Besides making assumptions about the layout of the access memory, the intrinsic also specifies the layout of the loaded/stored values (=layout in the flattened vector). If either stride is constant 1, we could use that to determine the layout of the loaded/stored value. I may be missing something, but if both are != 1 or arbitrary values, it is not clear what we should pick for the in-vector layout.
>
>
> Wait, what?  I assumed you used a canonical layout (presumably column-major) in the flattened vector.

Yes, currently we default to column major for the lowering.

> Are you planning to make all your intrinsics support either column-major or row-major layout?  That seems like a lot of complexity in the backend that you're mostly not actually going to be using because the frontend will use a canonical layout.

My main focus currently is to evolve the intrinsics & lowering so they work well with the matrix extension in Clang. We don't plan to propose/work towards mixing layouts or supporting switching the layout in the C/C++ extension.

On the other hand, I am also trying to ensure the intrinsics are useful for use-cases beyond Clang. For example, the intrinsics are also used by MLIR to lower matrix operations and for that use case, supporting row-major with the intrinsics and also mixing row/column major layouts makes things much easier. At the moment, it is already possible to switch the canonical layout to row-major for the lowering on the LLVM side. Supporting both layouts for most parts was relatively straight-forward (excluding the load/store intrinsics) and fits quite naturally. I am also looking into providing a more powerful way to describe additional properties of the inputs using operand bundles.

> Are you anticipating that it's going to be important to e.g. peephole a row-major load that's fed into a row-major store so that you don't unnecessarily shuffle the vector?

The main benefits I expect from making the layouts more flexible is 1) making IRGen easier for frontends with different underlying layouts and 2) potentially being able to optimise conversions/transposes for larger expressions. For small expressions I do not expect too much benefit in terms of optimisations, as LLVM is relatively good at eliminating the kinds of shuffles we emit, at least for small matrix sizes.

> Also, if you *are* trying to proactively support multiple flattened-vector layouts, I feel like stopping at row-major vs. column-major is probably unnecessarily limiting and you should really allow a whole enumeration's worth of possibilities in case you want to e.g. incorporate internal padding into the vector.

Unfortunately I am a bit limited in terms of bandwidth when it comes to evolving the intrinsics outside of the clang extension case. I try to focus on evolving them to make sure they work well for the existing users, but also try to make sure the whole system remains flexible enough to support additional cases as you mentioned. I am also happy to quite aggressively adjust the intrinsics when we encounter issues/missing pieces, as in the patch for now. But I suppose we have to be a bit more careful about backwards-compatibility once released frontends out there support the intrinsics.

I hope that makes sense and please let me know if you have any concerns.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81472/new/

https://reviews.llvm.org/D81472