[PATCH] D81472: [Matrix] Update load/store intrinsics.

Thu Jun 11 11:33:09 PDT 2020

rjmccall added a comment.

In D81472#2087558 <https://reviews.llvm.org/D81472#2087558>, @fhahn wrote:

> In D81472#2086652 <https://reviews.llvm.org/D81472#2086652>, @rjmccall wrote:
>
> > My immediate concern is just that I think the memory layout of the matrix should be orthogonal to the component layout of the vector.  If you want the matrix intrinsics to support a variety of vector layouts, you should pass down the expected layout as a constant argument to the intrinsic rather than picking it up purely from whether the matrix is being loaded from a row-major or column-major layout in memory.  I would guess that making that constant an  i32 is probably sufficiently future-proof; if you ever need more structure than that, you'd probably be better off biting the bullet and adding an `llvm::MatrixType`.
>
>
> Hm I understand the appeal of having a single very powerful intrinsic. Selecting the different variants by a single parameter is convenient in terms of maintaining backwards compatibility, but personally I find it more readable to include some of the variant information in the name. Of course there's a limit to the number of variants for which that approach is feasible. I think it is important to have this discussion, but I am not sure if it is in scope for this patch (which only adds a few smallish improvements to the naming/arguments of the intrinsics) and it might be better to discuss that once work on row-major versions of the intrinsics starts?

If you're willing to rework the intrinsics later, then I have no objections to committing this now, yeah.

>> My thinking here is that, for example, you're adding a builtin to Clang to do a column-major load.  You want that to produce a column-major vector layout because that's your canonical layout within Clang.  But you should also eventually add a builtin to do a row-major load, because there are quite a few reasons people might have a matrix in memory in row-major order: for example, if they've declared a nested C array, they'll naturally get row-major order.  That builtin *also* needs to produce a column-major vector layout.  So tying the two things together is bad intrinsic design.
> 
> Having intrinsics that can apply such conversions directly certainly is a convenient option here. But alternatively it should also be possible to have a small set of load/store intrinsics (e.g. load row-major from row-major, load column-major from column-major) and get the other variants by composing conversion functions (e.g. transpose (load row major() to load a row-major matrix from memory and convert it to column-major).

This is definitely a feasible alternative intrinsic design, where you have a small number of basic operations and the backend combines them to emit the operation more efficiently.  My intuition is that how well it works in practice depends on the performance gap between emitting the combined operation and emitting them separately, because the backend can be quite bad at actually emitting the combined operation reliably.  I don't have a sense of how that applies here; the naive approach for loading row-major as column-major is essentially to load and then shuffle, i.e. to essentially emit them separately anyway, but maybe there's a reasonable alternative that I don't know because I'm less familiar with vector instruction sets.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81472/new/

https://reviews.llvm.org/D81472