[PATCH] D81472: [Matrix] Update load/store intrinsics.

Wed Jun 10 13:55:54 PDT 2020

rjmccall added a comment.

In D81472#2086031 <https://reviews.llvm.org/D81472#2086031>, @fhahn wrote:

> In D81472#2085724 <https://reviews.llvm.org/D81472#2085724>, @rjmccall wrote:
>
> > In D81472#2085129 <https://reviews.llvm.org/D81472#2085129>, @fhahn wrote:
> >
> > > In D81472#2083075 <https://reviews.llvm.org/D81472#2083075>, @rjmccall wrote:
> > >
> > > > I like the name change, although I wonder if you could just have a single intrinsic that takes both a row stride and a column stride and recognizes the common patterns.  Presumably even with column-major ordering you already want to optimize the case where the stride is a constant equal to the row count, so this would just be a generalization of that.
> > >
> > >
> > > I am not sure about having a single intrinsic. The `column.major` part in the name signifies that the resulting matrix is in column-major layout (whic, but  is then used internally during he lowering). I suppose it would be possible to have both row & column strides and use a special value to indicate what the leading dimension is, but it seems to me that having dedicated intrinsics would be more explicit.
> >
> >
> > This is just Fortran array slices.  You don't need a special value, the two strides are sufficient.  `M[i][j]` is at `p[i *  rowStride + j * columnStride]`.  To not have overlapping storage, you need either  `rowStride * rowCount <= columnStride` or vice-versa.  Row-major means `rowStride >= columnCount && columnStride == 1`; column-major means `rowStride == 1 && columnStride >= rowCount`.  You get better locality from doing the smaller stride in the inner loop (which may not actually be a loop, of course), but it's not wrong to do either way.
> >
> > Anyway, it's up to you, but I think the two-stride representation is more flexible and avoids ultimately needing three separate intrinsics with optimizations that turn the general one into the more specific ones.  And it may have benefits for frontends like Flang that have to support these strided multi-dimensional slices.
>
>
> Oh right, the 'special value' to indicate row/column major would be setting either stride to 1. As long as exactly one of those is  1, the layout of the result/operand should be clear. Personally I find including layout included in the name a bit easier to follow, as it is more explicit. But it might be preferable to have a single variant that handles row/column major depending on the strides (as long as we enforce that exactly one stride has to be 1.), once we add those variants.

Why have the restriction that exactly one stride has to be 1?  If you can optimize that as a constant, great, do it, but otherwise just do the separate loads/stores, and impose an UB restriction that the strides have to make them non-overlapping.

> This patch already clumps together a bunch of changes and I think it would be good to have the discussion once row major support for those is added. It should be easy to auto-upgrade to the more general intrinsics, if desired.

Doing fewer signature changes is probably best, but I won't insist.

>>>> Are you planning to actually support alignment in a follow-up patch?  I don't see anything here that propagates alignment to the lowering routines.
>>> 
>>> Yes, this patch just adjusts the intrinsics definitions/langref. Respecting both `IsVolatile` & the alignment attribute will be done in follow-up patches.\
>> 
>> Okay.
> 
> I've put up D81498 <https://reviews.llvm.org/D81498>. It just adds `volatile` to the generated loads/stores, if `IsVolatile` is true. Do you think `volatile` should/has to prevent splitting the memory operation into multiple loads/stores?

No, I think the semantics here are more like a volatile memcpy: we're demanding that the access be done, but not guaranteeing anything about access widths.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81472/new/

https://reviews.llvm.org/D81472