[PATCH] D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass.

Renato Golin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 3 06:57:34 PDT 2020


rengolin added a comment.

In D70456#1959386 <https://reviews.llvm.org/D70456#1959386>, @fhahn wrote:

> Yes, the lowering as done in this patch could also have been done exclusively by the frontend without functional difference.


Right, that makes sense. Do you expect front-ends to detect code patterns (like nested loops over i, j) or just to lower from existing "matmul" operations?

LLVM already does that for libc calls (e.g. recognising memcpy-like loops and emitting llvm.memcpy), and if languages have matmul intrinsics, then this would be a trivial lowering. But detecting the patterns, especially in C/C++ code, can end up horribly wrong or slow. :)

> Currently I am working on adding initial tiling support for multiplies directly to the lowering pass: D75566 <https://reviews.llvm.org/D75566>.

Sure, and I expect that this loop would already be "vectorised", with safety guaranteed by construction and widths extracted from TTI, so "pragma clang vectorise" would be disabled and the vectoriser won't even look at it.

I'm not sure how VPlan handles partially vectorised nested loops, but it would be interesting if we could re-vectorise after loop fusion or outer-loop vectorisation.

> One advantage of doing it in the lowering pass is that we have all information necessary available there and it is very specific to the intrinsic. Without lowering the intrinsic, there is no loop at all. (Even with the proposed tiling, there won't be any loops as the lowering effectively unrolls the tiled loops, but that will be improved in the future, as this approach is not practical for larger matrixes).

That was my point about letting the LV "know" about the intrinsic: to recognise it as a loop and work on it.
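To make the tiling discussion concrete, here is a rough Python sketch (not LLVM code; the tile size and square shapes are made up for illustration) of the kind of tiled multiply the lowering pass effectively unrolls into straight-line code:

```python
# Hypothetical sketch of a tiled matrix multiply, standing in for what the
# lowering pass generates. The pass emits straight-line IR per tile rather
# than actual loops; TS (tile size) is an arbitrary choice here.
TS = 2

def tiled_matmul(A, B, n):
    """Multiply two n x n matrices (lists of lists), tile by tile."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, TS):            # tile rows of C
        for j0 in range(0, n, TS):        # tile columns of C
            for k0 in range(0, n, TS):    # reduction dimension, one tile at a time
                for i in range(i0, min(i0 + TS, n)):
                    for j in range(j0, min(j0 + TS, n)):
                        for k in range(k0, min(k0 + TS, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

For small, fixed shapes the pass can fully unroll the three tile loops, which is why no loop remains after lowering; for large matrices that unrolling is impractical, as noted above.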

> I think currently the general direction of the work is towards making the lowering pass better, rather than teaching other passes about the matrix intrinsics.

That sounds very sensible. :)

> I've also been thinking about using the infrastructure in the lowering pass to optimize large vector operations, even if no matrix intrinsics are involved. At the moment I am not sure how supporting matrix intrinsics would fit into passes like the loop vectorizer, but the lowering pass might be a good candidate to use VPlan for code generation/cost-modeling, once the infrastructure is there.

Indeed, that is what I thought would be a way into the LV. I don't mind if we teach the LV about matmul or if we export the VPlan machinery and let other passes use it, as long as we don't duplicate the work.

> Another direction to explore would be to detect loops that perform a matrix multiply and replacing them with a call to the intrinsic, which then gets further optimized.

That's curious. Do you mean tracing a path from (weird loop) to (llvm.matmul) to (matmul loop), as a way to canonicalise loops?
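For reference, the loop nest such idiom recognition would have to match is the canonical triple loop below (a Python sketch of the source pattern, not of the actual matcher, whose details are not in this thread):

```python
# The canonical matmul loop nest that a hypothetical idiom-recognition
# pass would try to match and replace with a single matrix-multiply
# intrinsic call, which the lowering pass could then tile/optimise.
def naive_matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):          # rows of A / C
        for j in range(p):      # columns of B / C
            for k in range(m):  # reduction dimension
                C[i][j] += A[i][k] * B[k][j]
    return C
```

The difficulty is that real C/C++ code rarely looks this clean (interchanged loops, accumulators, pointer arithmetic), which is the "horribly wrong or slow" risk mentioned above.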

> Sorry for the somewhat lengthy response, but does the overall direction make sense to you?

No problem at all. Also, bear in mind I don't want to delay the approval/merge of this patch. Glad to continue discussing it after it's committed.

cheers,
--renato


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70456/new/

https://reviews.llvm.org/D70456




