[PATCH] D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass.

Tue Nov 19 12:24:52 PST 2019

fhahn created this revision.
fhahn added reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma.
Herald added subscribers: jdoerfert, hiraditya, mgorny, mehdi_amini.
Herald added a project: LLVM.

This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html

This first patch introduces four new intrinsics (transpose, multiply,
columnwise load and columnwise store) and a LowerMatrixIntrinsics pass
that lowers those intrinsics to vector operations.

Matrices are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters must be ConstantInt values.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.
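
To illustrate the embedding, here is a minimal standalone C++ sketch of
how an element (Row, Col) maps into a column-major flat vector. The names
flatIndex and FlatMatrix are hypothetical and exist only for this example;
they are not part of the patch or of any LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the patch): position of element (Row, Col)
// inside a column-major flat vector for a matrix with NumRows rows.
inline std::size_t flatIndex(std::size_t Row, std::size_t Col,
                             std::size_t NumRows) {
  return Col * NumRows + Row;
}

// Hypothetical wrapper modelling a matrix embedded in a flat vector,
// e.g. a 4 x 4 float matrix stored in 16 consecutive floats.
struct FlatMatrix {
  std::size_t NumRows;
  std::size_t NumCols;
  std::vector<float> Data; // NumRows * NumCols elements, column-major.

  FlatMatrix(std::size_t Rows, std::size_t Cols)
      : NumRows(Rows), NumCols(Cols), Data(Rows * Cols, 0.0f) {}

  // Access element (Row, Col) through the column-major mapping.
  float &at(std::size_t Row, std::size_t Col) {
    assert(Row < NumRows && Col < NumCols && "index out of range");
    return Data[flatIndex(Row, Col, NumRows)];
  }
};
```

With this mapping, each column occupies NumRows consecutive elements, so
a column can be extracted as one contiguous sub-vector.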

For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.
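
The split/transform/concatenate scheme can be sketched in plain C++ as
follows. This is only a scalar model of the idea under the column-major
assumption, not the code in LowerMatrixIntrinsics.cpp; all function names
(splitColumns, concatColumns, transposeFlat, multiplyFlat) are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Column = std::vector<float>;

// Split a column-major flat vector into NumCols columns of NumRows elements.
std::vector<Column> splitColumns(const std::vector<float> &Flat,
                                 std::size_t NumRows, std::size_t NumCols) {
  assert(Flat.size() == NumRows * NumCols && "shape does not match length");
  std::vector<Column> Cols;
  for (std::size_t C = 0; C < NumCols; ++C)
    Cols.emplace_back(Flat.begin() + C * NumRows,
                      Flat.begin() + (C + 1) * NumRows);
  return Cols;
}

// Concatenate columns back into one flat column-major vector.
std::vector<float> concatColumns(const std::vector<Column> &Cols) {
  std::vector<float> Flat;
  for (const Column &C : Cols)
    Flat.insert(Flat.end(), C.begin(), C.end());
  return Flat;
}

// Transpose a NumRows x NumCols matrix: result column R gathers element R
// of every input column.
std::vector<float> transposeFlat(const std::vector<float> &Flat,
                                 std::size_t NumRows, std::size_t NumCols) {
  std::vector<Column> In = splitColumns(Flat, NumRows, NumCols);
  std::vector<Column> Out(NumRows, Column(NumCols));
  for (std::size_t R = 0; R < NumRows; ++R)
    for (std::size_t C = 0; C < NumCols; ++C)
      Out[R][C] = In[C][R];
  return concatColumns(Out);
}

// Multiply an M x K matrix A by a K x N matrix B. Result column J is the
// sum over I of (column I of A) scaled by element (I, J) of B.
std::vector<float> multiplyFlat(const std::vector<float> &A,
                                const std::vector<float> &B, std::size_t M,
                                std::size_t K, std::size_t N) {
  std::vector<Column> ACols = splitColumns(A, M, K);
  std::vector<Column> BCols = splitColumns(B, K, N);
  std::vector<Column> Result(N, Column(M, 0.0f));
  for (std::size_t J = 0; J < N; ++J)
    for (std::size_t I = 0; I < K; ++I)
      for (std::size_t R = 0; R < M; ++R)
        Result[J][R] += ACols[I][R] * BCols[J][I];
  return concatColumns(Result);
}
```

Because every operation starts by splitting and ends by concatenating, a
chain of operations pays that cost at each step; this is the overhead that
the shape propagation mentioned in the RFC is meant to eliminate.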

This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:

- Shape propagation to eliminate the embedding/splitting for each intrinsic.
- Fused & tiled lowering of multiply and other operations.
- Optimization remarks highlighting matrix expressions and costs.
- Generate loops for operations on large matrices.
- More general block processing for operations on large vectors, exploiting shape information.

We would like to add dedicated transpose, columnwise load and columnwise
store intrinsics, even though they are not strictly necessary. For
example, we could emit a single large shufflevector instruction instead
of the transpose intrinsic. But we expect that to

  (1) become unwieldy for larger matrices (even for 16x16 matrices,
      the resulting shufflevector masks would be huge),
  (2) risk instcombine making small changes, causing us to fail to
      detect the transpose, preventing better lowerings.
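
To make point (1) concrete, the following standalone C++ sketch computes
the shuffle mask that a transpose of a column-major matrix would need.
The helpers transposeShuffleMask and applyShuffle are hypothetical and
only model a single-input shufflevector; they are not taken from the
patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mask that transposes a column-major
// NumRows x NumCols matrix. Entry I of the mask is the index of the
// source element that ends up at position I of the result.
std::vector<int> transposeShuffleMask(unsigned NumRows, unsigned NumCols) {
  std::vector<int> Mask;
  Mask.reserve(static_cast<std::size_t>(NumRows) * NumCols);
  // The result has NumCols rows and NumRows columns, stored column-major.
  for (unsigned ResCol = 0; ResCol < NumRows; ++ResCol)
    for (unsigned ResRow = 0; ResRow < NumCols; ++ResRow)
      Mask.push_back(static_cast<int>(ResRow * NumRows + ResCol));
  return Mask;
}

// Model of a single-input shuffle: Result[I] = Src[Mask[I]].
std::vector<float> applyShuffle(const std::vector<float> &Src,
                                const std::vector<int> &Mask) {
  std::vector<float> Result;
  Result.reserve(Mask.size());
  for (int Index : Mask) {
    assert(Index >= 0 && static_cast<std::size_t>(Index) < Src.size() &&
           "mask index out of range");
    Result.push_back(Src[static_cast<std::size_t>(Index)]);
  }
  return Result;
}
```

The mask has one entry per matrix element, so transposeShuffleMask(16, 16)
already yields 256 entries, and any rewrite of such a mask by a later
pass would have to be pattern-matched back to a transpose.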

For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D70456

Files:
  llvm/docs/LangRef.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/InitializePasses.h
  llvm/include/llvm/Transforms/Scalar.h
  llvm/include/llvm/Transforms/Scalar/LowerMatrixIntrinsics.h
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Passes/PassRegistry.def
  llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
  llvm/lib/Transforms/Scalar/CMakeLists.txt
  llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
  llvm/lib/Transforms/Scalar/Scalar.cpp
  llvm/test/Transforms/LowerMatrixIntrinsics/multiply.ll
  llvm/test/Transforms/LowerMatrixIntrinsics/strided-load.ll
  llvm/test/Transforms/LowerMatrixIntrinsics/strided-store.ll
  llvm/test/Transforms/LowerMatrixIntrinsics/transpose.ll
