[PATCH] D77549: [Matrix] Use aarch64.udot for 4x4 tiling for i8 matrixes (WIP).

Mon Apr 6 07:33:04 PDT 2020

fhahn created this revision.
Herald added subscribers: danielkiss, tschuett, hiraditya, kristof.beyls.
Herald added a project: LLVM.
fhahn updated this revision to Diff 255328.
fhahn added a comment.

Add tests to illustrate the generated IR.

This processes matrix multiplies of i8 matrices in 4x4 tiles and uses
aarch64.udot to compute the result of each 4x4 multiply.
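
For readers unfamiliar with the instruction, here is a rough Python model of
what the vector form of UDOT computes: each of the four 32-bit lanes
accumulates the dot product of four unsigned 8-bit pairs. The function name
and the list-based representation are illustrative only and are not taken
from the patch.

```python
def udot(acc, a, b):
    """Model of vector UDOT: acc has 4 x 32-bit lanes, a and b have 16 unsigned bytes."""
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    out = []
    for lane in range(4):
        # Each lane consumes 4 consecutive bytes from both inputs.
        s = sum(a[4 * lane + k] * b[4 * lane + k] for k in range(4))
        # The accumulator is 32 bits wide, so wrap on overflow.
        out.append((acc[lane] + s) & 0xFFFFFFFF)
    return out


result = udot([0, 0, 0, 0], list(range(1, 17)), [1] * 16)
print(result)  # [10, 26, 42, 58]
```

With b set to all ones, each lane is simply the sum of its four bytes of a,
which makes the lane grouping easy to see.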

This patch lowers store(matrix.multiply(transpose(load()), load())) as
described above. As the first operand is transposed, we can access the
rows of the transposed operand by loading the columns of the original
load directly.
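
To make the access pattern concrete, the following sketch computes one 4x4
tile of transpose(A) * B on flat column-major lists. It assumes column-major
storage and assumes that each column of B is repeated across the four lanes
before the dot product; that is one possible way to feed udot and may not
match the shuffles the patch actually emits. All names are hypothetical.

```python
def udot(acc, a, b):
    """Model of vector UDOT: 4 x 32-bit lanes, each a dot product of 4 byte pairs."""
    return [
        (acc[lane] + sum(a[4 * lane + k] * b[4 * lane + k] for k in range(4)))
        & 0xFFFFFFFF
        for lane in range(4)
    ]


def multiply_transposed_4x4(a, b):
    """Return C = transpose(A) * B for 4x4 u8 matrices, all column-major.

    Row i of transpose(A) is column i of A, i.e. a[4*i : 4*i + 4], so the
    flat list `a` already holds the four rows of transpose(A) back to back.
    """
    c = []
    for j in range(4):
        # Column j of B, repeated once per lane (assumed broadcast step).
        b_col = b[4 * j : 4 * j + 4] * 4
        # Lane i yields C[i][j]; appending lanes in order keeps C column-major.
        c.extend(udot([0, 0, 0, 0], a, b_col))
    return c


a = list(range(1, 17))  # columns of A: [1,2,3,4], [5,6,7,8], ...
identity = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
print(multiply_transposed_4x4(a, identity))
# [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
```

Multiplying by the identity returns transpose(A) in column-major order, which
confirms that no explicit transpose of A was needed.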

Note that @llvm.matrix.multiply does not make a distinction between
unsigned & signed multiplication for integers, and this patch arbitrarily
uses udot. We probably have to add integer multiply variants for signed &
unsigned in the future. Also, the way this is currently integrated needs
a bit more work. It would probably be good to expose a hook where
targets can be queried about which kernels can be implemented efficiently
on the target.
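
As a small illustration of why the signed/unsigned choice matters once the
8-bit inputs are widened to 32 bits, the sketch below computes the same
4-element dot product with both interpretations of the bytes. The helper
names are hypothetical, and the signed variant only models the widening
behaviour of a signed dot product.

```python
def to_signed8(x):
    """Reinterpret an unsigned byte (0..255) as a two's-complement i8."""
    return x - 256 if x >= 128 else x


def dot4_unsigned(a, b):
    # Bytes are zero-extended before multiplying, as in an unsigned dot product.
    return sum(x * y for x, y in zip(a, b))


def dot4_signed(a, b):
    # Bytes are sign-extended before multiplying, as in a signed dot product.
    return sum(to_signed8(x) * to_signed8(y) for x, y in zip(a, b))


a = [0xFF, 1, 2, 3]
b = [2, 1, 1, 1]
print(dot4_unsigned(a, b))  # 516
print(dot4_signed(a, b))    # 4
```

The byte 0xFF contributes 255 * 2 in the unsigned case and -1 * 2 in the
signed case, so the two results differ even though the input bits are
identical.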

Finally, the shuffles generated for the current lowering seem to
produce awful code for now, but the main goal of the patch is to
illustrate how target-specific instructions can be used when lowering
matrix intrinsics.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D77549

Files:
  llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
  llvm/test/Transforms/LowerMatrixIntrinsics/aarch64-udot-4x4.ll
  llvm/test/Transforms/LowerMatrixIntrinsics/aarch64-udot-8x8.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D77549.255328.patch
Type: text/x-patch
Size: 28397 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200406/ad55be88/attachment.bin>