[PATCH] D99433: [Matrix] Including __builtin_matrix_multiply_add for the matrix type extension.

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 31 07:30:02 PDT 2021


fhahn added a comment.

In D99433#2661357 <https://reviews.llvm.org/D99433#2661357>, @everton.constantino wrote:

> @fhahn When I mentioned the splats I was talking about the IR, not the final code. On the Godbolt links you sent, it's the same as what I see. However, take a look at the IR your example generates:

Sorry for not being clearer. I meant the IR *before* LowerMatrixIntrinsics is run (which should be on the right-hand side of the Godbolt view). I'm also posting it below. Unless I am missing something, we should be able to easily match `fadd (llvm.matrix.multiply(A, B), C)` before the actual lowering of `llvm.matrix.multiply`. I think we already do something similar for combining load->multiply->store chains: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp#L703 . Basically, we try to fuse all multiplies before the 'normal' lowering. Would it be possible to deal with `fadd (llvm.matrix.multiply(A, B), C)` similarly?

  *** IR Dump Before Lower the matrix intrinsics (lower-matrix-intrinsics) ***
  ; Function Attrs: nofree nounwind uwtable willreturn mustprogress
  define dso_local void @_Z3fooRu11matrix_typeILm2ELm2EfES0_S0_([4 x float]* nocapture nonnull readonly align 4 dereferenceable(16) %0, [4 x float]* nocapture nonnull align 4 dereferenceable(16) %1, [4 x float]* nocapture nonnull readonly align 4 dereferenceable(16) %2) local_unnamed_addr #0 {
    %4 = bitcast [4 x float]* %0 to <4 x float>*
    %5 = load <4 x float>, <4 x float>* %4, align 4, !tbaa !6
    %6 = bitcast [4 x float]* %2 to <4 x float>*
    %7 = load <4 x float>, <4 x float>* %6, align 4, !tbaa !6
    %8 = tail call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4f32(<4 x float> %5, <4 x float> %7, i32 2, i32 2, i32 2)
    %9 = bitcast [4 x float]* %1 to <4 x float>*
    %10 = load <4 x float>, <4 x float>* %9, align 4, !tbaa !6
    %11 = fadd <4 x float> %8, %10
    store <4 x float> %11, <4 x float>* %9, align 4, !tbaa !6
    ret void
  }
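To make the IR above concrete, here is a small standalone C++ model of what it computes. It is a hypothetical sketch, not code from LowerMatrixIntrinsics.cpp. It assumes the column-major layout the matrix intrinsics use by default, and reads `llvm.matrix.multiply(A, B, 2, 2, 2)` as an (M x N) * (N x K) product. `fooUnfused` mirrors the dump (multiply, then a separate `fadd`, stored back through the second argument). `multiplyAdd` sketches one way a fused `fadd (llvm.matrix.multiply(A, B), C)` could be lowered: each dot product starts from the matching element of C instead of 0.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model, not LLVM code. Dimensions match the dump: i32 2, i32 2, i32 2.
constexpr std::size_t Rows = 2, Inner = 2, Cols = 2;

// A 2x2 float matrix stored flat in column-major order, like the <4 x float>
// values in the IR: element (row r, col c) lives at index c * Rows + r.
using Mat2x2 = std::array<float, Rows * Cols>;

// Semantics of llvm.matrix.multiply(A, B, Rows, Inner, Cols):
// A is Rows x Inner, B is Inner x Cols, the result is Rows x Cols.
Mat2x2 multiply(const Mat2x2 &A, const Mat2x2 &B) {
  Mat2x2 R{};
  for (std::size_t c = 0; c < Cols; ++c)
    for (std::size_t r = 0; r < Rows; ++r) {
      float Sum = 0.0f;
      for (std::size_t k = 0; k < Inner; ++k)
        Sum += A[k * Rows + r] * B[c * Inner + k];
      R[c * Rows + r] = Sum;
    }
  return R;
}

// Unfused, as in the dump: %8 = multiply(%5, %7), %11 = fadd %8, %10,
// and the sum is stored back through the second argument (%1).
void fooUnfused(const Mat2x2 &M0, Mat2x2 &M1, const Mat2x2 &M2) {
  Mat2x2 Prod = multiply(M0, M2);
  for (std::size_t i = 0; i < Prod.size(); ++i)
    M1[i] = Prod[i] + M1[i];
}

// Fused sketch: seed each accumulator with the addend element, so the
// separate elementwise fadd pass disappears.
Mat2x2 multiplyAdd(const Mat2x2 &A, const Mat2x2 &B, const Mat2x2 &C) {
  Mat2x2 R{};
  for (std::size_t c = 0; c < Cols; ++c)
    for (std::size_t r = 0; r < Rows; ++r) {
      float Sum = C[c * Rows + r];
      for (std::size_t k = 0; k < Inner; ++k)
        Sum += A[k * Rows + r] * B[c * Inner + k];
      R[c * Rows + r] = Sum;
    }
  return R;
}
```

With A = {1, 2, 3, 4}, B = {5, 6, 7, 8} and C = {1, 1, 1, 1} (all column-major), `multiply(A, B)` gives {23, 34, 31, 46} and both variants give {24, 35, 32, 47}. Note that the fused form adds the products in a different order than multiply-then-`fadd`. The results agree exactly here only because every value is exactly representable; in general the rounding can differ.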


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99433/new/

https://reviews.llvm.org/D99433


