[PATCH] D155995: [AMDGPU] WIP: Allow matching into v_dot4

Jeffrey Byrnes via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 26 16:53:52 PDT 2023


jrbyrnes planned changes to this revision.
jrbyrnes added a comment.

Nothing is necessarily planned at the moment; I just want to block the review for now.

It may make more sense to tune the vectorization cost model (for i8 and potentially i16) to produce something like

  %m = mul <n x i8> %v0, %v1
  %o = llvm.vector.reduce.add.vni8(%m)
  %op.rdx = add %o, %scalar

Then lower to mfma or v_dot in CodeGenPrepare.

e.g.

  %op.rdx = v_dot4_i32_i8 %v0, %v1, %scalar

That would avoid scalarizing the sequence and then trying to combine all possible variants.
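
To make the intended equivalence concrete, here is a rough reference model in Python (not LLVM code). It assumes the i8 lanes are sign-extended to i32 before the multiply, and it assumes v_dot4_i32_i8 behaves as a signed 4 x i8 dot product with a wrapping i32 accumulate and no clamping. The helper names (wrap_signed, pack_i8x4, reduce_then_add, dot4_i32_i8) are made up for illustration and are not part of the patch.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a two's-complement signed value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def pack_i8x4(lanes):
    """Pack four signed 8-bit lanes into one 32-bit word, lane 0 in the low byte."""
    assert len(lanes) == 4
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & 0xFF) << (8 * i)
    return word


def reduce_then_add(v0, v1, scalar):
    """Vector form: widen each lane, multiply, reduce-add, then add the scalar."""
    # Assumption: lanes are widened to i32 before the multiply, so the
    # products do not wrap at 8 bits.
    products = [wrap_signed(a, 8) * wrap_signed(b, 8) for a, b in zip(v0, v1)]
    return wrap_signed(sum(products) + scalar, 32)


def dot4_i32_i8(a_word, b_word, acc):
    """Assumed model of a packed signed 4 x i8 dot product with i32 accumulate."""
    total = acc
    for i in range(4):
        a = wrap_signed(a_word >> (8 * i), 8)
        b = wrap_signed(b_word >> (8 * i), 8)
        total += a * b
    return wrap_signed(total, 32)


if __name__ == "__main__":
    v0, v1, scalar = [1, -2, 3, -4], [5, 6, -7, 8], 10
    # Both forms should agree when the assumptions above hold.
    assert reduce_then_add(v0, v1, scalar) == dot4_i32_i8(
        pack_i8x4(v0), pack_i8x4(v1), scalar
    )
```

Under those assumptions the mul / reduce.add / add sequence and the single dot4 operation compute the same value, which is why matching the vector form would cover the scalarized variants in one place. If the multiply really were performed at i8 width, the two forms could differ, so the extension step matters.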


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155995/new/

https://reviews.llvm.org/D155995


