[PATCH] D155995: [AMDGPU] WIP: Allow matching into v_dot4

Wed Jul 26 16:53:52 PDT 2023

jrbyrnes planned changes to this revision.
jrbyrnes added a comment.

Nothing is necessarily planned at the moment; I just want to block the review for now.

It may make more sense to tune the vectorization cost model (for i8 and potentially i16) to produce something like

  %m = mul <n x i8> %v0, %v1
  %o = llvm.vector.reduce.add.vni8(%m)
  %op.rdx = add %o, %scalar

Then lower to mfma or v_dot in CodeGenPrepare.

e.g.

  %op.rdx = v_dot4_i32_i8 %v0, %v1, %scalar

That would avoid scalarizing the sequence and then trying to combine all possible variants.
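
For reference, a minimal C++ sketch of why the mul + reduce.add + add sequence and a single dot4 should agree. This is not the patch's code: the helper names (pack4, dot4_i32_i8, mul_reduce_add) are hypothetical, and it assumes the dot4 form sign-extends each i8 lane to i32 before multiplying and wraps on overflow (no clamp).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack four signed bytes into one 32-bit register value,
// lane 0 in the low byte.
static uint32_t pack4(const int8_t v[4]) {
  uint32_t p = 0;
  for (int i = 0; i < 4; ++i)
    p |= static_cast<uint32_t>(static_cast<uint8_t>(v[i])) << (8 * i);
  return p;
}

// Assumed reference semantics for a signed 4 x i8 dot product with an i32
// accumulator: acc + sum(sext(a[i]) * sext(b[i])), wrapping on overflow.
static int32_t dot4_i32_i8(uint32_t a, uint32_t b, int32_t acc) {
  uint32_t sum = static_cast<uint32_t>(acc);  // unsigned math models wrapping
  for (int i = 0; i < 4; ++i) {
    int8_t ai = static_cast<int8_t>((a >> (8 * i)) & 0xFF);
    int8_t bi = static_cast<int8_t>((b >> (8 * i)) & 0xFF);
    sum += static_cast<uint32_t>(static_cast<int32_t>(ai) *
                                 static_cast<int32_t>(bi));
  }
  return static_cast<int32_t>(sum);
}

// The vectorized shape from the comment, written out per lane:
// multiply lane-wise (after sign extension), reduce with add, add the scalar.
static int32_t mul_reduce_add(const int8_t v0[4], const int8_t v1[4],
                              int32_t scalar) {
  int32_t m[4];
  for (int i = 0; i < 4; ++i)
    m[i] = static_cast<int32_t>(v0[i]) * static_cast<int32_t>(v1[i]);
  int32_t o = m[0] + m[1] + m[2] + m[3];  // vector.reduce.add
  return o + scalar;                      // %op.rdx
}
```

Note the sketch widens to i32 before the multiply. A literal i8 multiply followed by an i8 reduction would truncate the products, so the pattern that maps onto dot4 presumably needs the extension to be visible in the IR.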

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155995/new/

https://reviews.llvm.org/D155995