[PATCH] D155995: [AMDGPU] WIP: Allow matching into v_dot4
Jeffrey Byrnes via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 26 16:53:52 PDT 2023
jrbyrnes planned changes to this revision.
jrbyrnes added a comment.
Nothing is necessarily planned at the moment; I just want to block the review for now.
It may make more sense to tune the vectorization cost model (for i8 and potentially i16) so that it produces something like:
%m = mul <N x i8> %v0, %v1
%o = call i8 @llvm.vector.reduce.add.vNi8(<N x i8> %m)
%op.rdx = add i8 %o, %scalar
Then lower the reduction to MFMA or v_dot in CodeGenPrepare, e.g. for N = 4, with %v0 and %v1 each packed into a 32-bit register:
%op.rdx = v_dot4_i32_i8 %v0, %v1, %scalar
This would avoid scalarizing the sequence and then trying to combine every possible variant.
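A small Python reference model may help make the intended equivalence concrete. It is only a sketch under stated assumptions: the i8 lanes are sign-extended and the products and reduction are computed in i32 (the widened form, rather than the all-i8 sketch above), lane 0 sits in the low byte of each packed 32-bit operand, and the instruction's clamp modifier is off so the sum wraps modulo 2^32. The helper names (`sext8`, `wrap32`, `pack4`, `reduce_form`) are hypothetical and are not LLVM APIs.

```python
import random


def sext8(x):
    """Sign-extend the low 8 bits of x (models sext i8 -> i32)."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def wrap32(x):
    """Wrap x to a signed 32-bit value (two's complement, no clamping)."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


def pack4(lanes):
    """Pack four i8 lanes into one 32-bit register value, lane 0 in the low byte."""
    assert len(lanes) == 4
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & 0xFF) << (8 * i)
    return word


def reduce_form(v0, v1, scalar):
    """Assumed widened IR form: sext + mul, vector.reduce.add, then add the scalar."""
    products = [wrap32(sext8(a) * sext8(b)) for a, b in zip(v0, v1)]
    total = 0
    for p in products:
        total = wrap32(total + p)
    return wrap32(total + scalar)


def v_dot4_i32_i8(src0, src1, src2):
    """Assumed v_dot4_i32_i8 semantics (clamp off): four signed byte products plus src2."""
    acc = src2
    for i in range(4):
        a = sext8(src0 >> (8 * i))
        b = sext8(src1 >> (8 * i))
        acc = wrap32(acc + a * b)
    return acc


# Randomized check that the reduction form and the dot4 model agree.
rng = random.Random(0)
for _ in range(1000):
    v0 = [rng.randint(-128, 127) for _ in range(4)]
    v1 = [rng.randint(-128, 127) for _ in range(4)]
    acc = rng.randint(-2**31, 2**31 - 1)
    assert reduce_form(v0, v1, acc) == v_dot4_i32_i8(pack4(v0), pack4(v1), acc)
```

Because i32 addition is associative modulo 2^32, the order in which the reduction sums the lanes does not affect the result, which is what would let a single dot instruction replace the whole mul/reduce/add sequence under these assumptions.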
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D155995/new/
https://reviews.llvm.org/D155995