[PATCH] D155995: [AMDGPU] WIP: Allow matching into v_dot4
Jeffrey Byrnes via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 4 13:25:20 PDT 2023
jrbyrnes added a comment.
SLP vectorization should be tuned but that seems like a separate issue. Trees corresponding to v_dot4 often have s/zext as the final dest is 32 bit, but the arithmetic operations involve 8 bit operands. By introducing s/zext into the tree, we confuse the SLP vectorization cost model as it thinks it is vectorizing 32bit operands. The main issue is that cost model only looks at one node of the vectorizable tree at a time to calculate cost, instead of also considering the sequence as a whole. If we were to vectorize, codegen may be significantly less complex for these.
Plan is to move forward with this patch, and potentially tune vectorization in later work.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D155995/new/
https://reviews.llvm.org/D155995
More information about the llvm-commits
mailing list