[PATCH] D155995: [AMDGPU] WIP: Allow matching into v_dot4

Fri Aug 4 13:25:20 PDT 2023

jrbyrnes added a comment.

SLP vectorization should be tuned but that seems like a separate issue. Trees corresponding to v_dot4 often have s/zext as the final dest is 32 bit, but the arithmetic operations involve 8 bit operands. By introducing s/zext into the tree, we confuse the SLP vectorization cost model as it thinks it is vectorizing 32bit operands. The main issue is that cost model only looks at one node of the vectorizable tree at a time to calculate cost, instead of also considering the sequence as a whole. If we were to vectorize, codegen may be significantly less complex for these.

Plan is to move forward with this patch, and potentially tune vectorization in later work.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155995/new/

https://reviews.llvm.org/D155995