[llvm] [AMDGPU] Enable vectorization of i8 values. (PR #134934)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 23 17:06:31 PDT 2025
================
@@ -344,9 +344,10 @@ unsigned GCNTTIImpl::getMinVectorRegisterBitWidth() const {
unsigned GCNTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
if (Opcode == Instruction::Load || Opcode == Instruction::Store)
return 32 * 4 / ElemWidth;
- return (ElemWidth == 16 && ST->has16BitInsts()) ? 2
- : (ElemWidth == 32 && ST->hasPackedFP32Ops()) ? 2
- : 1;
+ return ElemWidth == 8 ? 4
+ : (ElemWidth == 16 && ST->has16BitInsts()) ? 2
----------------
jrbyrnes wrote:
Probably we only wanted to vectorize i16s if we had the instruction support for vector i16 instructions. But if we are going to vectorize types that have limited / no vectorized instruction support then we should do it consistently, which implies removing the `ST->has16BitInsts()` check and inserting it in the right places for the cost queries.
https://github.com/llvm/llvm-project/pull/134934