[llvm] [AMDGPU] Enable vectorization of i8 values. (PR #134934)

Wed Apr 23 16:47:01 PDT 2025

================
@@ -344,9 +344,10 @@ unsigned GCNTTIImpl::getMinVectorRegisterBitWidth() const {
 unsigned GCNTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
   if (Opcode == Instruction::Load || Opcode == Instruction::Store)
     return 32 * 4 / ElemWidth;
-  return (ElemWidth == 16 && ST->has16BitInsts()) ? 2
-       : (ElemWidth == 32 && ST->hasPackedFP32Ops()) ? 2
-       : 1;
+  return ElemWidth == 8                                ? 4
----------------
jrbyrnes wrote:

The TTI::get*Cost queries are called in many places. Can you please double-check that the ones we have implemented in AMDGPUTargetTransformInfo either fall back to the base handling or are explicitly customized for i8?

https://github.com/llvm/llvm-project/pull/134934