[llvm] [AMDGPU] Enable vectorization of i8 values. (PR #134934)

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 16 23:11:13 PDT 2025


================
@@ -1443,3 +1447,31 @@ void GCNTTIImpl::collectKernelLaunchBounds(
   LB.push_back({"amdgpu-waves-per-eu[0]", WavesPerEU.first});
   LB.push_back({"amdgpu-waves-per-eu[1]", WavesPerEU.second});
 }
+
+InstructionCost GCNTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
+                                            Align Alignment,
+                                            unsigned AddressSpace,
+                                            TTI::TargetCostKind CostKind,
+                                            TTI::OperandValueInfo OpInfo,
+                                            const Instruction *I) const {
+  if (VectorType *VecTy = dyn_cast<VectorType>(Src)) {
+    if ((Opcode == Instruction::Load || Opcode == Instruction::Store) &&
+        VecTy->getElementType()->isIntegerTy(8)) {
+      return ((DL.getTypeSizeInBits(VecTy) - 1) /
+              getLoadStoreVecRegBitWidth(AddressSpace)) +
+             1;
----------------
arsenm wrote:

This is treating each load/store as having a cost of 1. Shouldn't we be treating memory operations as more expensive? Weren't we already overriding those costs?
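
For reference, a minimal standalone sketch of how the ceil-division in the quoted hunk evaluates. The 128-bit load/store vector register width used below is an assumed value for illustration only, not necessarily what getLoadStoreVecRegBitWidth returns for a given address space, and memoryOpCostSketch is a hypothetical helper, not part of the patch:

#include <cstdio>

// Mirrors the quoted formula: (TypeSizeInBits - 1) / VecRegBitWidth + 1,
// i.e. a ceiling division that charges one cost unit per full-width memory
// instruction the vector would be split into.
static unsigned memoryOpCostSketch(unsigned TypeSizeInBits,
                                   unsigned VecRegBitWidth) {
  return (TypeSizeInBits - 1) / VecRegBitWidth + 1;
}

int main() {
  // Assuming a 128-bit load/store vector register width:
  std::printf("<16 x i8> (128 bits): cost %u\n", memoryOpCostSketch(128, 128)); // 1
  std::printf("<32 x i8> (256 bits): cost %u\n", memoryOpCostSketch(256, 128)); // 2
  return 0;
}

So each 128-bit chunk of the i8 vector is counted as a single unit, which is what prompts the question above about whether memory operations should carry a higher per-instruction cost.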

https://github.com/llvm/llvm-project/pull/134934

