[llvm] [AMDGPU] Enable vectorization of i8 values. (PR #134934)

Thu Jun 12 12:27:28 PDT 2025

================
@@ -344,9 +344,12 @@ unsigned GCNTTIImpl::getMinVectorRegisterBitWidth() const {
 unsigned GCNTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
   if (Opcode == Instruction::Load || Opcode == Instruction::Store)
     return 32 * 4 / ElemWidth;
-  return (ElemWidth == 16 && ST->has16BitInsts()) ? 2
-       : (ElemWidth == 32 && ST->hasPackedFP32Ops()) ? 2
-       : 1;
+  // For a given width return the max number of elements that can be combined
+  // into a wider bit value:
+  return ElemWidth == 8                                ? 4
----------------
jrbyrnes wrote:

I think we should also check `ST->isGFX8Plus()` / `ST->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS` here. `v_perm` is only available on these architectures, and it is the reason vectorized shuffles have a cost benefit.
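
To make the suggestion concrete, here is a minimal standalone sketch of the guarded check. It does not use the real LLVM classes: `MockSubtarget`, its fields, and the `IsLoadOrStore` flag are hypothetical stand-ins for `GCNSubtarget` and the `Opcode` comparison. The 16-bit and 32-bit branches are carried over from the removed lines of the hunk on the assumption that the new code keeps them, since the rest of the new return expression is not visible here.

```cpp
#include <cassert>

// Hypothetical stand-in for GCNSubtarget; it models only the queries used below.
struct MockSubtarget {
  enum Generation { SOUTHERN_ISLANDS, SEA_ISLANDS, VOLCANIC_ISLANDS, GFX9 };
  Generation Gen;
  bool Has16BitInsts;
  bool HasPackedFP32Ops;

  Generation getGeneration() const { return Gen; }
  bool has16BitInsts() const { return Has16BitInsts; }
  bool hasPackedFP32Ops() const { return HasPackedFP32Ops; }
};

// Sketch of getMaximumVF with the i8 case gated on the subtarget generation.
// IsLoadOrStore replaces the Instruction::Load / Instruction::Store check.
unsigned getMaximumVF(const MockSubtarget &ST, unsigned ElemWidth,
                      bool IsLoadOrStore) {
  if (IsLoadOrStore)
    return 32 * 4 / ElemWidth;
  // Only report a VF of 4 for i8 where v_perm is available (assumed: VI and later).
  if (ElemWidth == 8 &&
      ST.getGeneration() >= MockSubtarget::VOLCANIC_ISLANDS)
    return 4;
  if (ElemWidth == 16 && ST.has16BitInsts())
    return 2;
  if (ElemWidth == 32 && ST.hasPackedFP32Ops())
    return 2;
  return 1;
}
```

With this shape, a pre-VI subtarget falls through to a VF of 1 for i8, so the vectorizer would not form i8 vectors whose shuffles cannot be lowered to `v_perm`.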

https://github.com/llvm/llvm-project/pull/134934