[llvm] [AMDGPU] Vectorize i8 Shuffles (PR #95840)

Wed Jul 3 09:34:36 PDT 2024

================
@@ -337,9 +349,11 @@ unsigned GCNTTIImpl::getMinVectorRegisterBitWidth() const {
 unsigned GCNTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
   if (Opcode == Instruction::Load || Opcode == Instruction::Store)
     return 32 * 4 / ElemWidth;
-  return (ElemWidth == 16 && ST->has16BitInsts()) ? 2
-       : (ElemWidth == 32 && ST->hasPackedFP32Ops()) ? 2
-       : 1;
+
+  return (ElemWidth == 8)                              ? 4
----------------
jrbyrnes wrote:

Only a local branch -- when I was working on it (a long while ago) I couldn't find a way around making i8s legal, but I could always take another look. This whole chain of work is a workaround for v4i8 not being legal, and there are still other areas that need workarounds -- MFMAs using i32s in place of v4i8, and the weird isel patterns that this causes, come to mind.

Even if we were to reinvestigate that, though, I think we should implement the workaround now and delete it later.
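
For readers unfamiliar with the "i32 in place of v4i8" point, here is a minimal standalone sketch of what carrying four i8 lanes in a single 32-bit value looks like. It is only an illustration, not code from this PR or from the backend: the helper names `packV4I8` and `unpackV4I8` are hypothetical, and placing lane 0 in the low byte is an assumption made for the example.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not an LLVM API): pack four i8 lanes into one
// 32-bit word. Assumption for this sketch: lane 0 lives in the low byte.
inline uint32_t packV4I8(const std::array<int8_t, 4> &Lanes) {
  uint32_t Packed = 0;
  for (unsigned I = 0; I < 4; ++I) {
    // Go through uint8_t first so negative lanes do not sign-extend
    // into the neighbouring bytes.
    uint32_t Byte = static_cast<uint8_t>(Lanes[I]);
    Packed |= Byte << (8 * I);
  }
  return Packed;
}

// Hypothetical inverse: recover the four i8 lanes from the packed word.
inline std::array<int8_t, 4> unpackV4I8(uint32_t Packed) {
  std::array<int8_t, 4> Lanes{};
  for (unsigned I = 0; I < 4; ++I)
    Lanes[I] = static_cast<int8_t>(static_cast<uint8_t>(Packed >> (8 * I)));
  return Lanes;
}
```

Once the lanes are hidden inside an i32 like this, anything that wants to treat them as a vector has to reason about shifts and masks rather than vector elements, which is roughly why the resulting patterns get awkward.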

https://github.com/llvm/llvm-project/pull/95840