[llvm] [AMDGPU] Vectorize i8 Shuffles (PR #95840)
    Jeffrey Byrnes via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Mon Jul  1 16:23:30 PDT 2024
    
    
  
================
@@ -337,9 +349,11 @@ unsigned GCNTTIImpl::getMinVectorRegisterBitWidth() const {
 unsigned GCNTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
   if (Opcode == Instruction::Load || Opcode == Instruction::Store)
     return 32 * 4 / ElemWidth;
-  return (ElemWidth == 16 && ST->has16BitInsts()) ? 2
-       : (ElemWidth == 32 && ST->hasPackedFP32Ops()) ? 2
-       : 1;
+
+  return (ElemWidth == 8)                              ? 4
----------------
jrbyrnes wrote:
Not really, no -- 
SLP will only attempt to vectorize candidates whose vectorization factor is greater than the number of parts, i.e. the number of pieces the vector type is split into during legalization (https://github.com/llvm/llvm-project/blob/ffca4ef5b1a8eff6097454df4b0f212e2393e41e/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L16246). So, while we could theoretically increase the maximum VF to 8 or 16, this would only vectorize 8- or 16-element shuffles and miss the most important case, v4i8.
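To make the trade-off concrete, here is a toy C++ model. It is hypothetical and is not the SLPVectorizer source. It makes three assumptions. First, a `<VF x iN>` vector legalizes into ceil(VF * N / 32) 32-bit parts. Second, a VF is only tried if it exceeds that parts count. Third, a candidate bundle may be tried at the maximum VF only, which is the situation the comment describes. The names `numberOfParts`, `triedVFs` and `OnlyMaxVF` are invented for illustration. `memOpMaxVF` mirrors the load/store formula from the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model of the behaviour discussed above; NOT SLPVectorizer code.

// Mirrors the load/store branch of getMaximumVF in the diff:
// 128 bits (four 32-bit registers) divided by the element width.
unsigned memOpMaxVF(unsigned ElemWidth) {
  assert(ElemWidth > 0 && "element width must be non-zero");
  return 32 * 4 / ElemWidth;
}

// Assumption: legalization splits a <VF x iElemWidth> vector into
// ceil(VF * ElemWidth / 32) 32-bit parts.
unsigned numberOfParts(unsigned VF, unsigned ElemWidth) {
  return (VF * ElemWidth + 31) / 32;
}

// Returns the VFs the model would try for a bundle of NumElts scalars,
// halving from MaxVF down to 2. A VF is kept only if it fits in the bundle
// and is greater than its number of parts. With OnlyMaxVF set (hypothetical
// flag), only MaxVF itself is considered.
std::vector<unsigned> triedVFs(unsigned NumElts, unsigned ElemWidth,
                               unsigned MaxVF, bool OnlyMaxVF) {
  std::vector<unsigned> VFs;
  for (unsigned VF = MaxVF; VF >= 2; VF /= 2) {
    if (VF <= NumElts && numberOfParts(VF, ElemWidth) < VF)
      VFs.push_back(VF);
    if (OnlyMaxVF)
      break;
  }
  return VFs;
}
```

Under these assumptions, a four-element i8 bundle is tried at VF 4 when the maximum VF is 4. If the maximum VF is raised to 16 and only the maximum is considered, nothing is tried, which is the v4i8 case the comment is concerned about. A two-element i32 bundle is never tried either, because it needs two parts.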
https://github.com/llvm/llvm-project/pull/95840