[llvm] [AMDGPU] Vectorize more 16 bit shuffles (PR #90648)

Tue May 7 13:01:11 PDT 2024

================
@@ -1135,22 +1135,35 @@ InstructionCost GCNTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
   if (IsExtractSubvector)
     Kind = TTI::SK_PermuteSingleSrc;
 
+  if (!isa<FixedVectorType>(VT))
+    return BaseT::getShuffleCost(Kind, VT, Mask, CostKind, Index, SubTp);
+
+  unsigned NumVectorElts = cast<FixedVectorType>(VT)->getNumElements();
+
   if (ST->hasVOP3PInsts()) {
-    if (cast<FixedVectorType>(VT)->getNumElements() == 2 &&
+    if (!(NumVectorElts % 2) &&
         DL.getTypeSizeInBits(VT->getElementType()) == 16) {
       // With op_sel VOP3P instructions freely can access the low half or high
       // half of a register, so any swizzle is free.
 
       switch (Kind) {
       case TTI::SK_Broadcast:
       case TTI::SK_Reverse:
-      case TTI::SK_PermuteSingleSrc:
-        return 0;
+      case TTI::SK_PermuteSingleSrc: {
----------------
arsenm wrote:

Should we be specifically handling the other ShuffleKinds? I don't understand what's going on below with the IsExtractSubvector override.

https://github.com/llvm/llvm-project/pull/90648