[llvm] [AMDGPU] Vectorize more 16 bit shuffles (PR #90648)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Tue May 7 13:01:11 PDT 2024
================
@@ -1135,22 +1135,35 @@ InstructionCost GCNTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
   if (IsExtractSubvector)
     Kind = TTI::SK_PermuteSingleSrc;
+  if (!isa<FixedVectorType>(VT))
+    return BaseT::getShuffleCost(Kind, VT, Mask, CostKind, Index, SubTp);
+
+  unsigned NumVectorElts = cast<FixedVectorType>(VT)->getNumElements();
+
   if (ST->hasVOP3PInsts()) {
-    if (cast<FixedVectorType>(VT)->getNumElements() == 2 &&
+    if (!(NumVectorElts % 2) &&
         DL.getTypeSizeInBits(VT->getElementType()) == 16) {
       // With op_sel VOP3P instructions freely can access the low half or high
       // half of a register, so any swizzle is free.
       switch (Kind) {
       case TTI::SK_Broadcast:
       case TTI::SK_Reverse:
-      case TTI::SK_PermuteSingleSrc:
-        return 0;
+      case TTI::SK_PermuteSingleSrc: {
----------------
arsenm wrote:
Should we be specifically handling the other ShuffleKinds? I don't understand what's going on below with the IsExtractSubvector override.
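
To make the question concrete, here is a minimal standalone sketch of the pre-patch rule as it appears in the removed and context lines of the hunk above. It is not the LLVM implementation: the `ShuffleKind` enum is a hypothetical stand-in for a subset of `TTI::ShuffleKind`, `preChangeFreeShuffleCost` and its parameters are invented for illustration, and `isExtractSubvector` is taken as a plain input flag because the hunk does not show how it is computed. The sketch returns `std::nullopt` wherever the real code would continue to some other cost computation.

```cpp
#include <optional>

// Hypothetical stand-in for a subset of llvm::TargetTransformInfo::ShuffleKind.
enum class ShuffleKind {
  Broadcast,
  Reverse,
  Select,
  PermuteSingleSrc,
  PermuteTwoSrc
};

// Models only the pre-patch "free swizzle" rule visible in the hunk.
// Returns 0 when the shuffle is treated as free, and std::nullopt when the
// rule does not apply (the real code would go on to another cost path).
std::optional<unsigned> preChangeFreeShuffleCost(ShuffleKind kind,
                                                 unsigned numElts,
                                                 unsigned eltBits,
                                                 bool hasVOP3P,
                                                 bool isExtractSubvector) {
  // The override in question: the kind is rewritten before the switch,
  // so the switch never sees the original kind.
  if (isExtractSubvector)
    kind = ShuffleKind::PermuteSingleSrc;

  if (hasVOP3P && numElts == 2 && eltBits == 16) {
    switch (kind) {
    case ShuffleKind::Broadcast:
    case ShuffleKind::Reverse:
    case ShuffleKind::PermuteSingleSrc:
      return 0; // op_sel can read either half of the register
    default:
      break; // Select, PermuteTwoSrc, ... are not handled here
    }
  }
  return std::nullopt;
}
```

In this model, any kind other than the three listed falls through unless the `isExtractSubvector` flag has already rewritten it to `PermuteSingleSrc`, which is the interaction the question above is about.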
https://github.com/llvm/llvm-project/pull/90648