[llvm] AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (PR #168818)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 20 09:38:43 PST 2025
Nicolai Hähnle <nicolai.haehnle at amd.com>
In-Reply-To: <llvm.org/llvm/llvm-project/pull/168818 at github.com>
================
@@ -1241,46 +1241,108 @@ InstructionCost GCNTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
(ScalarSize == 16 || ScalarSize == 8)) {
// Larger vector widths may require additional instructions, but are
// typically cheaper than scalarized versions.
- unsigned NumVectorElts = cast<FixedVectorType>(SrcTy)->getNumElements();
- unsigned RequestedElts =
- count_if(Mask, [](int MaskElt) { return MaskElt != -1; });
- unsigned EltsPerReg = 32 / ScalarSize;
- if (RequestedElts == 0)
+ //
+ // We assume that shuffling at a register granularity can be done for free.
+ // This is not true for vectors fed into memory instructions, but it is
+ // effectively true for all other shuffling. The emphasis of the logic here
+ // is to assist generic transform in cleaning up / canonicalizing those
+ // shuffles.
+ unsigned NumDstElts = cast<FixedVectorType>(DstTy)->getNumElements();
+ unsigned NumSrcElts = cast<FixedVectorType>(SrcTy)->getNumElements();
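The register-granularity assumption in the new comment can be illustrated with a small standalone sketch. It reuses two ideas from the removed lines: `32 / ScalarSize` elements per 32-bit register, and counting the mask lanes that are not `-1` (undef). It then checks whether every destination register is a whole copy of a single source register. The helper names are hypothetical, and the sketch assumes a single-source shuffle whose element count is a multiple of the elements per register. It is not the cost model this PR actually implements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 8- or 16-bit elements packed into one 32-bit register,
// mirroring the removed `EltsPerReg = 32 / ScalarSize`.
unsigned eltsPerReg(unsigned ScalarSize) { return 32 / ScalarSize; }

// Count mask lanes that are not undef (-1), as the removed count_if did.
unsigned requestedElts(const std::vector<int> &Mask) {
  return static_cast<unsigned>(
      std::count_if(Mask.begin(), Mask.end(), [](int M) { return M != -1; }));
}

// Hypothetical check (assumes a single-source shuffle whose element count is
// a multiple of eltsPerReg): true when each destination register is filled
// from exactly one source register with every element kept in its lane,
// i.e. the shuffle only moves whole registers.
bool isRegisterGranular(const std::vector<int> &Mask, unsigned ScalarSize) {
  const std::size_t EPR = eltsPerReg(ScalarSize);
  for (std::size_t Base = 0; Base < Mask.size(); Base += EPR) {
    int SrcReg = -1; // source register feeding this destination register
    for (std::size_t Lane = 0; Lane < EPR && Base + Lane < Mask.size(); ++Lane) {
      int M = Mask[Base + Lane];
      if (M == -1)
        continue; // undef lanes impose no constraint
      std::size_t Idx = static_cast<std::size_t>(M);
      if (Idx % EPR != Lane)
        return false; // element changes position inside a register
      int R = static_cast<int>(Idx / EPR);
      if (SrcReg == -1)
        SrcReg = R;
      else if (SrcReg != R)
        return false; // destination register mixes two source registers
    }
  }
  return true;
}
```

For 16-bit elements, the mask `{2, 3, 0, 1}` swaps two whole registers and passes the check. The mask `{1, 0, 3, 2}` swaps the halves within each register and fails it, so it would need real data movement.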
----------------
arsenm wrote:
Can you keep this in terms of ElementCount to avoid crashing on scalable vectors?
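A minimal standalone sketch of the reviewer's point follows. In LLVM, `VectorType::getElementCount()` returns an `ElementCount` that exposes `isScalable()`, `getKnownMinValue()` and `getFixedValue()`. Checking `isScalable()` first lets the cost function bail out conservatively instead of calling `cast<FixedVectorType>`, which asserts (in builds with assertions enabled) when given a scalable vector. To compile without LLVM headers, the sketch uses a hypothetical `ElementCountModel` stand-in, and `fixedEltsOrNone` is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::ElementCount: a known minimum element
// count plus a flag saying whether it is multiplied by the runtime vscale.
struct ElementCountModel {
  unsigned KnownMin;
  bool Scalable;
  bool isScalable() const { return Scalable; }
  unsigned getKnownMinValue() const { return KnownMin; }
  // Like ElementCount::getFixedValue(), only valid for fixed-width vectors.
  unsigned getFixedValue() const {
    assert(!Scalable && "fixed value requested for a scalable count");
    return KnownMin;
  }
};

// Hypothetical helper: the fixed element count, or std::nullopt for a
// scalable vector so the caller can return a conservative cost instead of
// asserting inside cast<FixedVectorType>.
std::optional<unsigned> fixedEltsOrNone(ElementCountModel EC) {
  if (EC.isScalable())
    return std::nullopt;
  return EC.getFixedValue();
}
```

In the real code, the equivalent would be to query `getElementCount()` on `SrcTy` and `DstTy` and handle the scalable case before deriving `NumSrcElts` and `NumDstElts`.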
https://github.com/llvm/llvm-project/pull/168818