[llvm] [AArch64][ARM] Optimize more `tbl`/`tbx` calls into `shufflevector` (PR #169748)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 4 10:11:24 PST 2025
================
@@ -737,42 +737,122 @@ static Instruction *foldCtpop(IntrinsicInst &II, InstCombinerImpl &IC) {
return nullptr;
}
-/// Convert a table lookup to shufflevector if the mask is constant.
-/// This could benefit tbl1 if the mask is { 7,6,5,4,3,2,1,0 }, in
-/// which case we could lower the shufflevector with rev64 instructions
-/// as it's actually a byte reverse.
-static Value *simplifyNeonTbl1(const IntrinsicInst &II,
- InstCombiner::BuilderTy &Builder) {
+/// Convert `tbl`/`tbx` intrinsics to shufflevector if the mask is constant, and
+/// at most two source operands are actually referenced.
+static Instruction *simplifyNeonTbl(IntrinsicInst &II, InstCombiner &IC,
+ bool IsExtension) {
// Bail out if the mask is not a constant.
- auto *C = dyn_cast<Constant>(II.getArgOperand(1));
+ auto *C = dyn_cast<Constant>(II.getArgOperand(II.arg_size() - 1));
if (!C)
return nullptr;
- auto *VecTy = cast<FixedVectorType>(II.getType());
- unsigned NumElts = VecTy->getNumElements();
+ auto *RetTy = cast<FixedVectorType>(II.getType());
+ unsigned NumIndexes = RetTy->getNumElements();
- // Only perform this transformation for <8 x i8> vector types.
- if (!VecTy->getElementType()->isIntegerTy(8) || NumElts != 8)
+ // Only perform this transformation for <8 x i8> and <16 x i8> vector types.
+ if (!(RetTy->getElementType()->isIntegerTy(8) &&
+ (NumIndexes == 8 || NumIndexes == 16)))
return nullptr;
- int Indexes[8];
+ // For tbx instructions, the first argument is the "fallback" vector, which
+ // has the same length as the mask and return type.
+ unsigned int StartIndex = (unsigned)IsExtension;
+ auto *SourceTy =
+ cast<FixedVectorType>(II.getArgOperand(StartIndex)->getType());
+ // Note that the element count of each source vector does *not* need to be the
+ // same as the element count of the return type and mask! All source vectors
+ // must have the same element count as each other, though.
+ unsigned NumElementsPerSource = SourceTy->getNumElements();
+
+ // There are no tbl/tbx intrinsics for which the destination size exceeds the
+ // source size. However, our definitions of the intrinsics, at least in
+ // IntrinsicsAArch64.td, allow for arbitrary destination vector sizes, so it
+ // *could* technically happen.
+ if (NumIndexes > NumElementsPerSource) {
----------------
davemgreen wrote:
The general rule for intrinsics is that we only support the types produced by the frontend, but it's good to be safe.
In LLVM, a block containing a single statement can drop the {} braces.
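As a rough illustration of the brace style only (not the actual patch): the quoted hunk cuts off before the body of the `if`, so the sketch below assumes a plain early exit. The function name `destinationExceedsSource` and its boolean result are hypothetical.

```cpp
// Hypothetical stand-in for the size guard quoted above. The function name and
// the boolean result are assumptions for illustration; the real body of the
// `if` is not shown in the quoted hunk.
constexpr bool destinationExceedsSource(unsigned NumIndexes,
                                        unsigned NumElementsPerSource) {
  // The body is a single statement, so the braces are omitted.
  if (NumIndexes > NumElementsPerSource)
    return true;
  return false;
}

// Compile-time checks of the sketch itself.
static_assert(destinationExceedsSource(16, 8), "16 indexes, 8-element sources");
static_assert(!destinationExceedsSource(8, 16), "8 indexes, 16-element sources");
```

If the body in the patch is more than one simple statement, or carries a comment that would make the unbraced form hard to read, keeping the braces is reasonable.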
https://github.com/llvm/llvm-project/pull/169748