[llvm] [AArch64][ARM] Optimize more `tbl`/`tbx` calls into `shufflevector` (PR #169748)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 26 15:49:04 PST 2025
llvmbot wrote:
@llvm/pr-subscribers-backend-aarch64
Author: None (valadaptive)
<details>
<summary>Changes</summary>
Resolves https://github.com/llvm/llvm-project/issues/169701. This PR depends on https://github.com/llvm/llvm-project/pull/169589; the last two commits are new.
This PR extends the existing InstCombine transform that folds `tbl1` intrinsics to `shufflevector` when the mask operand is constant. Before this change, it only handled 64-bit `tbl1` intrinsics with no out-of-bounds indices. I've extended it to support both 64-bit and 128-bit vectors, and it now handles the full range of `tbl1`-`tbl4` and `tbx1`-`tbx4`, as long as at most two of the input operands are actually indexed into.
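The "at most two operands" restriction comes from `shufflevector` only taking two source vectors. The core index remapping can be sketched as a standalone model (hypothetical names, not the actual LLVM code): each constant mask index is split into a source-operand number and a lane number, out-of-bounds indices are redirected to an extra "overflow" source, and each distinct source gets one of the two shuffle slots.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the tbl-to-shufflevector index remapping. `mask` is the constant
// tbl mask over `numSources` source vectors of `elemsPerSource` lanes each.
// Out-of-bounds indices are redirected to lane 0 of an extra "overflow"
// source (zeroes for tbl, the fallback for tbx), which also needs a slot.
// Returns false if more than two distinct sources end up referenced.
bool remapToShuffleMask(const std::vector<unsigned> &mask, unsigned numSources,
                        unsigned elemsPerSource, std::vector<int> &shuffleMask,
                        std::vector<unsigned> &slotToSource) {
  shuffleMask.clear();
  slotToSource.clear(); // slotToSource[slot] == source operand number;
                        // the value `numSources` denotes the overflow vector.
  for (size_t i = 0; i < mask.size(); ++i) {
    unsigned sourceIdx = mask[i] / elemsPerSource;
    unsigned elemIdx = mask[i] % elemsPerSource;
    if (sourceIdx >= numSources) {
      sourceIdx = numSources; // out of bounds: read the overflow vector
      elemIdx = 0;            // any lane of a zero vector will do
    }
    // Find or allocate the shuffle slot holding this source.
    unsigned slot = 0;
    while (slot < slotToSource.size() && slotToSource[slot] != sourceIdx)
      ++slot;
    if (slot == slotToSource.size()) {
      if (slot == 2)
        return false; // a third distinct source: not expressible as a shuffle
      slotToSource.push_back(sourceIdx);
    }
    shuffleMask.push_back(static_cast<int>(slot * elemsPerSource + elemIdx));
  }
  return true;
}
```

For example, a `tbl2` mask of `{0, 8, 1, 9}` over two 8-lane sources maps directly to the shuffle mask `{0, 8, 1, 9}`, while a mask that touches both sources *and* goes out of bounds needs three vectors and is rejected.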
For `tbl`, out-of-bounds indices read from a dummy vector of zeroes; for `tbx`, they read from the "fallback" operand. In either case, that extra vector occupies one of the two `shufflevector` operands.
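The architectural difference between the two families can be modeled with a small self-contained function (hypothetical names, not LLVM code): `table` is the concatenation of the source vectors, and the fallback is consulted only in the `tbx` case.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of NEON tbl/tbx lane semantics. `table` is the concatenation of the
// 1-4 source vectors, `mask` holds the per-lane byte indices, and `fallback`
// is only consulted in extension (tbx) mode.
std::vector<uint8_t> tableLookup(const std::vector<uint8_t> &table,
                                 const std::vector<uint8_t> &mask,
                                 const std::vector<uint8_t> &fallback,
                                 bool isExtension) {
  std::vector<uint8_t> result(mask.size());
  for (size_t i = 0; i < mask.size(); ++i) {
    if (mask[i] < table.size())
      result[i] = table[mask[i]];           // in bounds: ordinary lookup
    else
      result[i] = isExtension ? fallback[i] // tbx: keep the fallback lane
                              : 0;          // tbl: lane becomes zero
  }
  return result;
}
```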
This works a lot like https://github.com/llvm/llvm-project/pull/169110, with some added complexity because we need to handle multiple operands. I raised a couple of questions in that PR that still need to be answered:
- Is it correct to check `isa<UndefValue>` for each mask index, and set the output mask index to -1 if so? That index is later folded to a poison value, and I'm not sure about the subtle differences between poison and undef and when one can be substituted for the other. As I mentioned in #169110, the existing x86 fold (`simplifyX86vpermilvar`) already behaves this way for undef.
- How can I write an Alive2 proof for this? It's very hard to find good documentation or tutorials about Alive2.
As with #169110, most of the regression test cases were generated using Claude. Everything else was written by me.
---
Patch is 57.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169748.diff
16 Files Affected:
- (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+21)
- (modified) llvm/lib/Target/AArch64/CMakeLists.txt (+1)
- (modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (+23)
- (modified) llvm/lib/Target/ARM/CMakeLists.txt (+1)
- (added) llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.cpp (+219)
- (added) llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.h (+56)
- (added) llvm/lib/Target/ARMCommon/CMakeLists.txt (+8)
- (modified) llvm/lib/Target/CMakeLists.txt (+5)
- (modified) llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp (-104)
- (modified) llvm/test/Transforms/InstCombine/AArch64/aes-intrinsics.ll (+1-1)
- (added) llvm/test/Transforms/InstCombine/AArch64/tbl.ll (+269)
- (removed) llvm/test/Transforms/InstCombine/AArch64/tbl1.ll (-65)
- (modified) llvm/test/Transforms/InstCombine/ARM/2012-04-23-Neon-Intrinsics.ll (+1-1)
- (modified) llvm/test/Transforms/InstCombine/ARM/aes-intrinsics.ll (+1-1)
- (added) llvm/test/Transforms/InstCombine/ARM/tbl.ll (+215)
- (removed) llvm/test/Transforms/InstCombine/ARM/tbl1.ll (-35)
``````````diff
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 0bae00bafee3c..4a53e5bd49c70 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -7,6 +7,7 @@
//===----------------------------------------------------------------------===//
#include "AArch64TargetTransformInfo.h"
+#include "../ARMCommon/ARMCommonInstCombineIntrinsic.h"
#include "AArch64ExpandImm.h"
#include "AArch64PerfectShuffle.h"
#include "AArch64SMEAttributes.h"
@@ -2856,6 +2857,26 @@ AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
case Intrinsic::aarch64_neon_fmaxnm:
case Intrinsic::aarch64_neon_fminnm:
return instCombineMaxMinNM(IC, II);
+ case Intrinsic::aarch64_neon_tbl1:
+ case Intrinsic::aarch64_neon_tbl2:
+ case Intrinsic::aarch64_neon_tbl3:
+ case Intrinsic::aarch64_neon_tbl4:
+ return ARMCommon::simplifyNeonTbl(II, IC, /*IsExtension=*/false);
+ case Intrinsic::aarch64_neon_tbx1:
+ case Intrinsic::aarch64_neon_tbx2:
+ case Intrinsic::aarch64_neon_tbx3:
+ case Intrinsic::aarch64_neon_tbx4:
+ return ARMCommon::simplifyNeonTbl(II, IC, /*IsExtension=*/true);
+ case Intrinsic::aarch64_neon_smull:
+ case Intrinsic::aarch64_neon_umull: {
+ bool IsSigned = IID == Intrinsic::aarch64_neon_smull;
+ return ARMCommon::simplifyNeonMultiply(II, IC, IsSigned);
+ }
+ case Intrinsic::aarch64_crypto_aesd:
+ case Intrinsic::aarch64_crypto_aese:
+ case Intrinsic::aarch64_sve_aesd:
+ case Intrinsic::aarch64_sve_aese:
+ return ARMCommon::simplifyAES(II, IC);
case Intrinsic::aarch64_sve_convert_from_svbool:
return instCombineConvertFromSVBool(IC, II);
case Intrinsic::aarch64_sve_dup:
diff --git a/llvm/lib/Target/AArch64/CMakeLists.txt b/llvm/lib/Target/AArch64/CMakeLists.txt
index 285d646293eb7..d27a698ee9e4a 100644
--- a/llvm/lib/Target/AArch64/CMakeLists.txt
+++ b/llvm/lib/Target/AArch64/CMakeLists.txt
@@ -101,6 +101,7 @@ add_llvm_target(AArch64CodeGen
AArch64Desc
AArch64Info
AArch64Utils
+ ARMCommon
Analysis
AsmPrinter
CFGuard
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index fdb0ec40cb41f..99d57b00315b1 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -7,6 +7,7 @@
//===----------------------------------------------------------------------===//
#include "ARMTargetTransformInfo.h"
+#include "../ARMCommon/ARMCommonInstCombineIntrinsic.h"
#include "ARMSubtarget.h"
#include "MCTargetDesc/ARMAddressingModes.h"
#include "llvm/ADT/APInt.h"
@@ -182,6 +183,28 @@ ARMTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
break;
}
+ case Intrinsic::arm_neon_vtbl1:
+ case Intrinsic::arm_neon_vtbl2:
+ case Intrinsic::arm_neon_vtbl3:
+ case Intrinsic::arm_neon_vtbl4:
+ return ARMCommon::simplifyNeonTbl(II, IC, /*IsExtension=*/false);
+
+ case Intrinsic::arm_neon_vtbx1:
+ case Intrinsic::arm_neon_vtbx2:
+ case Intrinsic::arm_neon_vtbx3:
+ case Intrinsic::arm_neon_vtbx4:
+ return ARMCommon::simplifyNeonTbl(II, IC, /*IsExtension=*/true);
+
+ case Intrinsic::arm_neon_vmulls:
+ case Intrinsic::arm_neon_vmullu: {
+ bool IsSigned = IID == Intrinsic::arm_neon_vmulls;
+ return ARMCommon::simplifyNeonMultiply(II, IC, IsSigned);
+ }
+
+ case Intrinsic::arm_neon_aesd:
+ case Intrinsic::arm_neon_aese:
+ return ARMCommon::simplifyAES(II, IC);
+
case Intrinsic::arm_mve_pred_i2v: {
Value *Arg = II.getArgOperand(0);
Value *ArgArg;
diff --git a/llvm/lib/Target/ARM/CMakeLists.txt b/llvm/lib/Target/ARM/CMakeLists.txt
index eb3ad01a54fb2..9fc9bc134e5cc 100644
--- a/llvm/lib/Target/ARM/CMakeLists.txt
+++ b/llvm/lib/Target/ARM/CMakeLists.txt
@@ -73,6 +73,7 @@ add_llvm_target(ARMCodeGen
Thumb2SizeReduction.cpp
LINK_COMPONENTS
+ ARMCommon
ARMDesc
ARMInfo
ARMUtils
diff --git a/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.cpp b/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.cpp
new file mode 100644
index 0000000000000..df58dbc6df38f
--- /dev/null
+++ b/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.cpp
@@ -0,0 +1,219 @@
+//===- ARMCommonInstCombineIntrinsic.cpp -
+// instCombineIntrinsic opts for both ARM and AArch64 ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains optimizations for ARM and AArch64 intrinsics that
+/// are shared between both architectures. These functions can be called from:
+/// - ARM TTI's instCombineIntrinsic (for arm_neon_* intrinsics)
+/// - AArch64 TTI's instCombineIntrinsic (for aarch64_neon_* and aarch64_sve_*
+/// intrinsics)
+///
+//===----------------------------------------------------------------------===//
+
+#include "ARMCommonInstCombineIntrinsic.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Value.h"
+#include "llvm/Transforms/InstCombine/InstCombiner.h"
+
+using namespace llvm;
+using namespace llvm::PatternMatch;
+
+namespace llvm {
+namespace ARMCommon {
+
+/// Convert `tbl`/`tbx` intrinsics to shufflevector if the mask is constant, and
+/// at most two source operands are actually referenced.
+Instruction *simplifyNeonTbl(IntrinsicInst &II, InstCombiner &IC,
+ bool IsExtension) {
+ // Bail out if the mask is not a constant.
+ auto *C = dyn_cast<Constant>(II.getArgOperand(II.arg_size() - 1));
+ if (!C)
+ return nullptr;
+
+ auto *RetTy = cast<FixedVectorType>(II.getType());
+ unsigned NumIndexes = RetTy->getNumElements();
+
+ // Only perform this transformation for <8 x i8> and <16 x i8> vector types.
+ // Even the language-level intrinsics that operate on u8/p8 should lower to an
+ // LLVM intrinsic that operates on i8.
+ if (!(RetTy->getElementType()->isIntegerTy(8) &&
+ (NumIndexes == 8 || NumIndexes == 16)))
+ return nullptr;
+
+ // For tbx instructions, the first argument is the "fallback" vector, which
+ // has the same length as the mask and return type.
+ unsigned int StartIndex = (unsigned)IsExtension;
+ auto *SourceTy =
+ cast<FixedVectorType>(II.getArgOperand(StartIndex)->getType());
+ // Note that the element count of each source vector does *not* need to be the
+ // same as the element count of the return type and mask! All source vectors
+ // must have the same element count as each other, though.
+ unsigned NumElementsPerSource = SourceTy->getNumElements();
+
+ // There are no tbl/tbx intrinsics for which the destination size exceeds the
+ // source size. However, our definitions of the intrinsics, at least in
+ // IntrinsicsAArch64.td, allow for arbitrary destination vector sizes, so it
+ // *could* technically happen.
+ if (NumIndexes > NumElementsPerSource) {
+ return nullptr;
+ }
+
+ // The tbl/tbx intrinsics take several source operands followed by a mask
+ // operand.
+ unsigned int NumSourceOperands = II.arg_size() - 1 - (unsigned)IsExtension;
+
+ // Map input operands to shuffle indices. This also helpfully deduplicates the
+ // input arguments, in case the same value is passed as an argument multiple
+ // times.
+ SmallDenseMap<Value *, unsigned, 2> ValueToShuffleSlot;
+ Value *ShuffleOperands[2] = {PoisonValue::get(SourceTy),
+ PoisonValue::get(SourceTy)};
+
+ int Indexes[16];
+ for (unsigned I = 0; I < NumIndexes; ++I) {
+ Constant *COp = C->getAggregateElement(I);
+
+ if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
+ return nullptr;
+
+ if (isa<UndefValue>(COp)) {
+ Indexes[I] = -1;
+ continue;
+ }
+
+ uint64_t Index = cast<ConstantInt>(COp)->getZExtValue();
+ // The index of the input argument that this index references (0 = first
+ // source argument, etc).
+ unsigned SourceOperandIndex = Index / NumElementsPerSource;
+ // The index of the element at that source operand.
+ unsigned SourceOperandElementIndex = Index % NumElementsPerSource;
+
+ Value *SourceOperand;
+ if (SourceOperandIndex >= NumSourceOperands) {
+ // This index is out of bounds. Map it to index into either the fallback
+ // vector (tbx) or vector of zeroes (tbl).
+ SourceOperandIndex = NumSourceOperands;
+ if (IsExtension) {
+ // For out-of-bounds indices in tbx, choose the `I`th element of the
+ // fallback.
+ SourceOperand = II.getArgOperand(0);
+ SourceOperandElementIndex = I;
+ } else {
+ // Otherwise, choose some element from the dummy vector of zeroes (we'll
+ // always choose the first).
+ SourceOperand = Constant::getNullValue(SourceTy);
+ SourceOperandElementIndex = 0;
+ }
+ } else {
+ SourceOperand = II.getArgOperand(SourceOperandIndex + StartIndex);
+ }
+
+ // The source operand may be the fallback vector, which may not have the
+ // same number of elements as the source vector. In that case, we *could*
+ // choose to extend its length with another shufflevector, but it's simpler
+ // to just bail instead.
+ if (cast<FixedVectorType>(SourceOperand->getType())->getNumElements() !=
+ NumElementsPerSource) {
+ return nullptr;
+ }
+
+ // We now know the source operand referenced by this index. Make it a
+ // shufflevector operand, if it isn't already.
+ unsigned NumSlots = ValueToShuffleSlot.size();
+ // This shuffle references more than two sources, and hence cannot be
+ // represented as a shufflevector.
+ if (NumSlots == 2 && !ValueToShuffleSlot.contains(SourceOperand)) {
+ return nullptr;
+ }
+ auto [It, Inserted] =
+ ValueToShuffleSlot.try_emplace(SourceOperand, NumSlots);
+ if (Inserted) {
+ ShuffleOperands[It->getSecond()] = SourceOperand;
+ }
+
+ unsigned RemappedIndex =
+ (It->getSecond() * NumElementsPerSource) + SourceOperandElementIndex;
+ Indexes[I] = RemappedIndex;
+ }
+
+ Value *Shuf = IC.Builder.CreateShuffleVector(
+ ShuffleOperands[0], ShuffleOperands[1], ArrayRef(Indexes, NumIndexes));
+ return IC.replaceInstUsesWith(II, Shuf);
+}
+
+/// Simplify NEON multiply-long intrinsics (smull, umull).
+/// These intrinsics perform widening multiplies: they multiply two vectors of
+/// narrow integers and produce a vector of wider integers. This function
+/// performs algebraic simplifications:
+/// 1. Multiply by zero => zero vector
+/// 2. Multiply by one => zero/sign-extend the non-one operand
+/// 3. Both operands constant => regular multiply that can be constant-folded
+/// later
+Instruction *simplifyNeonMultiply(IntrinsicInst &II, InstCombiner &IC,
+ bool IsSigned) {
+ Value *Arg0 = II.getArgOperand(0);
+ Value *Arg1 = II.getArgOperand(1);
+
+ // Handle mul by zero first:
+ if (isa<ConstantAggregateZero>(Arg0) || isa<ConstantAggregateZero>(Arg1)) {
+ return IC.replaceInstUsesWith(II, ConstantAggregateZero::get(II.getType()));
+ }
+
+ // Check for constant LHS & RHS - in this case we just simplify.
+ VectorType *NewVT = cast<VectorType>(II.getType());
+ if (Constant *CV0 = dyn_cast<Constant>(Arg0)) {
+ if (Constant *CV1 = dyn_cast<Constant>(Arg1)) {
+ Value *V0 = IC.Builder.CreateIntCast(CV0, NewVT, IsSigned);
+ Value *V1 = IC.Builder.CreateIntCast(CV1, NewVT, IsSigned);
+ return IC.replaceInstUsesWith(II, IC.Builder.CreateMul(V0, V1));
+ }
+
+ // Couldn't simplify - canonicalize constant to the RHS.
+ std::swap(Arg0, Arg1);
+ }
+
+ // Handle mul by one:
+ if (Constant *CV1 = dyn_cast<Constant>(Arg1))
+ if (ConstantInt *Splat =
+ dyn_cast_or_null<ConstantInt>(CV1->getSplatValue()))
+ if (Splat->isOne())
+ return CastInst::CreateIntegerCast(Arg0, II.getType(), IsSigned);
+
+ return nullptr;
+}
+
+/// Simplify AES encryption/decryption intrinsics (AESE, AESD).
+///
+/// ARM's AES instructions (AESE/AESD) XOR the data and the key, provided as
+/// separate arguments, before performing the encryption/decryption operation.
+/// We can fold that "internal" XOR with a previous one.
+Instruction *simplifyAES(IntrinsicInst &II, InstCombiner &IC) {
+ Value *DataArg = II.getArgOperand(0);
+ Value *KeyArg = II.getArgOperand(1);
+
+ // Accept zero on either operand.
+ if (!match(KeyArg, m_ZeroInt()))
+ std::swap(KeyArg, DataArg);
+
+ // Try to use the builtin XOR in AESE and AESD to eliminate a prior XOR
+ Value *Data, *Key;
+ if (match(KeyArg, m_ZeroInt()) &&
+ match(DataArg, m_Xor(m_Value(Data), m_Value(Key)))) {
+ IC.replaceOperand(II, 0, Data);
+ IC.replaceOperand(II, 1, Key);
+ return &II;
+ }
+
+ return nullptr;
+}
+
+} // namespace ARMCommon
+} // namespace llvm
diff --git a/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.h b/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.h
new file mode 100644
index 0000000000000..319aee48ccb0d
--- /dev/null
+++ b/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.h
@@ -0,0 +1,56 @@
+//===- ARMCommonInstCombineIntrinsic.h -
+// instCombineIntrinsic opts for both ARM and AArch64 -----------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains optimizations for ARM and AArch64 intrinsics that
+/// are shared between both architectures. These functions can be called from:
+/// - ARM TTI's instCombineIntrinsic (for arm_neon_* intrinsics)
+/// - AArch64 TTI's instCombineIntrinsic (for aarch64_neon_* and aarch64_sve_*
+/// intrinsics)
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIB_TARGET_ARMCOMMON_ARMCOMMONINSTCOMBINEINTRINSIC_H
+#define LLVM_LIB_TARGET_ARMCOMMON_ARMCOMMONINSTCOMBINEINTRINSIC_H
+
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Value.h"
+#include "llvm/Transforms/InstCombine/InstCombiner.h"
+
+namespace llvm {
+
+namespace ARMCommon {
+
+/// Convert `tbl`/`tbx` intrinsics to shufflevector if the mask is constant, and
+/// at most two source operands are actually referenced.
+Instruction *simplifyNeonTbl(IntrinsicInst &II, InstCombiner &IC,
+ bool IsExtension);
+
+/// Simplify NEON multiply-long intrinsics (smull, umull).
+/// These intrinsics perform widening multiplies: they multiply two vectors of
+/// narrow integers and produce a vector of wider integers. This function
+/// performs algebraic simplifications:
+/// 1. Multiply by zero => zero vector
+/// 2. Multiply by one => zero/sign-extend the non-one operand
+/// 3. Both operands constant => regular multiply that can be constant-folded
+/// later
+Instruction *simplifyNeonMultiply(IntrinsicInst &II, InstCombiner &IC,
+ bool IsSigned);
+
+/// Simplify AES encryption/decryption intrinsics (AESE, AESD).
+///
+/// ARM's AES instructions (AESE/AESD) XOR the data and the key, provided as
+/// separate arguments, before performing the encryption/decryption operation.
+/// We can fold that "internal" XOR with a previous one.
+Instruction *simplifyAES(IntrinsicInst &II, InstCombiner &IC);
+
+} // namespace ARMCommon
+} // namespace llvm
+
+#endif // LLVM_LIB_TARGET_ARMCOMMON_ARMCOMMONINSTCOMBINEINTRINSIC_H
diff --git a/llvm/lib/Target/ARMCommon/CMakeLists.txt b/llvm/lib/Target/ARMCommon/CMakeLists.txt
new file mode 100644
index 0000000000000..1805a5df2f053
--- /dev/null
+++ b/llvm/lib/Target/ARMCommon/CMakeLists.txt
@@ -0,0 +1,8 @@
+add_llvm_component_library(LLVMARMCommon
+ ARMCommonInstCombineIntrinsic.cpp
+
+ LINK_COMPONENTS
+ Core
+ Support
+ TransformUtils
+ )
diff --git a/llvm/lib/Target/CMakeLists.txt b/llvm/lib/Target/CMakeLists.txt
index bcc13f942bf96..e3528014a4be2 100644
--- a/llvm/lib/Target/CMakeLists.txt
+++ b/llvm/lib/Target/CMakeLists.txt
@@ -31,6 +31,11 @@ if (NOT BUILD_SHARED_LIBS AND NOT APPLE AND
set(CMAKE_CXX_VISIBILITY_PRESET hidden)
endif()
+# Add shared ARM/AArch64 utilities if either target is being built
+if("ARM" IN_LIST LLVM_TARGETS_TO_BUILD OR "AArch64" IN_LIST LLVM_TARGETS_TO_BUILD)
+ add_subdirectory(ARMCommon)
+endif()
+
foreach(t ${LLVM_TARGETS_TO_BUILD})
message(STATUS "Targeting ${t}")
add_subdirectory(${t})
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 8e4edefec42fd..8a54c0dde6be6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -737,44 +737,6 @@ static Instruction *foldCtpop(IntrinsicInst &II, InstCombinerImpl &IC) {
return nullptr;
}
-/// Convert a table lookup to shufflevector if the mask is constant.
-/// This could benefit tbl1 if the mask is { 7,6,5,4,3,2,1,0 }, in
-/// which case we could lower the shufflevector with rev64 instructions
-/// as it's actually a byte reverse.
-static Value *simplifyNeonTbl1(const IntrinsicInst &II,
- InstCombiner::BuilderTy &Builder) {
- // Bail out if the mask is not a constant.
- auto *C = dyn_cast<Constant>(II.getArgOperand(1));
- if (!C)
- return nullptr;
-
- auto *VecTy = cast<FixedVectorType>(II.getType());
- unsigned NumElts = VecTy->getNumElements();
-
- // Only perform this transformation for <8 x i8> vector types.
- if (!VecTy->getElementType()->isIntegerTy(8) || NumElts != 8)
- return nullptr;
-
- int Indexes[8];
-
- for (unsigned I = 0; I < NumElts; ++I) {
- Constant *COp = C->getAggregateElement(I);
-
- if (!COp || !isa<ConstantInt>(COp))
- return nullptr;
-
- Indexes[I] = cast<ConstantInt>(COp)->getLimitedValue();
-
- // Make sure the mask indices are in range.
- if ((unsigned)Indexes[I] >= NumElts)
- return nullptr;
- }
-
- auto *V1 = II.getArgOperand(0);
- auto *V2 = Constant::getNullValue(V1->getType());
- return Builder.CreateShuffleVector(V1, V2, ArrayRef(Indexes));
-}
-
// Returns true iff the 2 intrinsics have the same operands, limiting the
// comparison to the first NumOperands.
static bool haveSameOperands(const IntrinsicInst &I, const IntrinsicInst &E,
@@ -3155,72 +3117,6 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
Intrinsic::getOrInsertDeclaration(II->getModule(), NewIntrin);
return CallInst::Create(NewFn, CallArgs);
}
- case Intrinsic::arm_neon_vtbl1:
- case Intrinsic::aarch64_neon_tbl1:
- if (Value *V = simplifyNeonTbl1(*II, Builder))
- return replaceInstUsesWith(*II, V);
- break;
-
- case Intrinsic::arm_neon_vmulls:
- case Intrinsic::arm_neon_vmullu:
- case Intrinsic::aarch64_neon_smull:
- case Intrinsic::aarch64_neon_umull: {
- Value *Arg0 = II->getArgOperand(0);
- Value *Arg1 = II->getArgOperand(1);
-
- // Handle mul by zero first:
- if (isa<ConstantAggregateZero>(Arg0) || isa<ConstantAggregateZero>(Arg1)) {
- return replaceInstUsesWith(CI, ConstantAggregateZero::get(II->getType()));
- }
-
- // Check for constant LHS & RHS - in this case we just simplify.
- bool Zext = (IID == Intrinsic::arm_neon_vmullu ||
- IID == Intrinsic::aarch64_neon_umull);
- VectorType *NewVT = cast<VectorType>(II->getType());
- if (Constant *CV0 = dyn_cast<Constant>(Arg0)) {
- if (Constant *CV1 = dyn_cast<Constant>(Arg1)) {
- Value *V0 = Builder.CreateIntCast(CV0, NewVT, /*isSigned=*/!Zext);
- Value *V1 = Builder.CreateIntCast(CV1, NewVT, /*isSigned=*/!Zext);
- return replaceInstUsesWith(CI, Builder.CreateMul(V0, V1));
- }
-
- // Couldn't simplify - canonicalize constant to the RHS.
- std::swap(Arg0, Arg1);
- }
-
- // Handle mul by one:
- if (Constant *CV1 = dyn_cast<Constant>(Arg1))
- if (ConstantInt *Splat =
- dyn_cast_or_null<ConstantInt>(CV1->getSplatValue()))
- if (Splat->isOne())
- return CastInst::CreateIn...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/169748