[llvm] [Intrinsics][AArch64] Add intrinsic to mask off aliasing vector lanes (PR #117007)
Sam Tebbs via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 10 06:36:54 PDT 2025
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/117007
From dbdd9e9133a188621f67543026f3ae40c1f3f7d3 Mon Sep 17 00:00:00 2001
From: Sam Tebbs <samuel.tebbs at arm.com>
Date: Fri, 15 Nov 2024 10:24:46 +0000
Subject: [PATCH 01/20] [Intrinsics][AArch64] Add intrinsic to mask off
aliasing vector lanes
It can be unsafe to load a vector from an address and write a vector to
an address if those two addresses have overlapping lanes within a
vectorised loop iteration.
This PR adds an intrinsic designed to create a mask with lanes disabled
if they overlap between the two pointer arguments, so that only safe
lanes are loaded, operated on and stored.
Along with the two pointer parameters, the intrinsic also takes an
immediate that represents the size in bytes of the vector element
types, as well as an immediate i1 that is true if there is a
write-after-read hazard or false if there is a read-after-write hazard.
This will be used by #100579 and replaces the existing lowering for
whilewr, since that isn't needed now that we have the intrinsic.
---
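Reviewer note (not part of the patch): the sketch below is a minimal
scalar model of the mask semantics described in the LangRef hunk of
this patch. It is only an illustration. The name aliasLaneMaskModel and
its signature are hypothetical and do not exist in LLVM, pointers are
modelled as signed 64-bit integers, and the poison cases (wrapping
addresses) are deliberately not modelled.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical reference model, assuming the LangRef semantics in this patch:
//   write-after-read: diff = (ptrB - ptrA) / eltSize;    m[i] = i <u diff || diff <= 0
//   read-after-write: diff = abs(ptrB - ptrA) / eltSize; m[i] = i <u diff || diff == 0
// Poison cases are NOT modelled; callers must avoid overflowing inputs.
std::vector<bool> aliasLaneMaskModel(int64_t ptrA, int64_t ptrB,
                                     int64_t eltSize, bool writeAfterRead,
                                     unsigned vf) {
  int64_t diff = ptrB - ptrA;
  if (!writeAfterRead)
    diff = diff < 0 ? -diff : diff; // abs() for the read-after-write case
  diff /= eltSize;                  // signed division, truncating towards zero

  // When the pointers cannot conflict in this direction, every lane is active.
  bool allActive = writeAfterRead ? (diff <= 0) : (diff == 0);

  std::vector<bool> mask(vf);
  for (unsigned i = 0; i < vf; ++i)
    mask[i] = allActive ||
              static_cast<uint64_t>(i) < static_cast<uint64_t>(diff);
  return mask;
}
```

For the LangRef example (%ptrA = 20, %ptrB = 23, element size 1, 8
lanes, write-after-read) this model enables lanes 0 to 2 only, which
are the lanes of the load that do not share addresses with the store.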
llvm/docs/LangRef.rst | 84 ++++
llvm/include/llvm/CodeGen/TargetLowering.h | 7 +
llvm/include/llvm/IR/Intrinsics.td | 5 +
.../SelectionDAG/SelectionDAGBuilder.cpp | 50 +++
.../Target/AArch64/AArch64ISelLowering.cpp | 76 +++-
llvm/lib/Target/AArch64/AArch64ISelLowering.h | 7 +
.../lib/Target/AArch64/AArch64SVEInstrInfo.td | 11 +-
llvm/lib/Target/AArch64/SVEInstrFormats.td | 10 +-
llvm/test/CodeGen/AArch64/alias_mask.ll | 421 ++++++++++++++++++
.../CodeGen/AArch64/alias_mask_scalable.ll | 195 ++++++++
10 files changed, 852 insertions(+), 14 deletions(-)
create mode 100644 llvm/test/CodeGen/AArch64/alias_mask.ll
create mode 100644 llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index deb87365ae8d7..ba317c7c8640b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23733,6 +23733,90 @@ Examples:
%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
+.. _int_experimental_get_alias_lane_mask:
+
+'``llvm.experimental.get.alias.lane.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+ declare <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %ptrA, i64 %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %ptrA, i64 %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i32(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.nxv16i1.i64.i32(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+
+
+Overview:
+"""""""""
+
+Create a mask representing lanes that do or do not overlap between two pointers
+across one vector loop iteration.
+
+
+Arguments:
+""""""""""
+
+The first two arguments have the same scalar integer type.
+The final two are immediates and the result is a vector with the i1 element type.
+
+Semantics:
+""""""""""
+
+The intrinsic will return poison if ``%ptrA`` and ``%ptrB`` are within
+VF * ``%elementSize`` of each other and ``%ptrA`` + VF * ``%elementSize`` wraps.
+In other cases when ``%writeAfterRead`` is true, the
+'``llvm.experimental.get.alias.lane.mask.*``' intrinsics are semantically
+equivalent to:
+
+::
+
+ %diff = (%ptrB - %ptrA) / %elementSize
+ %m[i] = (icmp ult i, %diff) || (%diff <= 0)
+
+When the return value is not poison and ``%writeAfterRead`` is false, the
+'``llvm.experimental.get.alias.lane.mask.*``' intrinsics are semantically
+equivalent to:
+
+::
+
+ %diff = abs(%ptrB - %ptrA) / %elementSize
+ %m[i] = (icmp ult i, %diff) || (%diff == 0)
+
+where ``%m`` is a vector (mask) of active/inactive lanes with its elements
+indexed by ``i``, and ``%ptrA``, ``%ptrB`` are the two i64 arguments to
+``llvm.experimental.get.alias.lane.mask.*`` and ``%elementSize`` is the first
+immediate argument. The ``%writeAfterRead`` argument is expected to be true if
+``%ptrB`` is stored to after ``%ptrA`` is read from.
+The above is equivalent to:
+
+::
+
+ %m = @llvm.experimental.get.alias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead)
+
+This can, for example, be emitted by the loop vectorizer in which case
+``%ptrA`` is a pointer that is read from within the loop, and ``%ptrB`` is a
+pointer that is stored to within the loop.
+If the difference between these pointers is less than the vector factor, then
+they overlap (alias) within a loop iteration.
+An example is if ``%ptrA`` is 20 and ``%ptrB`` is 23 with a vector factor of 8
+and an element size of 1, then lanes 3, 4, 5, 6 and 7 of the vector loaded from
+``%ptrA`` share addresses with lanes 0, 1, 2, 3 and 4 from the vector stored to
+at ``%ptrB``.
+
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+ %alias.lane.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i32(i64 %ptrA, i64 %ptrB, i32 4, i1 1)
+ %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
+ [...]
+ call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
.. _int_experimental_vp_splice:
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index a4c3d042fe3a4..7bb04593d34e5 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -469,6 +469,13 @@ class TargetLoweringBase {
return true;
}
+ /// Return true if the @llvm.experimental.get.alias.lane.mask intrinsic should
+ /// be expanded using generic code in SelectionDAGBuilder.
+ virtual bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT,
+ unsigned EltSize) const {
+ return true;
+ }
+
virtual bool shouldExpandGetVectorLength(EVT CountVT, unsigned VF,
bool IsScalable) const {
return true;
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 14ecae41ff08f..7625a501a596e 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2379,6 +2379,11 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<1>>
llvm_i32_ty]>;
}
+def int_experimental_get_alias_lane_mask:
+ DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+ [llvm_anyint_ty, LLVMMatchType<1>, llvm_anyint_ty, llvm_i1_ty],
+ [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
+
def int_get_active_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyint_ty, LLVMMatchType<1>],
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 1c58a7f05446c..a98dfda2b6621 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8291,6 +8291,56 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
visitVectorExtractLastActive(I, Intrinsic);
return;
}
+ case Intrinsic::experimental_get_alias_lane_mask: {
+ SDValue SourceValue = getValue(I.getOperand(0));
+ SDValue SinkValue = getValue(I.getOperand(1));
+ SDValue EltSize = getValue(I.getOperand(2));
+ bool IsWriteAfterRead =
+ cast<ConstantSDNode>(getValue(I.getOperand(3)))->getZExtValue() != 0;
+ auto IntrinsicVT = EVT::getEVT(I.getType());
+ auto PtrVT = SourceValue->getValueType(0);
+
+ if (!TLI.shouldExpandGetAliasLaneMask(
+ IntrinsicVT, PtrVT,
+ cast<ConstantSDNode>(EltSize)->getSExtValue())) {
+ visitTargetIntrinsic(I, Intrinsic);
+ return;
+ }
+
+ SDValue Diff = DAG.getNode(ISD::SUB, sdl, PtrVT, SinkValue, SourceValue);
+ if (!IsWriteAfterRead)
+ Diff = DAG.getNode(ISD::ABS, sdl, PtrVT, Diff);
+
+ Diff = DAG.getNode(ISD::SDIV, sdl, PtrVT, Diff, EltSize);
+ SDValue Zero = DAG.getTargetConstant(0, sdl, PtrVT);
+
+ // If the difference is positive then some elements may alias
+ auto CmpVT =
+ TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), PtrVT);
+ SDValue Cmp = DAG.getSetCC(sdl, CmpVT, Diff, Zero,
+ IsWriteAfterRead ? ISD::SETLE : ISD::SETEQ);
+
+ // Splat the compare result then OR it with a lane mask
+ SDValue Splat = DAG.getSplat(IntrinsicVT, sdl, Cmp);
+
+ SDValue DiffMask;
+ // Don't emit an active lane mask if the target doesn't support it
+ if (TLI.shouldExpandGetActiveLaneMask(IntrinsicVT, PtrVT)) {
+ EVT VecTy = EVT::getVectorVT(*DAG.getContext(), PtrVT,
+ IntrinsicVT.getVectorElementCount());
+ SDValue DiffSplat = DAG.getSplat(VecTy, sdl, Diff);
+ SDValue VectorStep = DAG.getStepVector(sdl, VecTy);
+ DiffMask = DAG.getSetCC(sdl, IntrinsicVT, VectorStep, DiffSplat,
+ ISD::CondCode::SETULT);
+ } else {
+ DiffMask = DAG.getNode(
+ ISD::INTRINSIC_WO_CHAIN, sdl, IntrinsicVT,
+ DAG.getTargetConstant(Intrinsic::get_active_lane_mask, sdl, MVT::i64),
+ Zero, Diff);
+ }
+ SDValue Or = DAG.getNode(ISD::OR, sdl, IntrinsicVT, DiffMask, Splat);
+ setValue(&I, Or);
+ }
}
}
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 50be082777835..232c2227a3b51 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2053,6 +2053,25 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
return false;
}
+bool AArch64TargetLowering::shouldExpandGetAliasLaneMask(
+ EVT VT, EVT PtrVT, unsigned EltSize) const {
+ if (!Subtarget->hasSVE2())
+ return true;
+
+ if (PtrVT != MVT::i64)
+ return true;
+
+ if (VT == MVT::v2i1 || VT == MVT::nxv2i1)
+ return EltSize != 8;
+ if (VT == MVT::v4i1 || VT == MVT::nxv4i1)
+ return EltSize != 4;
+ if (VT == MVT::v8i1 || VT == MVT::nxv8i1)
+ return EltSize != 2;
+ if (VT == MVT::v16i1 || VT == MVT::nxv16i1)
+ return EltSize != 1;
+ return true;
+}
+
bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic(
const IntrinsicInst *I) const {
if (I->getIntrinsicID() != Intrinsic::experimental_vector_partial_reduce_add)
@@ -2835,6 +2854,8 @@ const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
MAKE_CASE(AArch64ISD::LS64_BUILD)
MAKE_CASE(AArch64ISD::LS64_EXTRACT)
MAKE_CASE(AArch64ISD::TBL)
+ MAKE_CASE(AArch64ISD::WHILEWR)
+ MAKE_CASE(AArch64ISD::WHILERW)
MAKE_CASE(AArch64ISD::FADD_PRED)
MAKE_CASE(AArch64ISD::FADDA_PRED)
MAKE_CASE(AArch64ISD::FADDV_PRED)
@@ -6033,6 +6054,18 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
EVT PtrVT = getPointerTy(DAG.getDataLayout());
return DAG.getNode(AArch64ISD::THREAD_POINTER, dl, PtrVT);
}
+ case Intrinsic::aarch64_sve_whilewr_b:
+ case Intrinsic::aarch64_sve_whilewr_h:
+ case Intrinsic::aarch64_sve_whilewr_s:
+ case Intrinsic::aarch64_sve_whilewr_d:
+ return DAG.getNode(AArch64ISD::WHILEWR, dl, Op.getValueType(),
+ Op.getOperand(1), Op.getOperand(2));
+ case Intrinsic::aarch64_sve_whilerw_b:
+ case Intrinsic::aarch64_sve_whilerw_h:
+ case Intrinsic::aarch64_sve_whilerw_s:
+ case Intrinsic::aarch64_sve_whilerw_d:
+ return DAG.getNode(AArch64ISD::WHILERW, dl, Op.getValueType(),
+ Op.getOperand(1), Op.getOperand(2));
case Intrinsic::aarch64_neon_abs: {
EVT Ty = Op.getValueType();
if (Ty == MVT::i64) {
@@ -6492,18 +6525,45 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
return DAG.getNode(AArch64ISD::USDOT, dl, Op.getValueType(),
Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
}
+ case Intrinsic::experimental_get_alias_lane_mask:
case Intrinsic::get_active_lane_mask: {
- SDValue ID =
- DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, dl, MVT::i64);
+ unsigned IntrinsicID = Intrinsic::aarch64_sve_whilelo;
+ if (IntNo == Intrinsic::experimental_get_alias_lane_mask) {
+ uint64_t EltSize = Op.getOperand(3)->getAsZExtVal();
+ bool IsWriteAfterRead = Op.getOperand(4)->getAsZExtVal() == 1;
+ switch (EltSize) {
+ case 1:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_b
+ : Intrinsic::aarch64_sve_whilerw_b;
+ break;
+ case 2:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_h
+ : Intrinsic::aarch64_sve_whilerw_h;
+ break;
+ case 4:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_s
+ : Intrinsic::aarch64_sve_whilerw_s;
+ break;
+ case 8:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_d
+ : Intrinsic::aarch64_sve_whilerw_d;
+ break;
+ default:
+ llvm_unreachable("Unexpected element size for get.alias.lane.mask");
+ break;
+ }
+ }
+ SDValue ID = DAG.getTargetConstant(IntrinsicID, dl, MVT::i64);
EVT VT = Op.getValueType();
if (VT.isScalableVector())
return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, VT, ID, Op.getOperand(1),
Op.getOperand(2));
- // We can use the SVE whilelo instruction to lower this intrinsic by
- // creating the appropriate sequence of scalable vector operations and
- // then extracting a fixed-width subvector from the scalable vector.
+ // We can use the SVE whilelo/whilewr/whilerw instruction to lower this
+ // intrinsic by creating the appropriate sequence of scalable vector
+ // operations and then extracting a fixed-width subvector from the scalable
+ // vector.
EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
EVT WhileVT = ContainerVT.changeElementType(MVT::i1);
@@ -19672,7 +19732,10 @@ static bool isPredicateCCSettingOp(SDValue N) {
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilels ||
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilelt ||
// get_active_lane_mask is lowered to a whilelo instruction.
- N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask)))
+ N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask ||
+ // get_alias_lane_mask is lowered to a whilewr/rw instruction.
+ N.getConstantOperandVal(0) ==
+ Intrinsic::experimental_get_alias_lane_mask)))
return true;
return false;
@@ -27609,6 +27672,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
return;
}
case Intrinsic::experimental_vector_match:
+ case Intrinsic::experimental_get_alias_lane_mask:
case Intrinsic::get_active_lane_mask: {
if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
return;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index b26f28dc79f88..bcb3c21e17535 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -298,6 +298,10 @@ enum NodeType : unsigned {
SMAXV,
UMAXV,
+ // Alias lane masks
+ WHILEWR,
+ WHILERW,
+
SADDV_PRED,
UADDV_PRED,
SMAXV_PRED,
@@ -1003,6 +1007,9 @@ class AArch64TargetLowering : public TargetLowering {
bool shouldExpandGetActiveLaneMask(EVT VT, EVT OpVT) const override;
+ bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT,
+ unsigned EltSize) const override;
+
bool
shouldExpandPartialReductionIntrinsic(const IntrinsicInst *I) const override;
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 28aecd14e33fa..84fe1bf4acc05 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -140,6 +140,11 @@ def AArch64st1q_scatter : SDNode<"AArch64ISD::SST1Q_PRED", SDT_AArch64_SCATTER_V
// AArch64 SVE/SVE2 - the remaining node definitions
//
+// Alias masks
+def SDT_AArch64Mask : SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisInt<1>, SDTCisSameAs<2, 1>, SDTCVecEltisVT<0,i1>]>;
+def AArch64whilewr : SDNode<"AArch64ISD::WHILEWR", SDT_AArch64Mask>;
+def AArch64whilerw : SDNode<"AArch64ISD::WHILERW", SDT_AArch64Mask>;
+
// SVE CNT/INC/RDVL
def sve_rdvl_imm : ComplexPattern<i64, 1, "SelectRDVLImm<-32, 31, 16>">;
def sve_cnth_imm : ComplexPattern<i64, 1, "SelectRDVLImm<1, 16, 8>">;
@@ -3928,9 +3933,9 @@ let Predicates = [HasSVE2_or_SME] in {
defm WHILEHI_PXX : sve_int_while8_rr<0b101, "whilehi", int_aarch64_sve_whilehi, int_aarch64_sve_whilelo>;
// SVE2 pointer conflict compare
- defm WHILEWR_PXX : sve2_int_while_rr<0b0, "whilewr", "int_aarch64_sve_whilewr">;
- defm WHILERW_PXX : sve2_int_while_rr<0b1, "whilerw", "int_aarch64_sve_whilerw">;
-} // End HasSVE2_or_SME
+ defm WHILEWR_PXX : sve2_int_while_rr<0b0, "whilewr", AArch64whilewr>;
+ defm WHILERW_PXX : sve2_int_while_rr<0b1, "whilerw", AArch64whilerw>;
+} // End HasSVE2orSME
let Predicates = [HasSVEAES, HasNonStreamingSVE2_or_SSVE_AES] in {
// SVE2 crypto destructive binary operations
diff --git a/llvm/lib/Target/AArch64/SVEInstrFormats.td b/llvm/lib/Target/AArch64/SVEInstrFormats.td
index e443c5ab150bd..fbd095a363024 100644
--- a/llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -5895,16 +5895,16 @@ class sve2_int_while_rr<bits<2> sz8_64, bits<1> rw, string asm,
let isWhile = 1;
}
-multiclass sve2_int_while_rr<bits<1> rw, string asm, string op> {
+multiclass sve2_int_while_rr<bits<1> rw, string asm, SDPatternOperator op> {
def _B : sve2_int_while_rr<0b00, rw, asm, PPR8>;
def _H : sve2_int_while_rr<0b01, rw, asm, PPR16>;
def _S : sve2_int_while_rr<0b10, rw, asm, PPR32>;
def _D : sve2_int_while_rr<0b11, rw, asm, PPR64>;
- def : SVE_2_Op_Pat<nxv16i1, !cast<SDPatternOperator>(op # _b), i64, i64, !cast<Instruction>(NAME # _B)>;
- def : SVE_2_Op_Pat<nxv8i1, !cast<SDPatternOperator>(op # _h), i64, i64, !cast<Instruction>(NAME # _H)>;
- def : SVE_2_Op_Pat<nxv4i1, !cast<SDPatternOperator>(op # _s), i64, i64, !cast<Instruction>(NAME # _S)>;
- def : SVE_2_Op_Pat<nxv2i1, !cast<SDPatternOperator>(op # _d), i64, i64, !cast<Instruction>(NAME # _D)>;
+ def : SVE_2_Op_Pat<nxv16i1, op, i64, i64, !cast<Instruction>(NAME # _B)>;
+ def : SVE_2_Op_Pat<nxv8i1, op, i64, i64, !cast<Instruction>(NAME # _H)>;
+ def : SVE_2_Op_Pat<nxv4i1, op, i64, i64, !cast<Instruction>(NAME # _S)>;
+ def : SVE_2_Op_Pat<nxv2i1, op, i64, i64, !cast<Instruction>(NAME # _D)>;
}
//===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/AArch64/alias_mask.ll b/llvm/test/CodeGen/AArch64/alias_mask.ll
new file mode 100644
index 0000000000000..84a22822f1702
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/alias_mask.ll
@@ -0,0 +1,421 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64 -mattr=+sve2 %s -o - | FileCheck %s --check-prefix=CHECK-SVE
+; RUN: llc -mtriple=aarch64 %s -o - | FileCheck %s --check-prefix=CHECK-NOSVE
+
+define <16 x i1> @whilewr_8(i64 %a, i64 %b) {
+; CHECK-SVE-LABEL: whilewr_8:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: whilewr p0.b, x0, x1
+; CHECK-SVE-NEXT: mov z0.b, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: whilewr_8:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI0_0
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI0_1
+; CHECK-NOSVE-NEXT: sub x9, x1, x0
+; CHECK-NOSVE-NEXT: ldr q0, [x8, :lo12:.LCPI0_0]
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI0_2
+; CHECK-NOSVE-NEXT: ldr q1, [x10, :lo12:.LCPI0_1]
+; CHECK-NOSVE-NEXT: ldr q3, [x8, :lo12:.LCPI0_2]
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI0_4
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI0_3
+; CHECK-NOSVE-NEXT: ldr q5, [x8, :lo12:.LCPI0_4]
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI0_5
+; CHECK-NOSVE-NEXT: dup v2.2d, x9
+; CHECK-NOSVE-NEXT: ldr q4, [x10, :lo12:.LCPI0_3]
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI0_6
+; CHECK-NOSVE-NEXT: ldr q6, [x8, :lo12:.LCPI0_5]
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI0_7
+; CHECK-NOSVE-NEXT: ldr q7, [x10, :lo12:.LCPI0_6]
+; CHECK-NOSVE-NEXT: cmp x9, #1
+; CHECK-NOSVE-NEXT: ldr q16, [x8, :lo12:.LCPI0_7]
+; CHECK-NOSVE-NEXT: cmhi v0.2d, v2.2d, v0.2d
+; CHECK-NOSVE-NEXT: cmhi v1.2d, v2.2d, v1.2d
+; CHECK-NOSVE-NEXT: cmhi v3.2d, v2.2d, v3.2d
+; CHECK-NOSVE-NEXT: cmhi v4.2d, v2.2d, v4.2d
+; CHECK-NOSVE-NEXT: cmhi v5.2d, v2.2d, v5.2d
+; CHECK-NOSVE-NEXT: cmhi v6.2d, v2.2d, v6.2d
+; CHECK-NOSVE-NEXT: cmhi v7.2d, v2.2d, v7.2d
+; CHECK-NOSVE-NEXT: cmhi v2.2d, v2.2d, v16.2d
+; CHECK-NOSVE-NEXT: uzp1 v0.4s, v1.4s, v0.4s
+; CHECK-NOSVE-NEXT: cset w8, lt
+; CHECK-NOSVE-NEXT: uzp1 v1.4s, v4.4s, v3.4s
+; CHECK-NOSVE-NEXT: uzp1 v3.4s, v6.4s, v5.4s
+; CHECK-NOSVE-NEXT: uzp1 v2.4s, v2.4s, v7.4s
+; CHECK-NOSVE-NEXT: uzp1 v0.8h, v1.8h, v0.8h
+; CHECK-NOSVE-NEXT: uzp1 v1.8h, v2.8h, v3.8h
+; CHECK-NOSVE-NEXT: uzp1 v0.16b, v1.16b, v0.16b
+; CHECK-NOSVE-NEXT: dup v1.16b, w8
+; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 1, i1 1)
+ ret <16 x i1> %0
+}
+
+define <8 x i1> @whilewr_16(i64 %a, i64 %b) {
+; CHECK-SVE-LABEL: whilewr_16:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: whilewr p0.b, x0, x1
+; CHECK-SVE-NEXT: mov z0.b, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: // kill: def $d0 killed $d0 killed $z0
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: whilewr_16:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: sub x8, x1, x0
+; CHECK-NOSVE-NEXT: adrp x9, .LCPI1_0
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI1_1
+; CHECK-NOSVE-NEXT: add x8, x8, x8, lsr #63
+; CHECK-NOSVE-NEXT: adrp x11, .LCPI1_2
+; CHECK-NOSVE-NEXT: ldr q1, [x9, :lo12:.LCPI1_0]
+; CHECK-NOSVE-NEXT: adrp x9, .LCPI1_3
+; CHECK-NOSVE-NEXT: ldr q2, [x10, :lo12:.LCPI1_1]
+; CHECK-NOSVE-NEXT: ldr q3, [x11, :lo12:.LCPI1_2]
+; CHECK-NOSVE-NEXT: asr x8, x8, #1
+; CHECK-NOSVE-NEXT: ldr q4, [x9, :lo12:.LCPI1_3]
+; CHECK-NOSVE-NEXT: dup v0.2d, x8
+; CHECK-NOSVE-NEXT: cmp x8, #1
+; CHECK-NOSVE-NEXT: cset w8, lt
+; CHECK-NOSVE-NEXT: cmhi v1.2d, v0.2d, v1.2d
+; CHECK-NOSVE-NEXT: cmhi v2.2d, v0.2d, v2.2d
+; CHECK-NOSVE-NEXT: cmhi v3.2d, v0.2d, v3.2d
+; CHECK-NOSVE-NEXT: cmhi v0.2d, v0.2d, v4.2d
+; CHECK-NOSVE-NEXT: uzp1 v1.4s, v2.4s, v1.4s
+; CHECK-NOSVE-NEXT: uzp1 v0.4s, v0.4s, v3.4s
+; CHECK-NOSVE-NEXT: uzp1 v0.8h, v0.8h, v1.8h
+; CHECK-NOSVE-NEXT: dup v1.8b, w8
+; CHECK-NOSVE-NEXT: xtn v0.8b, v0.8h
+; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 1)
+ ret <8 x i1> %0
+}
+
+define <4 x i1> @whilewr_32(i64 %a, i64 %b) {
+; CHECK-SVE-LABEL: whilewr_32:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: whilewr p0.h, x0, x1
+; CHECK-SVE-NEXT: mov z0.h, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: // kill: def $d0 killed $d0 killed $z0
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: whilewr_32:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: sub x9, x1, x0
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI2_0
+; CHECK-NOSVE-NEXT: add x10, x9, #3
+; CHECK-NOSVE-NEXT: cmp x9, #0
+; CHECK-NOSVE-NEXT: ldr q1, [x8, :lo12:.LCPI2_0]
+; CHECK-NOSVE-NEXT: csel x9, x10, x9, lt
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI2_1
+; CHECK-NOSVE-NEXT: asr x9, x9, #2
+; CHECK-NOSVE-NEXT: ldr q2, [x10, :lo12:.LCPI2_1]
+; CHECK-NOSVE-NEXT: dup v0.2d, x9
+; CHECK-NOSVE-NEXT: cmp x9, #1
+; CHECK-NOSVE-NEXT: cset w8, lt
+; CHECK-NOSVE-NEXT: cmhi v1.2d, v0.2d, v1.2d
+; CHECK-NOSVE-NEXT: cmhi v0.2d, v0.2d, v2.2d
+; CHECK-NOSVE-NEXT: uzp1 v0.4s, v0.4s, v1.4s
+; CHECK-NOSVE-NEXT: dup v1.4h, w8
+; CHECK-NOSVE-NEXT: xtn v0.4h, v0.4s
+; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 4, i1 1)
+ ret <4 x i1> %0
+}
+
+define <2 x i1> @whilewr_64(i64 %a, i64 %b) {
+; CHECK-SVE-LABEL: whilewr_64:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: whilewr p0.s, x0, x1
+; CHECK-SVE-NEXT: mov z0.s, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: // kill: def $d0 killed $d0 killed $z0
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: whilewr_64:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: sub x9, x1, x0
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI3_0
+; CHECK-NOSVE-NEXT: add x10, x9, #7
+; CHECK-NOSVE-NEXT: cmp x9, #0
+; CHECK-NOSVE-NEXT: ldr q1, [x8, :lo12:.LCPI3_0]
+; CHECK-NOSVE-NEXT: csel x9, x10, x9, lt
+; CHECK-NOSVE-NEXT: asr x9, x9, #3
+; CHECK-NOSVE-NEXT: dup v0.2d, x9
+; CHECK-NOSVE-NEXT: cmp x9, #1
+; CHECK-NOSVE-NEXT: cset w8, lt
+; CHECK-NOSVE-NEXT: cmhi v0.2d, v0.2d, v1.2d
+; CHECK-NOSVE-NEXT: dup v1.2s, w8
+; CHECK-NOSVE-NEXT: xtn v0.2s, v0.2d
+; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 8, i1 1)
+ ret <2 x i1> %0
+}
+
+define <16 x i1> @whilerw_8(i64 %a, i64 %b) {
+; CHECK-SVE-LABEL: whilerw_8:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: whilerw p0.b, x0, x1
+; CHECK-SVE-NEXT: mov z0.b, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: whilerw_8:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI4_0
+; CHECK-NOSVE-NEXT: subs x9, x1, x0
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI4_1
+; CHECK-NOSVE-NEXT: ldr q0, [x8, :lo12:.LCPI4_0]
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI4_2
+; CHECK-NOSVE-NEXT: cneg x9, x9, mi
+; CHECK-NOSVE-NEXT: ldr q2, [x8, :lo12:.LCPI4_2]
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI4_3
+; CHECK-NOSVE-NEXT: ldr q1, [x10, :lo12:.LCPI4_1]
+; CHECK-NOSVE-NEXT: ldr q4, [x8, :lo12:.LCPI4_3]
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI4_4
+; CHECK-NOSVE-NEXT: dup v3.2d, x9
+; CHECK-NOSVE-NEXT: ldr q5, [x8, :lo12:.LCPI4_4]
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI4_5
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI4_6
+; CHECK-NOSVE-NEXT: ldr q6, [x8, :lo12:.LCPI4_5]
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI4_7
+; CHECK-NOSVE-NEXT: ldr q7, [x10, :lo12:.LCPI4_6]
+; CHECK-NOSVE-NEXT: ldr q16, [x8, :lo12:.LCPI4_7]
+; CHECK-NOSVE-NEXT: cmhi v0.2d, v3.2d, v0.2d
+; CHECK-NOSVE-NEXT: cmhi v1.2d, v3.2d, v1.2d
+; CHECK-NOSVE-NEXT: cmhi v2.2d, v3.2d, v2.2d
+; CHECK-NOSVE-NEXT: cmhi v4.2d, v3.2d, v4.2d
+; CHECK-NOSVE-NEXT: cmhi v5.2d, v3.2d, v5.2d
+; CHECK-NOSVE-NEXT: cmhi v6.2d, v3.2d, v6.2d
+; CHECK-NOSVE-NEXT: cmhi v7.2d, v3.2d, v7.2d
+; CHECK-NOSVE-NEXT: cmhi v3.2d, v3.2d, v16.2d
+; CHECK-NOSVE-NEXT: uzp1 v0.4s, v1.4s, v0.4s
+; CHECK-NOSVE-NEXT: cmp x9, #0
+; CHECK-NOSVE-NEXT: uzp1 v1.4s, v4.4s, v2.4s
+; CHECK-NOSVE-NEXT: cset w8, eq
+; CHECK-NOSVE-NEXT: uzp1 v2.4s, v6.4s, v5.4s
+; CHECK-NOSVE-NEXT: uzp1 v3.4s, v3.4s, v7.4s
+; CHECK-NOSVE-NEXT: uzp1 v0.8h, v1.8h, v0.8h
+; CHECK-NOSVE-NEXT: uzp1 v1.8h, v3.8h, v2.8h
+; CHECK-NOSVE-NEXT: uzp1 v0.16b, v1.16b, v0.16b
+; CHECK-NOSVE-NEXT: dup v1.16b, w8
+; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 1, i1 0)
+ ret <16 x i1> %0
+}
+
+define <8 x i1> @whilerw_16(i64 %a, i64 %b) {
+; CHECK-SVE-LABEL: whilerw_16:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: whilerw p0.b, x0, x1
+; CHECK-SVE-NEXT: mov z0.b, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: // kill: def $d0 killed $d0 killed $z0
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: whilerw_16:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: subs x8, x1, x0
+; CHECK-NOSVE-NEXT: adrp x9, .LCPI5_0
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI5_1
+; CHECK-NOSVE-NEXT: cneg x8, x8, mi
+; CHECK-NOSVE-NEXT: adrp x11, .LCPI5_2
+; CHECK-NOSVE-NEXT: ldr q1, [x9, :lo12:.LCPI5_0]
+; CHECK-NOSVE-NEXT: add x8, x8, x8, lsr #63
+; CHECK-NOSVE-NEXT: adrp x9, .LCPI5_3
+; CHECK-NOSVE-NEXT: ldr q2, [x10, :lo12:.LCPI5_1]
+; CHECK-NOSVE-NEXT: ldr q3, [x11, :lo12:.LCPI5_2]
+; CHECK-NOSVE-NEXT: ldr q4, [x9, :lo12:.LCPI5_3]
+; CHECK-NOSVE-NEXT: asr x8, x8, #1
+; CHECK-NOSVE-NEXT: dup v0.2d, x8
+; CHECK-NOSVE-NEXT: cmp x8, #0
+; CHECK-NOSVE-NEXT: cset w8, eq
+; CHECK-NOSVE-NEXT: cmhi v1.2d, v0.2d, v1.2d
+; CHECK-NOSVE-NEXT: cmhi v2.2d, v0.2d, v2.2d
+; CHECK-NOSVE-NEXT: cmhi v3.2d, v0.2d, v3.2d
+; CHECK-NOSVE-NEXT: cmhi v0.2d, v0.2d, v4.2d
+; CHECK-NOSVE-NEXT: uzp1 v1.4s, v2.4s, v1.4s
+; CHECK-NOSVE-NEXT: uzp1 v0.4s, v0.4s, v3.4s
+; CHECK-NOSVE-NEXT: uzp1 v0.8h, v0.8h, v1.8h
+; CHECK-NOSVE-NEXT: dup v1.8b, w8
+; CHECK-NOSVE-NEXT: xtn v0.8b, v0.8h
+; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 0)
+ ret <8 x i1> %0
+}
+
+define <4 x i1> @whilerw_32(i64 %a, i64 %b) {
+; CHECK-SVE-LABEL: whilerw_32:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: whilerw p0.h, x0, x1
+; CHECK-SVE-NEXT: mov z0.h, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: // kill: def $d0 killed $d0 killed $z0
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: whilerw_32:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: subs x9, x1, x0
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI6_0
+; CHECK-NOSVE-NEXT: cneg x9, x9, mi
+; CHECK-NOSVE-NEXT: ldr q1, [x8, :lo12:.LCPI6_0]
+; CHECK-NOSVE-NEXT: add x10, x9, #3
+; CHECK-NOSVE-NEXT: cmp x9, #0
+; CHECK-NOSVE-NEXT: csel x9, x10, x9, lt
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI6_1
+; CHECK-NOSVE-NEXT: asr x9, x9, #2
+; CHECK-NOSVE-NEXT: ldr q2, [x10, :lo12:.LCPI6_1]
+; CHECK-NOSVE-NEXT: dup v0.2d, x9
+; CHECK-NOSVE-NEXT: cmp x9, #0
+; CHECK-NOSVE-NEXT: cset w8, eq
+; CHECK-NOSVE-NEXT: cmhi v1.2d, v0.2d, v1.2d
+; CHECK-NOSVE-NEXT: cmhi v0.2d, v0.2d, v2.2d
+; CHECK-NOSVE-NEXT: uzp1 v0.4s, v0.4s, v1.4s
+; CHECK-NOSVE-NEXT: dup v1.4h, w8
+; CHECK-NOSVE-NEXT: xtn v0.4h, v0.4s
+; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 4, i1 0)
+ ret <4 x i1> %0
+}
+
+define <2 x i1> @whilerw_64(i64 %a, i64 %b) {
+; CHECK-SVE-LABEL: whilerw_64:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: whilerw p0.s, x0, x1
+; CHECK-SVE-NEXT: mov z0.s, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: // kill: def $d0 killed $d0 killed $z0
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: whilerw_64:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: subs x9, x1, x0
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI7_0
+; CHECK-NOSVE-NEXT: cneg x9, x9, mi
+; CHECK-NOSVE-NEXT: ldr q1, [x8, :lo12:.LCPI7_0]
+; CHECK-NOSVE-NEXT: add x10, x9, #7
+; CHECK-NOSVE-NEXT: cmp x9, #0
+; CHECK-NOSVE-NEXT: csel x9, x10, x9, lt
+; CHECK-NOSVE-NEXT: asr x9, x9, #3
+; CHECK-NOSVE-NEXT: dup v0.2d, x9
+; CHECK-NOSVE-NEXT: cmp x9, #0
+; CHECK-NOSVE-NEXT: cset w8, eq
+; CHECK-NOSVE-NEXT: cmhi v0.2d, v0.2d, v1.2d
+; CHECK-NOSVE-NEXT: dup v1.2s, w8
+; CHECK-NOSVE-NEXT: xtn v0.2s, v0.2d
+; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 8, i1 0)
+ ret <2 x i1> %0
+}
+
+define <16 x i1> @not_whilewr_wrong_eltsize(i64 %a, i64 %b) {
+; CHECK-SVE-LABEL: not_whilewr_wrong_eltsize:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: sub x8, x1, x0
+; CHECK-SVE-NEXT: add x8, x8, x8, lsr #63
+; CHECK-SVE-NEXT: asr x8, x8, #1
+; CHECK-SVE-NEXT: cmp x8, #1
+; CHECK-SVE-NEXT: cset w9, lt
+; CHECK-SVE-NEXT: whilelo p0.b, #0, x8
+; CHECK-SVE-NEXT: dup v0.16b, w9
+; CHECK-SVE-NEXT: mov z1.b, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: orr v0.16b, v1.16b, v0.16b
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: not_whilewr_wrong_eltsize:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: sub x8, x1, x0
+; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_0
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI8_1
+; CHECK-NOSVE-NEXT: add x8, x8, x8, lsr #63
+; CHECK-NOSVE-NEXT: ldr q0, [x9, :lo12:.LCPI8_0]
+; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_2
+; CHECK-NOSVE-NEXT: ldr q2, [x9, :lo12:.LCPI8_2]
+; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_4
+; CHECK-NOSVE-NEXT: ldr q1, [x10, :lo12:.LCPI8_1]
+; CHECK-NOSVE-NEXT: asr x8, x8, #1
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI8_3
+; CHECK-NOSVE-NEXT: ldr q5, [x9, :lo12:.LCPI8_4]
+; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_6
+; CHECK-NOSVE-NEXT: ldr q3, [x10, :lo12:.LCPI8_3]
+; CHECK-NOSVE-NEXT: adrp x10, .LCPI8_5
+; CHECK-NOSVE-NEXT: dup v4.2d, x8
+; CHECK-NOSVE-NEXT: ldr q7, [x9, :lo12:.LCPI8_6]
+; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_7
+; CHECK-NOSVE-NEXT: ldr q6, [x10, :lo12:.LCPI8_5]
+; CHECK-NOSVE-NEXT: ldr q16, [x9, :lo12:.LCPI8_7]
+; CHECK-NOSVE-NEXT: cmp x8, #1
+; CHECK-NOSVE-NEXT: cset w8, lt
+; CHECK-NOSVE-NEXT: cmhi v0.2d, v4.2d, v0.2d
+; CHECK-NOSVE-NEXT: cmhi v1.2d, v4.2d, v1.2d
+; CHECK-NOSVE-NEXT: cmhi v2.2d, v4.2d, v2.2d
+; CHECK-NOSVE-NEXT: cmhi v3.2d, v4.2d, v3.2d
+; CHECK-NOSVE-NEXT: cmhi v5.2d, v4.2d, v5.2d
+; CHECK-NOSVE-NEXT: cmhi v6.2d, v4.2d, v6.2d
+; CHECK-NOSVE-NEXT: cmhi v7.2d, v4.2d, v7.2d
+; CHECK-NOSVE-NEXT: cmhi v4.2d, v4.2d, v16.2d
+; CHECK-NOSVE-NEXT: uzp1 v0.4s, v1.4s, v0.4s
+; CHECK-NOSVE-NEXT: uzp1 v1.4s, v3.4s, v2.4s
+; CHECK-NOSVE-NEXT: uzp1 v2.4s, v6.4s, v5.4s
+; CHECK-NOSVE-NEXT: uzp1 v3.4s, v4.4s, v7.4s
+; CHECK-NOSVE-NEXT: uzp1 v0.8h, v1.8h, v0.8h
+; CHECK-NOSVE-NEXT: uzp1 v1.8h, v3.8h, v2.8h
+; CHECK-NOSVE-NEXT: uzp1 v0.16b, v1.16b, v0.16b
+; CHECK-NOSVE-NEXT: dup v1.16b, w8
+; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 1)
+ ret <16 x i1> %0
+}
+
+define <2 x i1> @not_whilerw_ptr32(i32 %a, i32 %b) {
+; CHECK-SVE-LABEL: not_whilerw_ptr32:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: subs w8, w1, w0
+; CHECK-SVE-NEXT: cneg w8, w8, mi
+; CHECK-SVE-NEXT: add w9, w8, #7
+; CHECK-SVE-NEXT: cmp w8, #0
+; CHECK-SVE-NEXT: csel w8, w9, w8, lt
+; CHECK-SVE-NEXT: asr w8, w8, #3
+; CHECK-SVE-NEXT: cmp w8, #0
+; CHECK-SVE-NEXT: cset w9, eq
+; CHECK-SVE-NEXT: whilelo p0.s, #0, w8
+; CHECK-SVE-NEXT: dup v0.2s, w9
+; CHECK-SVE-NEXT: mov z1.s, p0/z, #-1 // =0xffffffffffffffff
+; CHECK-SVE-NEXT: orr v0.8b, v1.8b, v0.8b
+; CHECK-SVE-NEXT: ret
+;
+; CHECK-NOSVE-LABEL: not_whilerw_ptr32:
+; CHECK-NOSVE: // %bb.0: // %entry
+; CHECK-NOSVE-NEXT: subs w9, w1, w0
+; CHECK-NOSVE-NEXT: adrp x8, .LCPI9_0
+; CHECK-NOSVE-NEXT: cneg w9, w9, mi
+; CHECK-NOSVE-NEXT: ldr d1, [x8, :lo12:.LCPI9_0]
+; CHECK-NOSVE-NEXT: add w10, w9, #7
+; CHECK-NOSVE-NEXT: cmp w9, #0
+; CHECK-NOSVE-NEXT: csel w9, w10, w9, lt
+; CHECK-NOSVE-NEXT: asr w9, w9, #3
+; CHECK-NOSVE-NEXT: dup v0.2s, w9
+; CHECK-NOSVE-NEXT: cmp w9, #0
+; CHECK-NOSVE-NEXT: cset w8, eq
+; CHECK-NOSVE-NEXT: dup v2.2s, w8
+; CHECK-NOSVE-NEXT: cmhi v0.2s, v0.2s, v1.2s
+; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v2.8b
+; CHECK-NOSVE-NEXT: ret
+entry:
+ %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i32.i32(i32 %a, i32 %b, i32 8, i1 0)
+ ret <2 x i1> %0
+}
diff --git a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
new file mode 100644
index 0000000000000..be5ec8b2a82bf
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
@@ -0,0 +1,195 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64 -mattr=+sve2 %s -o - | FileCheck %s --check-prefix=CHECK-SVE2
+; RUN: llc -mtriple=aarch64 -mattr=+sve %s -o - | FileCheck %s --check-prefix=CHECK-SVE
+
+define <vscale x 16 x i1> @whilewr_8(i64 %a, i64 %b) {
+; CHECK-SVE2-LABEL: whilewr_8:
+; CHECK-SVE2: // %bb.0: // %entry
+; CHECK-SVE2-NEXT: whilewr p0.b, x0, x1
+; CHECK-SVE2-NEXT: ret
+;
+; CHECK-SVE-LABEL: whilewr_8:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: sub x8, x1, x0
+; CHECK-SVE-NEXT: cmp x8, #1
+; CHECK-SVE-NEXT: cset w9, lt
+; CHECK-SVE-NEXT: whilelo p0.b, #0, x8
+; CHECK-SVE-NEXT: sbfx x8, x9, #0, #1
+; CHECK-SVE-NEXT: whilelo p1.b, xzr, x8
+; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
+; CHECK-SVE-NEXT: ret
+entry:
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 1, i1 1)
+ ret <vscale x 16 x i1> %0
+}
+
+define <vscale x 8 x i1> @whilewr_16(i64 %a, i64 %b) {
+; CHECK-SVE2-LABEL: whilewr_16:
+; CHECK-SVE2: // %bb.0: // %entry
+; CHECK-SVE2-NEXT: whilewr p0.h, x0, x1
+; CHECK-SVE2-NEXT: ret
+;
+; CHECK-SVE-LABEL: whilewr_16:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: sub x8, x1, x0
+; CHECK-SVE-NEXT: add x8, x8, x8, lsr #63
+; CHECK-SVE-NEXT: asr x8, x8, #1
+; CHECK-SVE-NEXT: cmp x8, #1
+; CHECK-SVE-NEXT: cset w9, lt
+; CHECK-SVE-NEXT: whilelo p0.h, #0, x8
+; CHECK-SVE-NEXT: sbfx x8, x9, #0, #1
+; CHECK-SVE-NEXT: whilelo p1.h, xzr, x8
+; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
+; CHECK-SVE-NEXT: ret
+entry:
+ %0 = call <vscale x 8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 1)
+ ret <vscale x 8 x i1> %0
+}
+
+define <vscale x 4 x i1> @whilewr_32(i64 %a, i64 %b) {
+; CHECK-SVE2-LABEL: whilewr_32:
+; CHECK-SVE2: // %bb.0: // %entry
+; CHECK-SVE2-NEXT: whilewr p0.s, x0, x1
+; CHECK-SVE2-NEXT: ret
+;
+; CHECK-SVE-LABEL: whilewr_32:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: sub x8, x1, x0
+; CHECK-SVE-NEXT: add x9, x8, #3
+; CHECK-SVE-NEXT: cmp x8, #0
+; CHECK-SVE-NEXT: csel x8, x9, x8, lt
+; CHECK-SVE-NEXT: asr x8, x8, #2
+; CHECK-SVE-NEXT: cmp x8, #1
+; CHECK-SVE-NEXT: cset w9, lt
+; CHECK-SVE-NEXT: whilelo p1.s, #0, x8
+; CHECK-SVE-NEXT: sbfx x9, x9, #0, #1
+; CHECK-SVE-NEXT: whilelo p0.s, xzr, x9
+; CHECK-SVE-NEXT: mov p0.b, p1/m, p1.b
+; CHECK-SVE-NEXT: ret
+entry:
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 4, i1 1)
+ ret <vscale x 4 x i1> %0
+}
+
+define <vscale x 2 x i1> @whilewr_64(i64 %a, i64 %b) {
+; CHECK-SVE2-LABEL: whilewr_64:
+; CHECK-SVE2: // %bb.0: // %entry
+; CHECK-SVE2-NEXT: whilewr p0.d, x0, x1
+; CHECK-SVE2-NEXT: ret
+;
+; CHECK-SVE-LABEL: whilewr_64:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: sub x8, x1, x0
+; CHECK-SVE-NEXT: add x9, x8, #7
+; CHECK-SVE-NEXT: cmp x8, #0
+; CHECK-SVE-NEXT: csel x8, x9, x8, lt
+; CHECK-SVE-NEXT: asr x8, x8, #3
+; CHECK-SVE-NEXT: cmp x8, #1
+; CHECK-SVE-NEXT: cset w9, lt
+; CHECK-SVE-NEXT: whilelo p1.d, #0, x8
+; CHECK-SVE-NEXT: sbfx x9, x9, #0, #1
+; CHECK-SVE-NEXT: whilelo p0.d, xzr, x9
+; CHECK-SVE-NEXT: mov p0.b, p1/m, p1.b
+; CHECK-SVE-NEXT: ret
+entry:
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 8, i1 1)
+ ret <vscale x 2 x i1> %0
+}
+
+define <vscale x 16 x i1> @whilerw_8(i64 %a, i64 %b) {
+; CHECK-SVE2-LABEL: whilerw_8:
+; CHECK-SVE2: // %bb.0: // %entry
+; CHECK-SVE2-NEXT: whilerw p0.b, x0, x1
+; CHECK-SVE2-NEXT: ret
+;
+; CHECK-SVE-LABEL: whilerw_8:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: subs x8, x1, x0
+; CHECK-SVE-NEXT: cneg x8, x8, mi
+; CHECK-SVE-NEXT: cmp x8, #0
+; CHECK-SVE-NEXT: cset w9, eq
+; CHECK-SVE-NEXT: whilelo p0.b, #0, x8
+; CHECK-SVE-NEXT: sbfx x8, x9, #0, #1
+; CHECK-SVE-NEXT: whilelo p1.b, xzr, x8
+; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
+; CHECK-SVE-NEXT: ret
+entry:
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 1, i1 0)
+ ret <vscale x 16 x i1> %0
+}
+
+define <vscale x 8 x i1> @whilerw_16(i64 %a, i64 %b) {
+; CHECK-SVE2-LABEL: whilerw_16:
+; CHECK-SVE2: // %bb.0: // %entry
+; CHECK-SVE2-NEXT: whilerw p0.h, x0, x1
+; CHECK-SVE2-NEXT: ret
+;
+; CHECK-SVE-LABEL: whilerw_16:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: subs x8, x1, x0
+; CHECK-SVE-NEXT: cneg x8, x8, mi
+; CHECK-SVE-NEXT: add x8, x8, x8, lsr #63
+; CHECK-SVE-NEXT: asr x8, x8, #1
+; CHECK-SVE-NEXT: cmp x8, #0
+; CHECK-SVE-NEXT: cset w9, eq
+; CHECK-SVE-NEXT: whilelo p0.h, #0, x8
+; CHECK-SVE-NEXT: sbfx x8, x9, #0, #1
+; CHECK-SVE-NEXT: whilelo p1.h, xzr, x8
+; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
+; CHECK-SVE-NEXT: ret
+entry:
+ %0 = call <vscale x 8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 0)
+ ret <vscale x 8 x i1> %0
+}
+
+define <vscale x 4 x i1> @whilerw_32(i64 %a, i64 %b) {
+; CHECK-SVE2-LABEL: whilerw_32:
+; CHECK-SVE2: // %bb.0: // %entry
+; CHECK-SVE2-NEXT: whilerw p0.s, x0, x1
+; CHECK-SVE2-NEXT: ret
+;
+; CHECK-SVE-LABEL: whilerw_32:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: subs x8, x1, x0
+; CHECK-SVE-NEXT: cneg x8, x8, mi
+; CHECK-SVE-NEXT: add x9, x8, #3
+; CHECK-SVE-NEXT: cmp x8, #0
+; CHECK-SVE-NEXT: csel x8, x9, x8, lt
+; CHECK-SVE-NEXT: asr x8, x8, #2
+; CHECK-SVE-NEXT: cmp x8, #0
+; CHECK-SVE-NEXT: cset w9, eq
+; CHECK-SVE-NEXT: whilelo p1.s, #0, x8
+; CHECK-SVE-NEXT: sbfx x9, x9, #0, #1
+; CHECK-SVE-NEXT: whilelo p0.s, xzr, x9
+; CHECK-SVE-NEXT: mov p0.b, p1/m, p1.b
+; CHECK-SVE-NEXT: ret
+entry:
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 4, i1 0)
+ ret <vscale x 4 x i1> %0
+}
+
+define <vscale x 2 x i1> @whilerw_64(i64 %a, i64 %b) {
+; CHECK-SVE2-LABEL: whilerw_64:
+; CHECK-SVE2: // %bb.0: // %entry
+; CHECK-SVE2-NEXT: whilerw p0.d, x0, x1
+; CHECK-SVE2-NEXT: ret
+;
+; CHECK-SVE-LABEL: whilerw_64:
+; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: subs x8, x1, x0
+; CHECK-SVE-NEXT: cneg x8, x8, mi
+; CHECK-SVE-NEXT: add x9, x8, #7
+; CHECK-SVE-NEXT: cmp x8, #0
+; CHECK-SVE-NEXT: csel x8, x9, x8, lt
+; CHECK-SVE-NEXT: asr x8, x8, #3
+; CHECK-SVE-NEXT: cmp x8, #0
+; CHECK-SVE-NEXT: cset w9, eq
+; CHECK-SVE-NEXT: whilelo p1.d, #0, x8
+; CHECK-SVE-NEXT: sbfx x9, x9, #0, #1
+; CHECK-SVE-NEXT: whilelo p0.d, xzr, x9
+; CHECK-SVE-NEXT: mov p0.b, p1/m, p1.b
+; CHECK-SVE-NEXT: ret
+entry:
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 8, i1 0)
+ ret <vscale x 2 x i1> %0
+}
>From 44093d0a99f1bfd45226e09c0201ffd22c46ea56 Mon Sep 17 00:00:00 2001
From: Sam Tebbs <samuel.tebbs at arm.com>
Date: Fri, 10 Jan 2025 11:37:37 +0000
Subject: [PATCH 02/20] Rework lowering location
---
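Note for reviewers: the generic expansion that this patch moves into
LegalizeVectorOps.cpp (ExpandEXPERIMENTAL_ALIAS_LANE_MASK) can be modelled
per lane in scalar C++. This is only an illustrative sketch. The function
aliasLaneMaskModel and its parameter names are hypothetical and not part of
LLVM, and the sketch assumes the pointer difference does not overflow a
signed 64-bit value.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the Expand path; not an LLVM API.
// Assumes (sink - source) fits in int64_t and eltSize is non-zero.
std::vector<bool> aliasLaneMaskModel(int64_t source, int64_t sink,
                                     int64_t eltSize, bool writeAfterRead,
                                     unsigned numLanes) {
  // Diff = Sink - Source; the read-after-write form takes the absolute value.
  int64_t diff = sink - source;
  if (!writeAfterRead && diff < 0)
    diff = -diff;

  // ISD::SDIV truncates toward zero, which matches C++ signed division.
  diff /= eltSize;

  // Scalar compare that is splatted across the mask: every lane is enabled
  // when the (scaled) difference is <= 0 for write-after-read, or == 0
  // for read-after-write.
  bool allLanes = writeAfterRead ? diff <= 0 : diff == 0;

  std::vector<bool> mask(numLanes);
  for (unsigned lane = 0; lane < numLanes; ++lane) {
    // Step vector compared against the splatted difference with SETULT,
    // i.e. an unsigned "lane index < diff" test.
    bool inRange = static_cast<uint64_t>(lane) < static_cast<uint64_t>(diff);
    mask[lane] = inRange || allLanes;
  }
  return mask;
}
```

With this model, pointers 16 bytes apart and a 4-byte element size enable
only the first four lanes, while a sink address at or below the source in
the write-after-read case enables every lane.
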
llvm/include/llvm/CodeGen/ISDOpcodes.h | 5 +
.../SelectionDAG/LegalizeIntegerTypes.cpp | 22 ++
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 2 +
.../SelectionDAG/LegalizeVectorOps.cpp | 41 ++++
.../SelectionDAG/SelectionDAGBuilder.cpp | 53 +----
.../SelectionDAG/SelectionDAGDumper.cpp | 2 +
llvm/lib/CodeGen/TargetLoweringBase.cpp | 3 +
.../Target/AArch64/AArch64ISelLowering.cpp | 113 +++++++---
llvm/lib/Target/AArch64/AArch64ISelLowering.h | 1 +
llvm/test/CodeGen/AArch64/alias_mask.ll | 120 ++--------
.../CodeGen/AArch64/alias_mask_scalable.ll | 210 ++++++++++++++----
11 files changed, 350 insertions(+), 222 deletions(-)
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index 665c4d6baad80..6737e97b09384 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1509,6 +1509,11 @@ enum NodeType {
// Operands: Mask
VECTOR_FIND_LAST_ACTIVE,
+ // The `llvm.experimental.get.alias.lane.mask.*` intrinsics
+ // Operands: Load pointer, Store pointer, Element size, Write after read
+ // Output: Mask
+ EXPERIMENTAL_ALIAS_LANE_MASK,
+
// llvm.clear_cache intrinsic
// Operands: Input Chain, Start Addres, End Address
// Outputs: Output Chain
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 204b323d7084a..2c6633c3d5a86 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -55,6 +55,9 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
N->dump(&DAG); dbgs() << "\n";
#endif
report_fatal_error("Do not know how to promote this operator!");
+ case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
+ Res = PromoteIntRes_EXPERIMENTAL_ALIAS_LANE_MASK(N);
+ break;
case ISD::MERGE_VALUES:Res = PromoteIntRes_MERGE_VALUES(N, ResNo); break;
case ISD::AssertSext: Res = PromoteIntRes_AssertSext(N); break;
case ISD::AssertZext: Res = PromoteIntRes_AssertZext(N); break;
@@ -364,6 +367,14 @@ SDValue DAGTypeLegalizer::PromoteIntRes_MERGE_VALUES(SDNode *N,
return GetPromotedInteger(Op);
}
+SDValue
+DAGTypeLegalizer::PromoteIntRes_EXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N) {
+ EVT VT = N->getValueType(0);
+ EVT NewVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
+ return DAG.getNode(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, SDLoc(N), NewVT,
+ N->ops());
+}
+
SDValue DAGTypeLegalizer::PromoteIntRes_AssertSext(SDNode *N) {
// Sign-extend the new bits, and continue the assertion.
SDValue Op = SExtPromotedInteger(N->getOperand(0));
@@ -2108,6 +2119,9 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::PARTIAL_REDUCE_SMLA:
Res = PromoteIntOp_PARTIAL_REDUCE_MLA(N);
break;
+ case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
+ Res = DAGTypeLegalizer::PromoteIntOp_EXPERIMENTAL_ALIAS_LANE_MASK(N, OpNo);
+ break;
}
// If the result is null, the sub-method took care of registering results etc.
@@ -2902,6 +2916,14 @@ SDValue DAGTypeLegalizer::PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N) {
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
}
+SDValue
+DAGTypeLegalizer::PromoteIntOp_EXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N,
+ unsigned OpNo) {
+ SmallVector<SDValue, 4> NewOps(N->ops());
+ NewOps[OpNo] = GetPromotedInteger(N->getOperand(OpNo));
+ return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
+}
+
//===----------------------------------------------------------------------===//
// Integer Result Expansion
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 69c687a797485..ca54936c3fa0b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -380,6 +380,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntRes_PATCHPOINT(SDNode *N);
SDValue PromoteIntRes_VECTOR_FIND_LAST_ACTIVE(SDNode *N);
SDValue PromoteIntRes_PARTIAL_REDUCE_MLA(SDNode *N);
+ SDValue PromoteIntRes_EXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N);
// Integer Operand Promotion.
bool PromoteIntegerOperand(SDNode *N, unsigned OpNo);
@@ -432,6 +433,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntOp_VECTOR_HISTOGRAM(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_VECTOR_FIND_LAST_ACTIVE(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N);
+ SDValue PromoteIntOp_EXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N, unsigned OpNo);
void SExtOrZExtPromotedOperands(SDValue &LHS, SDValue &RHS);
void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index de4447fb0cf1a..28571579df024 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -138,6 +138,7 @@ class VectorLegalizer {
SDValue ExpandVP_FNEG(SDNode *Node);
SDValue ExpandVP_FABS(SDNode *Node);
SDValue ExpandVP_FCOPYSIGN(SDNode *Node);
+ SDValue ExpandEXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N);
SDValue ExpandSELECT(SDNode *Node);
std::pair<SDValue, SDValue> ExpandLoad(SDNode *N);
SDValue ExpandStore(SDNode *N);
@@ -471,6 +472,7 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::UCMP:
case ISD::PARTIAL_REDUCE_UMLA:
case ISD::PARTIAL_REDUCE_SMLA:
+ case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
break;
case ISD::SMULFIX:
@@ -1252,6 +1254,9 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
case ISD::UCMP:
Results.push_back(TLI.expandCMP(Node, DAG));
return;
+ case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
+ Results.push_back(ExpandEXPERIMENTAL_ALIAS_LANE_MASK(Node));
+ return;
case ISD::FADD:
case ISD::FMUL:
@@ -1759,6 +1764,42 @@ SDValue VectorLegalizer::ExpandVP_FCOPYSIGN(SDNode *Node) {
return DAG.getNode(ISD::BITCAST, DL, VT, CopiedSign);
}
+SDValue VectorLegalizer::ExpandEXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N) {
+ SDLoc DL(N);
+ SDValue SourceValue = N->getOperand(0);
+ SDValue SinkValue = N->getOperand(1);
+ SDValue EltSize = N->getOperand(2);
+
+ bool IsWriteAfterRead =
+ cast<ConstantSDNode>(N->getOperand(3))->getZExtValue() != 0;
+ auto VT = N->getValueType(0);
+ auto PtrVT = SourceValue->getValueType(0);
+
+ SDValue Diff = DAG.getNode(ISD::SUB, DL, PtrVT, SinkValue, SourceValue);
+ if (!IsWriteAfterRead)
+ Diff = DAG.getNode(ISD::ABS, DL, PtrVT, Diff);
+
+ Diff = DAG.getNode(ISD::SDIV, DL, PtrVT, Diff, EltSize);
+ SDValue Zero = DAG.getTargetConstant(0, DL, PtrVT);
+
+ // If the difference is positive then some elements may alias
+ auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
+ Diff.getValueType());
+ SDValue Cmp = DAG.getSetCC(DL, CmpVT, Diff, Zero,
+ IsWriteAfterRead ? ISD::SETLE : ISD::SETEQ);
+
+ EVT SplatTY =
+ EVT::getVectorVT(*DAG.getContext(), PtrVT, VT.getVectorElementCount());
+ SDValue DiffSplat = DAG.getSplat(SplatTY, DL, Diff);
+ SDValue VectorStep = DAG.getStepVector(DL, SplatTY);
+ SDValue DiffMask =
+ DAG.getSetCC(DL, VT, VectorStep, DiffSplat, ISD::CondCode::SETULT);
+
+ // Splat the compare result then OR it with a lane mask
+ SDValue Splat = DAG.getSplat(VT, DL, Cmp);
+ return DAG.getNode(ISD::OR, DL, VT, DiffMask, Splat);
+}
+
void VectorLegalizer::ExpandFP_TO_UINT(SDNode *Node,
SmallVectorImpl<SDValue> &Results) {
// Attempt to expand using TargetLowering.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index a98dfda2b6621..ea97cb2652217 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8292,54 +8292,13 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
return;
}
case Intrinsic::experimental_get_alias_lane_mask: {
- SDValue SourceValue = getValue(I.getOperand(0));
- SDValue SinkValue = getValue(I.getOperand(1));
- SDValue EltSize = getValue(I.getOperand(2));
- bool IsWriteAfterRead =
- cast<ConstantSDNode>(getValue(I.getOperand(3)))->getZExtValue() != 0;
auto IntrinsicVT = EVT::getEVT(I.getType());
- auto PtrVT = SourceValue->getValueType(0);
-
- if (!TLI.shouldExpandGetAliasLaneMask(
- IntrinsicVT, PtrVT,
- cast<ConstantSDNode>(EltSize)->getSExtValue())) {
- visitTargetIntrinsic(I, Intrinsic);
- return;
- }
-
- SDValue Diff = DAG.getNode(ISD::SUB, sdl, PtrVT, SinkValue, SourceValue);
- if (!IsWriteAfterRead)
- Diff = DAG.getNode(ISD::ABS, sdl, PtrVT, Diff);
-
- Diff = DAG.getNode(ISD::SDIV, sdl, PtrVT, Diff, EltSize);
- SDValue Zero = DAG.getTargetConstant(0, sdl, PtrVT);
-
- // If the difference is positive then some elements may alias
- auto CmpVT =
- TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), PtrVT);
- SDValue Cmp = DAG.getSetCC(sdl, CmpVT, Diff, Zero,
- IsWriteAfterRead ? ISD::SETLE : ISD::SETEQ);
-
- // Splat the compare result then OR it with a lane mask
- SDValue Splat = DAG.getSplat(IntrinsicVT, sdl, Cmp);
-
- SDValue DiffMask;
- // Don't emit an active lane mask if the target doesn't support it
- if (TLI.shouldExpandGetActiveLaneMask(IntrinsicVT, PtrVT)) {
- EVT VecTy = EVT::getVectorVT(*DAG.getContext(), PtrVT,
- IntrinsicVT.getVectorElementCount());
- SDValue DiffSplat = DAG.getSplat(VecTy, sdl, Diff);
- SDValue VectorStep = DAG.getStepVector(sdl, VecTy);
- DiffMask = DAG.getSetCC(sdl, IntrinsicVT, VectorStep, DiffSplat,
- ISD::CondCode::SETULT);
- } else {
- DiffMask = DAG.getNode(
- ISD::INTRINSIC_WO_CHAIN, sdl, IntrinsicVT,
- DAG.getTargetConstant(Intrinsic::get_active_lane_mask, sdl, MVT::i64),
- Zero, Diff);
- }
- SDValue Or = DAG.getNode(ISD::OR, sdl, IntrinsicVT, DiffMask, Splat);
- setValue(&I, Or);
+ SmallVector<SDValue, 4> Ops;
+ for (auto &Op : I.operands())
+ Ops.push_back(getValue(Op));
+ SDValue Mask =
+ DAG.getNode(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, sdl, IntrinsicVT, Ops);
+ setValue(&I, Mask);
}
}
}
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 8457bee3f665b..5426fea028e21 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -573,6 +573,8 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
return "partial_reduce_umla";
case ISD::PARTIAL_REDUCE_SMLA:
return "partial_reduce_smla";
+ case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
+ return "alias_mask";
// Vector Predication
#define BEGIN_REGISTER_VP_SDNODE(SDID, LEGALARG, NAME, ...) \
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index f5ea3c0b47d6a..407eddfd9c756 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -831,6 +831,9 @@ void TargetLoweringBase::initActions() {
// Masked vector extracts default to expand.
setOperationAction(ISD::VECTOR_FIND_LAST_ACTIVE, VT, Expand);
+ // Alias lane masks default to expand.
+ setOperationAction(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, VT, Expand);
+
// FP environment operations default to expand.
setOperationAction(ISD::GET_FPENV, VT, Expand);
setOperationAction(ISD::SET_FPENV, VT, Expand);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 232c2227a3b51..4bc13ee12cd82 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1822,6 +1822,13 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::INTRINSIC_WO_CHAIN, VT, Custom);
}
+ if (Subtarget->hasSVE2() || (Subtarget->hasSME() && Subtarget->isStreaming())) {
+ for (auto VT : {MVT::v2i32, MVT::v4i16, MVT::v8i8, MVT::v16i8, MVT::nxv2i1,
+ MVT::nxv4i1, MVT::nxv8i1, MVT::nxv16i1}) {
+ setOperationAction(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, VT, Custom);
+ }
+ }
+
// Handle operations that are only available in non-streaming SVE mode.
if (Subtarget->isSVEAvailable()) {
for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64,
@@ -5301,6 +5308,59 @@ SDValue AArch64TargetLowering::LowerFSINCOS(SDValue Op,
static MVT getSVEContainerType(EVT ContentTy);
+SDValue AArch64TargetLowering::LowerALIAS_LANE_MASK(SDValue Op,
+ SelectionDAG &DAG) const {
+ SDLoc DL(Op);
+ unsigned IntrinsicID = 0;
+ uint64_t EltSize = Op.getOperand(2)->getAsZExtVal();
+ bool IsWriteAfterRead = Op.getOperand(3)->getAsZExtVal() == 1;
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr
+ : Intrinsic::aarch64_sve_whilerw;
+ EVT VT = Op.getValueType();
+ MVT SimpleVT = VT.getSimpleVT();
+ // Make sure that the promoted mask size and element size match
+ switch (EltSize) {
+ case 1:
+ assert((SimpleVT == MVT::v16i8 || SimpleVT == MVT::nxv16i1) &&
+ "Unexpected mask or element size");
+ break;
+ case 2:
+ assert((SimpleVT == MVT::v8i8 || SimpleVT == MVT::nxv8i1) &&
+ "Unexpected mask or element size");
+ break;
+ case 4:
+ assert((SimpleVT == MVT::v4i16 || SimpleVT == MVT::nxv4i1) &&
+ "Unexpected mask or element size");
+ break;
+ case 8:
+ assert((SimpleVT == MVT::v2i32 || SimpleVT == MVT::nxv2i1) &&
+ "Unexpected mask or element size");
+ break;
+ default:
+ llvm_unreachable("Unexpected element size for get.alias.lane.mask");
+ break;
+ }
+ SDValue ID = DAG.getTargetConstant(IntrinsicID, DL, MVT::i64);
+
+ if (VT.isScalableVector())
+ return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT, ID, Op.getOperand(0),
+ Op.getOperand(1));
+
+ // We can use the SVE whilewr/whilerw instruction to lower this
+ // intrinsic by creating the appropriate sequence of scalable vector
+ // operations and then extracting a fixed-width subvector from the scalable
+ // vector.
+
+ EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
+ EVT WhileVT = ContainerVT.changeElementType(MVT::i1);
+
+ SDValue Mask = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, WhileVT, ID,
+ Op.getOperand(0), Op.getOperand(1));
+ SDValue MaskAsInt = DAG.getNode(ISD::SIGN_EXTEND, DL, ContainerVT, Mask);
+ return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, MaskAsInt,
+ DAG.getVectorIdxConstant(0, DL));
+}
+
SDValue AArch64TargetLowering::LowerBITCAST(SDValue Op,
SelectionDAG &DAG) const {
EVT OpVT = Op.getValueType();
@@ -6528,31 +6588,6 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
case Intrinsic::experimental_get_alias_lane_mask:
case Intrinsic::get_active_lane_mask: {
unsigned IntrinsicID = Intrinsic::aarch64_sve_whilelo;
- if (IntNo == Intrinsic::experimental_get_alias_lane_mask) {
- uint64_t EltSize = Op.getOperand(3)->getAsZExtVal();
- bool IsWriteAfterRead = Op.getOperand(4)->getAsZExtVal() == 1;
- switch (EltSize) {
- case 1:
- IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_b
- : Intrinsic::aarch64_sve_whilerw_b;
- break;
- case 2:
- IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_h
- : Intrinsic::aarch64_sve_whilerw_h;
- break;
- case 4:
- IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_s
- : Intrinsic::aarch64_sve_whilerw_s;
- break;
- case 8:
- IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_d
- : Intrinsic::aarch64_sve_whilerw_d;
- break;
- default:
- llvm_unreachable("Unexpected element size for get.alias.lane.mask");
- break;
- }
- }
SDValue ID = DAG.getTargetConstant(IntrinsicID, dl, MVT::i64);
EVT VT = Op.getValueType();
@@ -6560,7 +6595,7 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, VT, ID, Op.getOperand(1),
Op.getOperand(2));
- // We can use the SVE whilelo/whilewr/whilerw instruction to lower this
+ // We can use the SVE whilelo instruction to lower this
// intrinsic by creating the appropriate sequence of scalable vector
// operations and then extracting a fixed-width subvector from the scalable
// vector.
@@ -7423,6 +7458,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
default:
llvm_unreachable("unimplemented operand");
return SDValue();
+ case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
+ return LowerALIAS_LANE_MASK(Op, DAG);
case ISD::BITCAST:
return LowerBITCAST(Op, DAG);
case ISD::GlobalAddress:
@@ -19721,7 +19758,8 @@ static SDValue getPTest(SelectionDAG &DAG, EVT VT, SDValue Pg, SDValue Op,
AArch64CC::CondCode Cond);
static bool isPredicateCCSettingOp(SDValue N) {
- if ((N.getOpcode() == ISD::SETCC) ||
+ if ((N.getOpcode() == ISD::SETCC ||
+ N.getOpcode() == ISD::EXPERIMENTAL_ALIAS_LANE_MASK) ||
(N.getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
(N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilege ||
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilegt ||
@@ -19732,10 +19770,7 @@ static bool isPredicateCCSettingOp(SDValue N) {
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilels ||
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilelt ||
// get_active_lane_mask is lowered to a whilelo instruction.
- N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask ||
- // get_alias_lane_mask is lowered to a whilewr/rw instruction.
- N.getConstantOperandVal(0) ==
- Intrinsic::experimental_get_alias_lane_mask)))
+ N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask)))
return true;
return false;
@@ -27616,6 +27651,22 @@ void AArch64TargetLowering::ReplaceNodeResults(
// CONCAT_VECTORS -- but delegate to common code for result type
// legalisation
return;
+ case ISD::EXPERIMENTAL_ALIAS_LANE_MASK: {
+ EVT VT = N->getValueType(0);
+ if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
+ return;
+
+ // NOTE: Only trivial type promotion is supported.
+ EVT NewVT = getTypeToTransformTo(*DAG.getContext(), VT);
+ if (NewVT.getVectorNumElements() != VT.getVectorNumElements())
+ return;
+
+ SDLoc DL(N);
+ auto V =
+ DAG.getNode(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, DL, NewVT, N->ops());
+ Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, VT, V));
+ return;
+ }
case ISD::INTRINSIC_WO_CHAIN: {
EVT VT = N->getValueType(0);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index bcb3c21e17535..4023c7c9076b7 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -1224,6 +1224,7 @@ class AArch64TargetLowering : public TargetLowering {
SDValue LowerXOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFSINCOS(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerALIAS_LANE_MASK(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVSCALE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/AArch64/alias_mask.ll b/llvm/test/CodeGen/AArch64/alias_mask.ll
index 84a22822f1702..9b344f03da077 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask.ll
@@ -48,10 +48,12 @@ define <16 x i1> @whilewr_8(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: uzp1 v1.8h, v2.8h, v3.8h
; CHECK-NOSVE-NEXT: uzp1 v0.16b, v1.16b, v0.16b
; CHECK-NOSVE-NEXT: dup v1.16b, w8
+; CHECK-NOSVE-NEXT: shl v0.16b, v0.16b, #7
+; CHECK-NOSVE-NEXT: cmlt v0.16b, v0.16b, #0
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 1, i1 1)
+ %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i64(i64 %a, i64 %b, i64 1, i1 1)
ret <16 x i1> %0
}
@@ -88,6 +90,8 @@ define <8 x i1> @whilewr_16(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: uzp1 v0.8h, v0.8h, v1.8h
; CHECK-NOSVE-NEXT: dup v1.8b, w8
; CHECK-NOSVE-NEXT: xtn v0.8b, v0.8h
+; CHECK-NOSVE-NEXT: shl v0.8b, v0.8b, #7
+; CHECK-NOSVE-NEXT: cmlt v0.8b, v0.8b, #0
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
@@ -125,7 +129,7 @@ define <4 x i1> @whilewr_32(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 4, i1 1)
+ %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %a, i64 %b, i64 4, i1 1)
ret <4 x i1> %0
}
@@ -155,7 +159,7 @@ define <2 x i1> @whilewr_64(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 8, i1 1)
+ %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1.i64.i64(i64 %a, i64 %b, i64 8, i1 1)
ret <2 x i1> %0
}
@@ -206,10 +210,12 @@ define <16 x i1> @whilerw_8(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: uzp1 v1.8h, v3.8h, v2.8h
; CHECK-NOSVE-NEXT: uzp1 v0.16b, v1.16b, v0.16b
; CHECK-NOSVE-NEXT: dup v1.16b, w8
+; CHECK-NOSVE-NEXT: shl v0.16b, v0.16b, #7
+; CHECK-NOSVE-NEXT: cmlt v0.16b, v0.16b, #0
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 1, i1 0)
+ %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i64(i64 %a, i64 %b, i64 1, i1 0)
ret <16 x i1> %0
}
@@ -247,6 +253,8 @@ define <8 x i1> @whilerw_16(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: uzp1 v0.8h, v0.8h, v1.8h
; CHECK-NOSVE-NEXT: dup v1.8b, w8
; CHECK-NOSVE-NEXT: xtn v0.8b, v0.8h
+; CHECK-NOSVE-NEXT: shl v0.8b, v0.8b, #7
+; CHECK-NOSVE-NEXT: cmlt v0.8b, v0.8b, #0
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
@@ -285,7 +293,7 @@ define <4 x i1> @whilerw_32(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 4, i1 0)
+ %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %a, i64 %b, i64 4, i1 0)
ret <4 x i1> %0
}
@@ -316,106 +324,6 @@ define <2 x i1> @whilerw_64(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 8, i1 0)
- ret <2 x i1> %0
-}
-
-define <16 x i1> @not_whilewr_wrong_eltsize(i64 %a, i64 %b) {
-; CHECK-SVE-LABEL: not_whilewr_wrong_eltsize:
-; CHECK-SVE: // %bb.0: // %entry
-; CHECK-SVE-NEXT: sub x8, x1, x0
-; CHECK-SVE-NEXT: add x8, x8, x8, lsr #63
-; CHECK-SVE-NEXT: asr x8, x8, #1
-; CHECK-SVE-NEXT: cmp x8, #1
-; CHECK-SVE-NEXT: cset w9, lt
-; CHECK-SVE-NEXT: whilelo p0.b, #0, x8
-; CHECK-SVE-NEXT: dup v0.16b, w9
-; CHECK-SVE-NEXT: mov z1.b, p0/z, #-1 // =0xffffffffffffffff
-; CHECK-SVE-NEXT: orr v0.16b, v1.16b, v0.16b
-; CHECK-SVE-NEXT: ret
-;
-; CHECK-NOSVE-LABEL: not_whilewr_wrong_eltsize:
-; CHECK-NOSVE: // %bb.0: // %entry
-; CHECK-NOSVE-NEXT: sub x8, x1, x0
-; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_0
-; CHECK-NOSVE-NEXT: adrp x10, .LCPI8_1
-; CHECK-NOSVE-NEXT: add x8, x8, x8, lsr #63
-; CHECK-NOSVE-NEXT: ldr q0, [x9, :lo12:.LCPI8_0]
-; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_2
-; CHECK-NOSVE-NEXT: ldr q2, [x9, :lo12:.LCPI8_2]
-; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_4
-; CHECK-NOSVE-NEXT: ldr q1, [x10, :lo12:.LCPI8_1]
-; CHECK-NOSVE-NEXT: asr x8, x8, #1
-; CHECK-NOSVE-NEXT: adrp x10, .LCPI8_3
-; CHECK-NOSVE-NEXT: ldr q5, [x9, :lo12:.LCPI8_4]
-; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_6
-; CHECK-NOSVE-NEXT: ldr q3, [x10, :lo12:.LCPI8_3]
-; CHECK-NOSVE-NEXT: adrp x10, .LCPI8_5
-; CHECK-NOSVE-NEXT: dup v4.2d, x8
-; CHECK-NOSVE-NEXT: ldr q7, [x9, :lo12:.LCPI8_6]
-; CHECK-NOSVE-NEXT: adrp x9, .LCPI8_7
-; CHECK-NOSVE-NEXT: ldr q6, [x10, :lo12:.LCPI8_5]
-; CHECK-NOSVE-NEXT: ldr q16, [x9, :lo12:.LCPI8_7]
-; CHECK-NOSVE-NEXT: cmp x8, #1
-; CHECK-NOSVE-NEXT: cset w8, lt
-; CHECK-NOSVE-NEXT: cmhi v0.2d, v4.2d, v0.2d
-; CHECK-NOSVE-NEXT: cmhi v1.2d, v4.2d, v1.2d
-; CHECK-NOSVE-NEXT: cmhi v2.2d, v4.2d, v2.2d
-; CHECK-NOSVE-NEXT: cmhi v3.2d, v4.2d, v3.2d
-; CHECK-NOSVE-NEXT: cmhi v5.2d, v4.2d, v5.2d
-; CHECK-NOSVE-NEXT: cmhi v6.2d, v4.2d, v6.2d
-; CHECK-NOSVE-NEXT: cmhi v7.2d, v4.2d, v7.2d
-; CHECK-NOSVE-NEXT: cmhi v4.2d, v4.2d, v16.2d
-; CHECK-NOSVE-NEXT: uzp1 v0.4s, v1.4s, v0.4s
-; CHECK-NOSVE-NEXT: uzp1 v1.4s, v3.4s, v2.4s
-; CHECK-NOSVE-NEXT: uzp1 v2.4s, v6.4s, v5.4s
-; CHECK-NOSVE-NEXT: uzp1 v3.4s, v4.4s, v7.4s
-; CHECK-NOSVE-NEXT: uzp1 v0.8h, v1.8h, v0.8h
-; CHECK-NOSVE-NEXT: uzp1 v1.8h, v3.8h, v2.8h
-; CHECK-NOSVE-NEXT: uzp1 v0.16b, v1.16b, v0.16b
-; CHECK-NOSVE-NEXT: dup v1.16b, w8
-; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
-; CHECK-NOSVE-NEXT: ret
-entry:
- %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 1)
- ret <16 x i1> %0
-}
-
-define <2 x i1> @not_whilerw_ptr32(i32 %a, i32 %b) {
-; CHECK-SVE-LABEL: not_whilerw_ptr32:
-; CHECK-SVE: // %bb.0: // %entry
-; CHECK-SVE-NEXT: subs w8, w1, w0
-; CHECK-SVE-NEXT: cneg w8, w8, mi
-; CHECK-SVE-NEXT: add w9, w8, #7
-; CHECK-SVE-NEXT: cmp w8, #0
-; CHECK-SVE-NEXT: csel w8, w9, w8, lt
-; CHECK-SVE-NEXT: asr w8, w8, #3
-; CHECK-SVE-NEXT: cmp w8, #0
-; CHECK-SVE-NEXT: cset w9, eq
-; CHECK-SVE-NEXT: whilelo p0.s, #0, w8
-; CHECK-SVE-NEXT: dup v0.2s, w9
-; CHECK-SVE-NEXT: mov z1.s, p0/z, #-1 // =0xffffffffffffffff
-; CHECK-SVE-NEXT: orr v0.8b, v1.8b, v0.8b
-; CHECK-SVE-NEXT: ret
-;
-; CHECK-NOSVE-LABEL: not_whilerw_ptr32:
-; CHECK-NOSVE: // %bb.0: // %entry
-; CHECK-NOSVE-NEXT: subs w9, w1, w0
-; CHECK-NOSVE-NEXT: adrp x8, .LCPI9_0
-; CHECK-NOSVE-NEXT: cneg w9, w9, mi
-; CHECK-NOSVE-NEXT: ldr d1, [x8, :lo12:.LCPI9_0]
-; CHECK-NOSVE-NEXT: add w10, w9, #7
-; CHECK-NOSVE-NEXT: cmp w9, #0
-; CHECK-NOSVE-NEXT: csel w9, w10, w9, lt
-; CHECK-NOSVE-NEXT: asr w9, w9, #3
-; CHECK-NOSVE-NEXT: dup v0.2s, w9
-; CHECK-NOSVE-NEXT: cmp w9, #0
-; CHECK-NOSVE-NEXT: cset w8, eq
-; CHECK-NOSVE-NEXT: dup v2.2s, w8
-; CHECK-NOSVE-NEXT: cmhi v0.2s, v0.2s, v1.2s
-; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v2.8b
-; CHECK-NOSVE-NEXT: ret
-entry:
- %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i32.i32(i32 %a, i32 %b, i32 8, i1 0)
+ %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1.i64.i64(i64 %a, i64 %b, i64 8, i1 0)
ret <2 x i1> %0
}
diff --git a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
index be5ec8b2a82bf..a7c9c5e3cdd33 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
@@ -10,16 +10,57 @@ define <vscale x 16 x i1> @whilewr_8(i64 %a, i64 %b) {
;
; CHECK-SVE-LABEL: whilewr_8:
; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SVE-NEXT: addvl sp, sp, #-1
+; CHECK-SVE-NEXT: str p7, [sp, #4, mul vl] // 2-byte Folded Spill
+; CHECK-SVE-NEXT: str p6, [sp, #5, mul vl] // 2-byte Folded Spill
+; CHECK-SVE-NEXT: str p5, [sp, #6, mul vl] // 2-byte Folded Spill
+; CHECK-SVE-NEXT: str p4, [sp, #7, mul vl] // 2-byte Folded Spill
+; CHECK-SVE-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
+; CHECK-SVE-NEXT: .cfi_offset w29, -16
+; CHECK-SVE-NEXT: index z0.d, #0, #1
; CHECK-SVE-NEXT: sub x8, x1, x0
+; CHECK-SVE-NEXT: ptrue p0.d
+; CHECK-SVE-NEXT: mov z2.d, x8
+; CHECK-SVE-NEXT: mov z1.d, z0.d
+; CHECK-SVE-NEXT: mov z3.d, z0.d
+; CHECK-SVE-NEXT: cmphi p1.d, p0/z, z2.d, z0.d
+; CHECK-SVE-NEXT: incd z0.d, all, mul #4
+; CHECK-SVE-NEXT: incd z1.d
+; CHECK-SVE-NEXT: incd z3.d, all, mul #2
+; CHECK-SVE-NEXT: cmphi p5.d, p0/z, z2.d, z0.d
+; CHECK-SVE-NEXT: mov z4.d, z1.d
+; CHECK-SVE-NEXT: cmphi p2.d, p0/z, z2.d, z1.d
+; CHECK-SVE-NEXT: incd z1.d, all, mul #4
+; CHECK-SVE-NEXT: cmphi p3.d, p0/z, z2.d, z3.d
+; CHECK-SVE-NEXT: incd z3.d, all, mul #4
+; CHECK-SVE-NEXT: incd z4.d, all, mul #2
+; CHECK-SVE-NEXT: cmphi p6.d, p0/z, z2.d, z1.d
+; CHECK-SVE-NEXT: cmphi p7.d, p0/z, z2.d, z3.d
+; CHECK-SVE-NEXT: uzp1 p1.s, p1.s, p2.s
+; CHECK-SVE-NEXT: cmphi p4.d, p0/z, z2.d, z4.d
+; CHECK-SVE-NEXT: incd z4.d, all, mul #4
+; CHECK-SVE-NEXT: uzp1 p2.s, p5.s, p6.s
+; CHECK-SVE-NEXT: ldr p6, [sp, #5, mul vl] // 2-byte Folded Reload
+; CHECK-SVE-NEXT: ldr p5, [sp, #6, mul vl] // 2-byte Folded Reload
+; CHECK-SVE-NEXT: cmphi p0.d, p0/z, z2.d, z4.d
+; CHECK-SVE-NEXT: uzp1 p3.s, p3.s, p4.s
; CHECK-SVE-NEXT: cmp x8, #1
-; CHECK-SVE-NEXT: cset w9, lt
-; CHECK-SVE-NEXT: whilelo p0.b, #0, x8
-; CHECK-SVE-NEXT: sbfx x8, x9, #0, #1
+; CHECK-SVE-NEXT: ldr p4, [sp, #7, mul vl] // 2-byte Folded Reload
+; CHECK-SVE-NEXT: cset w8, lt
+; CHECK-SVE-NEXT: uzp1 p1.h, p1.h, p3.h
+; CHECK-SVE-NEXT: sbfx x8, x8, #0, #1
+; CHECK-SVE-NEXT: uzp1 p0.s, p7.s, p0.s
+; CHECK-SVE-NEXT: ldr p7, [sp, #4, mul vl] // 2-byte Folded Reload
+; CHECK-SVE-NEXT: uzp1 p0.h, p2.h, p0.h
+; CHECK-SVE-NEXT: uzp1 p0.b, p1.b, p0.b
; CHECK-SVE-NEXT: whilelo p1.b, xzr, x8
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
+; CHECK-SVE-NEXT: addvl sp, sp, #1
+; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 1, i1 1)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i64(i64 %a, i64 %b, i64 1, i1 1)
ret <vscale x 16 x i1> %0
}
@@ -31,13 +72,28 @@ define <vscale x 8 x i1> @whilewr_16(i64 %a, i64 %b) {
;
; CHECK-SVE-LABEL: whilewr_16:
; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: index z0.d, #0, #1
; CHECK-SVE-NEXT: sub x8, x1, x0
+; CHECK-SVE-NEXT: ptrue p0.d
; CHECK-SVE-NEXT: add x8, x8, x8, lsr #63
; CHECK-SVE-NEXT: asr x8, x8, #1
+; CHECK-SVE-NEXT: mov z1.d, z0.d
+; CHECK-SVE-NEXT: mov z2.d, z0.d
+; CHECK-SVE-NEXT: mov z3.d, x8
+; CHECK-SVE-NEXT: incd z1.d
+; CHECK-SVE-NEXT: incd z2.d, all, mul #2
+; CHECK-SVE-NEXT: cmphi p1.d, p0/z, z3.d, z0.d
+; CHECK-SVE-NEXT: mov z4.d, z1.d
+; CHECK-SVE-NEXT: cmphi p2.d, p0/z, z3.d, z1.d
+; CHECK-SVE-NEXT: cmphi p3.d, p0/z, z3.d, z2.d
+; CHECK-SVE-NEXT: incd z4.d, all, mul #2
+; CHECK-SVE-NEXT: uzp1 p1.s, p1.s, p2.s
+; CHECK-SVE-NEXT: cmphi p0.d, p0/z, z3.d, z4.d
; CHECK-SVE-NEXT: cmp x8, #1
-; CHECK-SVE-NEXT: cset w9, lt
-; CHECK-SVE-NEXT: whilelo p0.h, #0, x8
-; CHECK-SVE-NEXT: sbfx x8, x9, #0, #1
+; CHECK-SVE-NEXT: cset w8, lt
+; CHECK-SVE-NEXT: sbfx x8, x8, #0, #1
+; CHECK-SVE-NEXT: uzp1 p0.s, p3.s, p0.s
+; CHECK-SVE-NEXT: uzp1 p0.h, p1.h, p0.h
; CHECK-SVE-NEXT: whilelo p1.h, xzr, x8
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
@@ -54,20 +110,27 @@ define <vscale x 4 x i1> @whilewr_32(i64 %a, i64 %b) {
;
; CHECK-SVE-LABEL: whilewr_32:
; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: index z0.d, #0, #1
; CHECK-SVE-NEXT: sub x8, x1, x0
+; CHECK-SVE-NEXT: ptrue p0.d
; CHECK-SVE-NEXT: add x9, x8, #3
; CHECK-SVE-NEXT: cmp x8, #0
; CHECK-SVE-NEXT: csel x8, x9, x8, lt
; CHECK-SVE-NEXT: asr x8, x8, #2
+; CHECK-SVE-NEXT: mov z1.d, z0.d
+; CHECK-SVE-NEXT: mov z2.d, x8
+; CHECK-SVE-NEXT: incd z1.d
+; CHECK-SVE-NEXT: cmphi p1.d, p0/z, z2.d, z0.d
+; CHECK-SVE-NEXT: cmphi p0.d, p0/z, z2.d, z1.d
; CHECK-SVE-NEXT: cmp x8, #1
-; CHECK-SVE-NEXT: cset w9, lt
-; CHECK-SVE-NEXT: whilelo p1.s, #0, x8
-; CHECK-SVE-NEXT: sbfx x9, x9, #0, #1
-; CHECK-SVE-NEXT: whilelo p0.s, xzr, x9
-; CHECK-SVE-NEXT: mov p0.b, p1/m, p1.b
+; CHECK-SVE-NEXT: cset w8, lt
+; CHECK-SVE-NEXT: sbfx x8, x8, #0, #1
+; CHECK-SVE-NEXT: uzp1 p0.s, p1.s, p0.s
+; CHECK-SVE-NEXT: whilelo p1.s, xzr, x8
+; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 4, i1 1)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %a, i64 %b, i64 4, i1 1)
ret <vscale x 4 x i1> %0
}
@@ -80,19 +143,22 @@ define <vscale x 2 x i1> @whilewr_64(i64 %a, i64 %b) {
; CHECK-SVE-LABEL: whilewr_64:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: sub x8, x1, x0
+; CHECK-SVE-NEXT: index z0.d, #0, #1
+; CHECK-SVE-NEXT: ptrue p0.d
; CHECK-SVE-NEXT: add x9, x8, #7
; CHECK-SVE-NEXT: cmp x8, #0
; CHECK-SVE-NEXT: csel x8, x9, x8, lt
; CHECK-SVE-NEXT: asr x8, x8, #3
+; CHECK-SVE-NEXT: mov z1.d, x8
+; CHECK-SVE-NEXT: cmphi p0.d, p0/z, z1.d, z0.d
; CHECK-SVE-NEXT: cmp x8, #1
-; CHECK-SVE-NEXT: cset w9, lt
-; CHECK-SVE-NEXT: whilelo p1.d, #0, x8
-; CHECK-SVE-NEXT: sbfx x9, x9, #0, #1
-; CHECK-SVE-NEXT: whilelo p0.d, xzr, x9
-; CHECK-SVE-NEXT: mov p0.b, p1/m, p1.b
+; CHECK-SVE-NEXT: cset w8, lt
+; CHECK-SVE-NEXT: sbfx x8, x8, #0, #1
+; CHECK-SVE-NEXT: whilelo p1.d, xzr, x8
+; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 8, i1 1)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1.i64.i64(i64 %a, i64 %b, i64 8, i1 1)
ret <vscale x 2 x i1> %0
}
@@ -104,17 +170,60 @@ define <vscale x 16 x i1> @whilerw_8(i64 %a, i64 %b) {
;
; CHECK-SVE-LABEL: whilerw_8:
; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SVE-NEXT: addvl sp, sp, #-1
+; CHECK-SVE-NEXT: str p7, [sp, #4, mul vl] // 2-byte Folded Spill
+; CHECK-SVE-NEXT: str p6, [sp, #5, mul vl] // 2-byte Folded Spill
+; CHECK-SVE-NEXT: str p5, [sp, #6, mul vl] // 2-byte Folded Spill
+; CHECK-SVE-NEXT: str p4, [sp, #7, mul vl] // 2-byte Folded Spill
+; CHECK-SVE-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
+; CHECK-SVE-NEXT: .cfi_offset w29, -16
+; CHECK-SVE-NEXT: index z0.d, #0, #1
; CHECK-SVE-NEXT: subs x8, x1, x0
+; CHECK-SVE-NEXT: ptrue p0.d
; CHECK-SVE-NEXT: cneg x8, x8, mi
+; CHECK-SVE-NEXT: mov z1.d, x8
+; CHECK-SVE-NEXT: mov z2.d, z0.d
+; CHECK-SVE-NEXT: mov z4.d, z0.d
+; CHECK-SVE-NEXT: mov z5.d, z0.d
+; CHECK-SVE-NEXT: cmphi p2.d, p0/z, z1.d, z0.d
+; CHECK-SVE-NEXT: incd z2.d
+; CHECK-SVE-NEXT: incd z4.d, all, mul #2
+; CHECK-SVE-NEXT: incd z5.d, all, mul #4
+; CHECK-SVE-NEXT: mov z3.d, z2.d
+; CHECK-SVE-NEXT: cmphi p1.d, p0/z, z1.d, z2.d
+; CHECK-SVE-NEXT: incd z2.d, all, mul #4
+; CHECK-SVE-NEXT: cmphi p3.d, p0/z, z1.d, z4.d
+; CHECK-SVE-NEXT: incd z4.d, all, mul #4
+; CHECK-SVE-NEXT: cmphi p4.d, p0/z, z1.d, z5.d
+; CHECK-SVE-NEXT: incd z3.d, all, mul #2
+; CHECK-SVE-NEXT: cmphi p5.d, p0/z, z1.d, z2.d
+; CHECK-SVE-NEXT: cmphi p7.d, p0/z, z1.d, z4.d
+; CHECK-SVE-NEXT: uzp1 p1.s, p2.s, p1.s
+; CHECK-SVE-NEXT: mov z0.d, z3.d
+; CHECK-SVE-NEXT: cmphi p6.d, p0/z, z1.d, z3.d
+; CHECK-SVE-NEXT: uzp1 p2.s, p4.s, p5.s
+; CHECK-SVE-NEXT: ldr p5, [sp, #6, mul vl] // 2-byte Folded Reload
+; CHECK-SVE-NEXT: ldr p4, [sp, #7, mul vl] // 2-byte Folded Reload
+; CHECK-SVE-NEXT: incd z0.d, all, mul #4
+; CHECK-SVE-NEXT: uzp1 p3.s, p3.s, p6.s
+; CHECK-SVE-NEXT: ldr p6, [sp, #5, mul vl] // 2-byte Folded Reload
+; CHECK-SVE-NEXT: cmphi p0.d, p0/z, z1.d, z0.d
+; CHECK-SVE-NEXT: uzp1 p1.h, p1.h, p3.h
; CHECK-SVE-NEXT: cmp x8, #0
-; CHECK-SVE-NEXT: cset w9, eq
-; CHECK-SVE-NEXT: whilelo p0.b, #0, x8
-; CHECK-SVE-NEXT: sbfx x8, x9, #0, #1
+; CHECK-SVE-NEXT: cset w8, eq
+; CHECK-SVE-NEXT: sbfx x8, x8, #0, #1
+; CHECK-SVE-NEXT: uzp1 p0.s, p7.s, p0.s
+; CHECK-SVE-NEXT: ldr p7, [sp, #4, mul vl] // 2-byte Folded Reload
+; CHECK-SVE-NEXT: uzp1 p0.h, p2.h, p0.h
+; CHECK-SVE-NEXT: uzp1 p0.b, p1.b, p0.b
; CHECK-SVE-NEXT: whilelo p1.b, xzr, x8
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
+; CHECK-SVE-NEXT: addvl sp, sp, #1
+; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 1, i1 0)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i64(i64 %a, i64 %b, i64 1, i1 0)
ret <vscale x 16 x i1> %0
}
@@ -126,14 +235,29 @@ define <vscale x 8 x i1> @whilerw_16(i64 %a, i64 %b) {
;
; CHECK-SVE-LABEL: whilerw_16:
; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: index z0.d, #0, #1
; CHECK-SVE-NEXT: subs x8, x1, x0
+; CHECK-SVE-NEXT: ptrue p0.d
; CHECK-SVE-NEXT: cneg x8, x8, mi
; CHECK-SVE-NEXT: add x8, x8, x8, lsr #63
+; CHECK-SVE-NEXT: mov z1.d, z0.d
+; CHECK-SVE-NEXT: mov z2.d, z0.d
; CHECK-SVE-NEXT: asr x8, x8, #1
+; CHECK-SVE-NEXT: mov z3.d, x8
+; CHECK-SVE-NEXT: incd z1.d
+; CHECK-SVE-NEXT: incd z2.d, all, mul #2
+; CHECK-SVE-NEXT: cmphi p1.d, p0/z, z3.d, z0.d
+; CHECK-SVE-NEXT: mov z4.d, z1.d
+; CHECK-SVE-NEXT: cmphi p2.d, p0/z, z3.d, z1.d
+; CHECK-SVE-NEXT: cmphi p3.d, p0/z, z3.d, z2.d
+; CHECK-SVE-NEXT: incd z4.d, all, mul #2
+; CHECK-SVE-NEXT: uzp1 p1.s, p1.s, p2.s
+; CHECK-SVE-NEXT: cmphi p0.d, p0/z, z3.d, z4.d
; CHECK-SVE-NEXT: cmp x8, #0
-; CHECK-SVE-NEXT: cset w9, eq
-; CHECK-SVE-NEXT: whilelo p0.h, #0, x8
-; CHECK-SVE-NEXT: sbfx x8, x9, #0, #1
+; CHECK-SVE-NEXT: cset w8, eq
+; CHECK-SVE-NEXT: sbfx x8, x8, #0, #1
+; CHECK-SVE-NEXT: uzp1 p0.s, p3.s, p0.s
+; CHECK-SVE-NEXT: uzp1 p0.h, p1.h, p0.h
; CHECK-SVE-NEXT: whilelo p1.h, xzr, x8
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
@@ -150,21 +274,28 @@ define <vscale x 4 x i1> @whilerw_32(i64 %a, i64 %b) {
;
; CHECK-SVE-LABEL: whilerw_32:
; CHECK-SVE: // %bb.0: // %entry
+; CHECK-SVE-NEXT: index z0.d, #0, #1
; CHECK-SVE-NEXT: subs x8, x1, x0
+; CHECK-SVE-NEXT: ptrue p0.d
; CHECK-SVE-NEXT: cneg x8, x8, mi
; CHECK-SVE-NEXT: add x9, x8, #3
; CHECK-SVE-NEXT: cmp x8, #0
; CHECK-SVE-NEXT: csel x8, x9, x8, lt
+; CHECK-SVE-NEXT: mov z1.d, z0.d
; CHECK-SVE-NEXT: asr x8, x8, #2
+; CHECK-SVE-NEXT: mov z2.d, x8
+; CHECK-SVE-NEXT: incd z1.d
+; CHECK-SVE-NEXT: cmphi p1.d, p0/z, z2.d, z0.d
+; CHECK-SVE-NEXT: cmphi p0.d, p0/z, z2.d, z1.d
; CHECK-SVE-NEXT: cmp x8, #0
-; CHECK-SVE-NEXT: cset w9, eq
-; CHECK-SVE-NEXT: whilelo p1.s, #0, x8
-; CHECK-SVE-NEXT: sbfx x9, x9, #0, #1
-; CHECK-SVE-NEXT: whilelo p0.s, xzr, x9
-; CHECK-SVE-NEXT: mov p0.b, p1/m, p1.b
+; CHECK-SVE-NEXT: cset w8, eq
+; CHECK-SVE-NEXT: sbfx x8, x8, #0, #1
+; CHECK-SVE-NEXT: uzp1 p0.s, p1.s, p0.s
+; CHECK-SVE-NEXT: whilelo p1.s, xzr, x8
+; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 4, i1 0)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %a, i64 %b, i64 4, i1 0)
ret <vscale x 4 x i1> %0
}
@@ -177,19 +308,22 @@ define <vscale x 2 x i1> @whilerw_64(i64 %a, i64 %b) {
; CHECK-SVE-LABEL: whilerw_64:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: subs x8, x1, x0
+; CHECK-SVE-NEXT: index z0.d, #0, #1
+; CHECK-SVE-NEXT: ptrue p0.d
; CHECK-SVE-NEXT: cneg x8, x8, mi
; CHECK-SVE-NEXT: add x9, x8, #7
; CHECK-SVE-NEXT: cmp x8, #0
; CHECK-SVE-NEXT: csel x8, x9, x8, lt
; CHECK-SVE-NEXT: asr x8, x8, #3
+; CHECK-SVE-NEXT: mov z1.d, x8
+; CHECK-SVE-NEXT: cmphi p0.d, p0/z, z1.d, z0.d
; CHECK-SVE-NEXT: cmp x8, #0
-; CHECK-SVE-NEXT: cset w9, eq
-; CHECK-SVE-NEXT: whilelo p1.d, #0, x8
-; CHECK-SVE-NEXT: sbfx x9, x9, #0, #1
-; CHECK-SVE-NEXT: whilelo p0.d, xzr, x9
-; CHECK-SVE-NEXT: mov p0.b, p1/m, p1.b
+; CHECK-SVE-NEXT: cset w8, eq
+; CHECK-SVE-NEXT: sbfx x8, x8, #0, #1
+; CHECK-SVE-NEXT: whilelo p1.d, xzr, x8
+; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 8, i1 0)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1.i64.i64(i64 %a, i64 %b, i64 8, i1 0)
ret <vscale x 2 x i1> %0
}
>From 2a0084d624514ae728fc000f6878dc7d7fbfeee4 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Wed, 15 Jan 2025 11:27:56 +0000
Subject: [PATCH 03/20] Format
---
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 4bc13ee12cd82..04d6abd7d16e9 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1822,7 +1822,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::INTRINSIC_WO_CHAIN, VT, Custom);
}
- if (Subtarget->hasSVE2() || (Subtarget->hasSME() && Subtarget->isStreaming())) {
+ if (Subtarget->hasSVE2() ||
+ (Subtarget->hasSME() && Subtarget->isStreaming())) {
for (auto VT : {MVT::v2i32, MVT::v4i16, MVT::v8i8, MVT::v16i8, MVT::nxv2i1,
MVT::nxv4i1, MVT::nxv8i1, MVT::nxv16i1}) {
setOperationAction(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, VT, Custom);
>From f054252d9695a6dc3f995420bfec2d75b39b14d3 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Wed, 15 Jan 2025 16:16:31 +0000
Subject: [PATCH 04/20] Fix ISD node name string and remove shouldExpand
function
---
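Note on the removed hook: shouldExpandGetAliasLaneMask, as deleted below, forced expansion unless SVE2 was available, the pointer type was i64, and the mask lane count matched the element size (2 lanes with 8 bytes, 4 with 4, 8 with 2, 16 with 1). The following is a minimal standalone sketch of only that lane-count check, with the subtarget and pointer-type tests left out. The name laneCountMatchesEltSize is hypothetical and does not appear anywhere in the patch.

```cpp
// Hypothetical helper, not part of the patch: mirrors the lane-count versus
// element-size pairs that the removed shouldExpandGetAliasLaneMask accepted.
// Returns true when the pair maps directly onto a whilewr/whilerw form.
bool laneCountMatchesEltSize(unsigned numLanes, unsigned eltSizeInBytes) {
  switch (numLanes) {
  case 2:
    return eltSizeInBytes == 8; // v2i1 / nxv2i1
  case 4:
    return eltSizeInBytes == 4; // v4i1 / nxv4i1
  case 8:
    return eltSizeInBytes == 2; // v8i1 / nxv8i1
  case 16:
    return eltSizeInBytes == 1; // v16i1 / nxv16i1
  default:
    return false; // any other lane count was always expanded
  }
}
```

Under this sketch, a 16-lane mask with a 2-byte element size is rejected. That is the combination the removed not_whilewr_wrong_eltsize test exercised.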
.../SelectionDAG/SelectionDAGDumper.cpp | 2 +-
.../Target/AArch64/AArch64ISelLowering.cpp | 19 -------------------
llvm/lib/Target/AArch64/AArch64ISelLowering.h | 3 ---
3 files changed, 1 insertion(+), 23 deletions(-)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 5426fea028e21..1a325c388dc2e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -574,7 +574,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::PARTIAL_REDUCE_SMLA:
return "partial_reduce_smla";
case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
- return "alias_mask";
+ return "alias_lane_mask";
// Vector Predication
#define BEGIN_REGISTER_VP_SDNODE(SDID, LEGALARG, NAME, ...) \
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 04d6abd7d16e9..e19adc60a3c08 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2061,25 +2061,6 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
return false;
}
-bool AArch64TargetLowering::shouldExpandGetAliasLaneMask(
- EVT VT, EVT PtrVT, unsigned EltSize) const {
- if (!Subtarget->hasSVE2())
- return true;
-
- if (PtrVT != MVT::i64)
- return true;
-
- if (VT == MVT::v2i1 || VT == MVT::nxv2i1)
- return EltSize != 8;
- if (VT == MVT::v4i1 || VT == MVT::nxv4i1)
- return EltSize != 4;
- if (VT == MVT::v8i1 || VT == MVT::nxv8i1)
- return EltSize != 2;
- if (VT == MVT::v16i1 || VT == MVT::nxv16i1)
- return EltSize != 1;
- return true;
-}
-
bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic(
const IntrinsicInst *I) const {
if (I->getIntrinsicID() != Intrinsic::experimental_vector_partial_reduce_add)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 4023c7c9076b7..814fb095cfc7f 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -1007,9 +1007,6 @@ class AArch64TargetLowering : public TargetLowering {
bool shouldExpandGetActiveLaneMask(EVT VT, EVT OpVT) const override;
- bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT,
- unsigned EltSize) const override;
-
bool
shouldExpandPartialReductionIntrinsic(const IntrinsicInst *I) const override;
>From e1589eb2c4f025ba0b2ecb6ba1fa3e28217fa6ac Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Thu, 16 Jan 2025 10:24:59 +0000
Subject: [PATCH 05/20] Format
---
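Note on the intrinsic selection: the hunk below moves the choice of intrinsic into the element-size switch, so that 1, 2, 4 and 8 bytes select the _b, _h, _s and _d variants of whilewr or whilerw. The following is a minimal standalone sketch of that mapping, using strings in place of Intrinsic:: IDs. The function name whileIntrinsicName is hypothetical, and the returned strings only echo the enumerator names used in the hunk.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper, not part of the patch: maps an element size in bytes
// and the hazard kind to the name of the SVE2 intrinsic the lowering picks.
// Returns std::nullopt for element sizes the switch does not handle.
std::optional<std::string> whileIntrinsicName(std::uint64_t eltSize,
                                              bool isWriteAfterRead) {
  char suffix = 0;
  switch (eltSize) {
  case 1: suffix = 'b'; break;
  case 2: suffix = 'h'; break;
  case 4: suffix = 's'; break;
  case 8: suffix = 'd'; break;
  default:
    return std::nullopt;
  }
  std::string name =
      isWriteAfterRead ? "aarch64_sve_whilewr_" : "aarch64_sve_whilerw_";
  name += suffix;
  return name;
}
```

Returning an empty optional for other sizes is a choice made for this sketch only. The patch itself asserts on the mask type inside each case and does not show what the default case does.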
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e19adc60a3c08..339d01b7f655f 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -5296,25 +5296,31 @@ SDValue AArch64TargetLowering::LowerALIAS_LANE_MASK(SDValue Op,
unsigned IntrinsicID = 0;
uint64_t EltSize = Op.getOperand(2)->getAsZExtVal();
bool IsWriteAfterRead = Op.getOperand(3)->getAsZExtVal() == 1;
- IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr
- : Intrinsic::aarch64_sve_whilerw;
EVT VT = Op.getValueType();
MVT SimpleVT = VT.getSimpleVT();
// Make sure that the promoted mask size and element size match
switch (EltSize) {
case 1:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_b
+ : Intrinsic::aarch64_sve_whilerw_b;
assert((SimpleVT == MVT::v16i8 || SimpleVT == MVT::nxv16i1) &&
"Unexpected mask or element size");
break;
case 2:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_h
+ : Intrinsic::aarch64_sve_whilerw_h;
assert((SimpleVT == MVT::v8i8 || SimpleVT == MVT::nxv8i1) &&
"Unexpected mask or element size");
break;
case 4:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_s
+ : Intrinsic::aarch64_sve_whilerw_s;
assert((SimpleVT == MVT::v4i16 || SimpleVT == MVT::nxv4i1) &&
"Unexpected mask or element size");
break;
case 8:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_d
+ : Intrinsic::aarch64_sve_whilerw_d;
assert((SimpleVT == MVT::v2i32 || SimpleVT == MVT::nxv2i1) &&
"Unexpected mask or element size");
break;
>From 7975c947ebbe1d8cf7f774cdc499b505582db397 Mon Sep 17 00:00:00 2001
From: Sam Tebbs <samuel.tebbs at arm.com>
Date: Mon, 27 Jan 2025 14:17:16 +0000
Subject: [PATCH 06/20] Move promote case
---
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 2c6633c3d5a86..0f6c9896086e4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -55,9 +55,6 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
N->dump(&DAG); dbgs() << "\n";
#endif
report_fatal_error("Do not know how to promote this operator!");
- case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
- Res = PromoteIntRes_EXPERIMENTAL_ALIAS_LANE_MASK(N);
- break;
case ISD::MERGE_VALUES:Res = PromoteIntRes_MERGE_VALUES(N, ResNo); break;
case ISD::AssertSext: Res = PromoteIntRes_AssertSext(N); break;
case ISD::AssertZext: Res = PromoteIntRes_AssertZext(N); break;
@@ -320,6 +317,10 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
Res = PromoteIntRes_VP_REDUCE(N);
break;
+ case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
+ Res = PromoteIntRes_EXPERIMENTAL_ALIAS_LANE_MASK(N);
+ break;
+
case ISD::FREEZE:
Res = PromoteIntRes_FREEZE(N);
break;
>From 92b04980c7a0e0e6101da8c6e0da3569857230e6 Mon Sep 17 00:00:00 2001
From: Sam Tebbs <samuel.tebbs at arm.com>
Date: Mon, 27 Jan 2025 14:17:30 +0000
Subject: [PATCH 07/20] Fix tablegen comment
---
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 84fe1bf4acc05..50b2b2855b48f 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -3935,7 +3935,7 @@ let Predicates = [HasSVE2_or_SME] in {
// SVE2 pointer conflict compare
defm WHILEWR_PXX : sve2_int_while_rr<0b0, "whilewr", AArch64whilewr>;
defm WHILERW_PXX : sve2_int_while_rr<0b1, "whilerw", AArch64whilerw>;
-} // End HasSVE2orSME
+} // End HasSVE2_or_SME
let Predicates = [HasSVEAES, HasNonStreamingSVE2_or_SSVE_AES] in {
// SVE2 crypto destructive binary operations
>From 8d4f70c9e78d72ab2336005a3068c67c4817ccd7 Mon Sep 17 00:00:00 2001
From: Sam Tebbs <samuel.tebbs at arm.com>
Date: Mon, 27 Jan 2025 14:20:31 +0000
Subject: [PATCH 08/20] Remove DAGTypeLegalizer::
---
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 0f6c9896086e4..b581414e3273f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -2121,7 +2121,7 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
Res = PromoteIntOp_PARTIAL_REDUCE_MLA(N);
break;
case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
- Res = DAGTypeLegalizer::PromoteIntOp_EXPERIMENTAL_ALIAS_LANE_MASK(N, OpNo);
+ Res = PromoteIntOp_EXPERIMENTAL_ALIAS_LANE_MASK(N, OpNo);
break;
}
>From 55cec8c82c210c950928e76dbbe8f31da1cc6d01 Mon Sep 17 00:00:00 2001
From: Sam Tebbs <samuel.tebbs at arm.com>
Date: Mon, 27 Jan 2025 14:20:39 +0000
Subject: [PATCH 09/20] Use getConstantOperandVal
---
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 339d01b7f655f..f964bf76bdb37 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -5294,8 +5294,8 @@ SDValue AArch64TargetLowering::LowerALIAS_LANE_MASK(SDValue Op,
SelectionDAG &DAG) const {
SDLoc DL(Op);
unsigned IntrinsicID = 0;
- uint64_t EltSize = Op.getOperand(2)->getAsZExtVal();
- bool IsWriteAfterRead = Op.getOperand(3)->getAsZExtVal() == 1;
+ uint64_t EltSize = Op.getConstantOperandVal(2);
+ bool IsWriteAfterRead = Op.getConstantOperandVal(3) == 1;
EVT VT = Op.getValueType();
MVT SimpleVT = VT.getSimpleVT();
// Make sure that the promoted mask size and element size match
>From 014fe8d49c9e0eb016792e730ee07bef4a05dcf6 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Wed, 29 Jan 2025 11:40:32 +0000
Subject: [PATCH 10/20] Remove isPredicateCCSettingOp case
---
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | 1 -
1 file changed, 1 deletion(-)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index f964bf76bdb37..428aec7a311e4 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -19747,7 +19747,6 @@ static SDValue getPTest(SelectionDAG &DAG, EVT VT, SDValue Pg, SDValue Op,
static bool isPredicateCCSettingOp(SDValue N) {
if ((N.getOpcode() == ISD::SETCC ||
- N.getOpcode() == ISD::EXPERIMENTAL_ALIAS_LANE_MASK) ||
(N.getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
(N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilege ||
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilegt ||
>From 123bbd9f4e4bdd049f48ca06ea706a907f7778f0 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Thu, 30 Jan 2025 14:40:12 +0000
Subject: [PATCH 11/20] Remove overloads for pointer and element size
parameters
---
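Note on the generic expansion: ExpandEXPERIMENTAL_ALIAS_LANE_MASK, touched below, computes a pointer difference, divides it by the element size, and ORs a per-lane "index < diff" compare with a splatted scalar compare against zero. The following is a minimal scalar model of that computation, to make the expected mask values easier to check by hand. It assumes the absolute value is taken only in the read-after-write case. That is inferred from the test expectations (cneg appears only in the whilerw functions) and is not shown in this hunk. The name aliasLaneMaskModel and the use of plain integers for pointers are illustrative only.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical scalar model, not part of the patch. Pointers are modelled as
// signed 64-bit integers; numLanes stands in for the mask's element count.
std::vector<bool> aliasLaneMaskModel(std::int64_t ptrA, std::int64_t ptrB,
                                     std::int64_t eltSize,
                                     bool writeAfterRead, unsigned numLanes) {
  std::int64_t diff = ptrB - ptrA;
  if (!writeAfterRead)
    diff = std::llabs(diff); // assumed: ABS only for read-after-write
  diff /= eltSize;           // truncating signed division, as with ISD::SDIV

  // Scalar compare that is splatted across the mask:
  // SETLE for write-after-read, SETEQ for read-after-write.
  bool allLanes = writeAfterRead ? (diff <= 0) : (diff == 0);

  std::vector<bool> mask(numLanes);
  for (unsigned i = 0; i < numLanes; ++i) {
    // Per-lane unsigned compare of the step vector with the splatted diff.
    bool laneBelowDiff =
        static_cast<std::uint64_t>(i) < static_cast<std::uint64_t>(diff);
    mask[i] = laneBelowDiff || allLanes;
  }
  return mask;
}
```

For example, with ptrA = 0, ptrB = 8, a 4-byte element size and write-after-read set, the model enables lanes 0 and 1 of a 4-lane mask and disables lanes 2 and 3. With ptrB at or below ptrA in the write-after-read case, every lane is enabled.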
llvm/docs/LangRef.rst | 12 +++----
llvm/include/llvm/IR/Intrinsics.td | 2 +-
.../SelectionDAG/LegalizeVectorOps.cpp | 11 ++++---
.../Target/AArch64/AArch64ISelLowering.cpp | 2 +-
llvm/test/CodeGen/AArch64/alias_mask.ll | 32 +++++++++----------
.../CodeGen/AArch64/alias_mask_scalable.ll | 32 +++++++++----------
6 files changed, 47 insertions(+), 44 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index ba317c7c8640b..83a6bd6eba3a9 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23744,10 +23744,10 @@ This is an overloaded intrinsic.
::
- declare <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %ptrA, i64 %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %ptrA, i64 %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i32(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.nxv16i1.i64.i32(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
Overview:
@@ -23787,7 +23787,7 @@ equivalent to:
%m[i] = (icmp ult i, %diff) || (%diff == 0)
where ``%m`` is a vector (mask) of active/inactive lanes with its elements
-indexed by ``i``, and ``%ptrA``, ``%ptrB`` are the two i64 arguments to
+indexed by ``i``, and ``%ptrA``, ``%ptrB`` are the two ptr arguments to
``llvm.experimental.get.alias.lane.mask.*`` and ``%elementSize`` is the first
immediate argument. The ``%writeAfterRead`` argument is expected to be true if
``%ptrB`` is stored to after ``%ptrA`` is read from.
@@ -23813,7 +23813,7 @@ Examples:
.. code-block:: llvm
- %alias.lane.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i32(i64 %ptrA, i64 %ptrB, i32 4, i1 1)
+ %alias.lane.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4, i1 1)
%vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
[...]
call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 7625a501a596e..7bf86bd70802b 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2381,7 +2381,7 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<1>>
def int_experimental_get_alias_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
- [llvm_anyint_ty, LLVMMatchType<1>, llvm_anyint_ty, llvm_i1_ty],
+ [llvm_anyptr_ty, LLVMMatchType<1>, llvm_i64_ty, llvm_i1_ty],
[IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
def int_get_active_lane_mask:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index 28571579df024..763084be23520 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -1770,8 +1770,7 @@ SDValue VectorLegalizer::ExpandEXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N) {
SDValue SinkValue = N->getOperand(1);
SDValue EltSize = N->getOperand(2);
- bool IsWriteAfterRead =
- cast<ConstantSDNode>(N->getOperand(3))->getZExtValue() != 0;
+ bool IsWriteAfterRead = N->getConstantOperandVal(3) != 0;
auto VT = N->getValueType(0);
auto PtrVT = SourceValue->getValueType(0);
@@ -1780,14 +1779,15 @@ SDValue VectorLegalizer::ExpandEXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N) {
Diff = DAG.getNode(ISD::ABS, DL, PtrVT, Diff);
Diff = DAG.getNode(ISD::SDIV, DL, PtrVT, Diff, EltSize);
- SDValue Zero = DAG.getTargetConstant(0, DL, PtrVT);
// If the difference is positive then some elements may alias
auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
Diff.getValueType());
+ SDValue Zero = DAG.getTargetConstant(0, DL, PtrVT);
SDValue Cmp = DAG.getSetCC(DL, CmpVT, Diff, Zero,
IsWriteAfterRead ? ISD::SETLE : ISD::SETEQ);
+ // Create the lane mask
EVT SplatTY =
EVT::getVectorVT(*DAG.getContext(), PtrVT, VT.getVectorElementCount());
SDValue DiffSplat = DAG.getSplat(SplatTY, DL, Diff);
@@ -1795,7 +1795,10 @@ SDValue VectorLegalizer::ExpandEXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N) {
SDValue DiffMask =
DAG.getSetCC(DL, VT, VectorStep, DiffSplat, ISD::CondCode::SETULT);
- // Splat the compare result then OR it with a lane mask
+ // Splat the compare result then OR it with the lane mask
+ auto VTElementTy = VT.getVectorElementType();
+ if (CmpVT.getScalarSizeInBits() < VTElementTy.getScalarSizeInBits())
+ Cmp = DAG.getNode(ISD::ZERO_EXTEND, DL, VTElementTy, Cmp);
SDValue Splat = DAG.getSplat(VT, DL, Cmp);
return DAG.getNode(ISD::OR, DL, VT, DiffMask, Splat);
}
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 428aec7a311e4..ab26ae8543ac0 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -19746,7 +19746,7 @@ static SDValue getPTest(SelectionDAG &DAG, EVT VT, SDValue Pg, SDValue Op,
AArch64CC::CondCode Cond);
static bool isPredicateCCSettingOp(SDValue N) {
- if ((N.getOpcode() == ISD::SETCC ||
+ if (N.getOpcode() == ISD::SETCC ||
(N.getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
(N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilege ||
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilegt ||
diff --git a/llvm/test/CodeGen/AArch64/alias_mask.ll b/llvm/test/CodeGen/AArch64/alias_mask.ll
index 9b344f03da077..f88baeece0356 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask.ll
@@ -2,7 +2,7 @@
; RUN: llc -mtriple=aarch64 -mattr=+sve2 %s -o - | FileCheck %s --check-prefix=CHECK-SVE
; RUN: llc -mtriple=aarch64 %s -o - | FileCheck %s --check-prefix=CHECK-NOSVE
-define <16 x i1> @whilewr_8(i64 %a, i64 %b) {
+define <16 x i1> @whilewr_8(ptr %a, ptr %b) {
; CHECK-SVE-LABEL: whilewr_8:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: whilewr p0.b, x0, x1
@@ -53,11 +53,11 @@ define <16 x i1> @whilewr_8(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i64(i64 %a, i64 %b, i64 1, i1 1)
+ %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
ret <16 x i1> %0
}
-define <8 x i1> @whilewr_16(i64 %a, i64 %b) {
+define <8 x i1> @whilewr_16(ptr %a, ptr %b) {
; CHECK-SVE-LABEL: whilewr_16:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: whilewr p0.b, x0, x1
@@ -95,11 +95,11 @@ define <8 x i1> @whilewr_16(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 1)
+ %0 = call <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
ret <8 x i1> %0
}
-define <4 x i1> @whilewr_32(i64 %a, i64 %b) {
+define <4 x i1> @whilewr_32(ptr %a, ptr %b) {
; CHECK-SVE-LABEL: whilewr_32:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: whilewr p0.h, x0, x1
@@ -129,11 +129,11 @@ define <4 x i1> @whilewr_32(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %a, i64 %b, i64 4, i1 1)
+ %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
ret <4 x i1> %0
}
-define <2 x i1> @whilewr_64(i64 %a, i64 %b) {
+define <2 x i1> @whilewr_64(ptr %a, ptr %b) {
; CHECK-SVE-LABEL: whilewr_64:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: whilewr p0.s, x0, x1
@@ -159,11 +159,11 @@ define <2 x i1> @whilewr_64(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1.i64.i64(i64 %a, i64 %b, i64 8, i1 1)
+ %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
ret <2 x i1> %0
}
-define <16 x i1> @whilerw_8(i64 %a, i64 %b) {
+define <16 x i1> @whilerw_8(ptr %a, ptr %b) {
; CHECK-SVE-LABEL: whilerw_8:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: whilerw p0.b, x0, x1
@@ -215,11 +215,11 @@ define <16 x i1> @whilerw_8(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i64(i64 %a, i64 %b, i64 1, i1 0)
+ %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
ret <16 x i1> %0
}
-define <8 x i1> @whilerw_16(i64 %a, i64 %b) {
+define <8 x i1> @whilerw_16(ptr %a, ptr %b) {
; CHECK-SVE-LABEL: whilerw_16:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: whilerw p0.b, x0, x1
@@ -258,11 +258,11 @@ define <8 x i1> @whilerw_16(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 0)
+ %0 = call <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
ret <8 x i1> %0
}
-define <4 x i1> @whilerw_32(i64 %a, i64 %b) {
+define <4 x i1> @whilerw_32(ptr %a, ptr %b) {
; CHECK-SVE-LABEL: whilerw_32:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: whilerw p0.h, x0, x1
@@ -293,11 +293,11 @@ define <4 x i1> @whilerw_32(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %a, i64 %b, i64 4, i1 0)
+ %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
ret <4 x i1> %0
}
-define <2 x i1> @whilerw_64(i64 %a, i64 %b) {
+define <2 x i1> @whilerw_64(ptr %a, ptr %b) {
; CHECK-SVE-LABEL: whilerw_64:
; CHECK-SVE: // %bb.0: // %entry
; CHECK-SVE-NEXT: whilerw p0.s, x0, x1
@@ -324,6 +324,6 @@ define <2 x i1> @whilerw_64(i64 %a, i64 %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1.i64.i64(i64 %a, i64 %b, i64 8, i1 0)
+ %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
ret <2 x i1> %0
}
diff --git a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
index a7c9c5e3cdd33..3d0f293b4687a 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
@@ -2,7 +2,7 @@
; RUN: llc -mtriple=aarch64 -mattr=+sve2 %s -o - | FileCheck %s --check-prefix=CHECK-SVE2
; RUN: llc -mtriple=aarch64 -mattr=+sve %s -o - | FileCheck %s --check-prefix=CHECK-SVE
-define <vscale x 16 x i1> @whilewr_8(i64 %a, i64 %b) {
+define <vscale x 16 x i1> @whilewr_8(ptr %a, ptr %b) {
; CHECK-SVE2-LABEL: whilewr_8:
; CHECK-SVE2: // %bb.0: // %entry
; CHECK-SVE2-NEXT: whilewr p0.b, x0, x1
@@ -60,11 +60,11 @@ define <vscale x 16 x i1> @whilewr_8(i64 %a, i64 %b) {
; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i64(i64 %a, i64 %b, i64 1, i1 1)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
ret <vscale x 16 x i1> %0
}
-define <vscale x 8 x i1> @whilewr_16(i64 %a, i64 %b) {
+define <vscale x 8 x i1> @whilewr_16(ptr %a, ptr %b) {
; CHECK-SVE2-LABEL: whilewr_16:
; CHECK-SVE2: // %bb.0: // %entry
; CHECK-SVE2-NEXT: whilewr p0.h, x0, x1
@@ -98,11 +98,11 @@ define <vscale x 8 x i1> @whilewr_16(i64 %a, i64 %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 1)
+ %0 = call <vscale x 8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
ret <vscale x 8 x i1> %0
}
-define <vscale x 4 x i1> @whilewr_32(i64 %a, i64 %b) {
+define <vscale x 4 x i1> @whilewr_32(ptr %a, ptr %b) {
; CHECK-SVE2-LABEL: whilewr_32:
; CHECK-SVE2: // %bb.0: // %entry
; CHECK-SVE2-NEXT: whilewr p0.s, x0, x1
@@ -130,11 +130,11 @@ define <vscale x 4 x i1> @whilewr_32(i64 %a, i64 %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %a, i64 %b, i64 4, i1 1)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
ret <vscale x 4 x i1> %0
}
-define <vscale x 2 x i1> @whilewr_64(i64 %a, i64 %b) {
+define <vscale x 2 x i1> @whilewr_64(ptr %a, ptr %b) {
; CHECK-SVE2-LABEL: whilewr_64:
; CHECK-SVE2: // %bb.0: // %entry
; CHECK-SVE2-NEXT: whilewr p0.d, x0, x1
@@ -158,11 +158,11 @@ define <vscale x 2 x i1> @whilewr_64(i64 %a, i64 %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1.i64.i64(i64 %a, i64 %b, i64 8, i1 1)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
ret <vscale x 2 x i1> %0
}
-define <vscale x 16 x i1> @whilerw_8(i64 %a, i64 %b) {
+define <vscale x 16 x i1> @whilerw_8(ptr %a, ptr %b) {
; CHECK-SVE2-LABEL: whilerw_8:
; CHECK-SVE2: // %bb.0: // %entry
; CHECK-SVE2-NEXT: whilerw p0.b, x0, x1
@@ -223,11 +223,11 @@ define <vscale x 16 x i1> @whilerw_8(i64 %a, i64 %b) {
; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64.i64(i64 %a, i64 %b, i64 1, i1 0)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
ret <vscale x 16 x i1> %0
}
-define <vscale x 8 x i1> @whilerw_16(i64 %a, i64 %b) {
+define <vscale x 8 x i1> @whilerw_16(ptr %a, ptr %b) {
; CHECK-SVE2-LABEL: whilerw_16:
; CHECK-SVE2: // %bb.0: // %entry
; CHECK-SVE2-NEXT: whilerw p0.h, x0, x1
@@ -262,11 +262,11 @@ define <vscale x 8 x i1> @whilerw_16(i64 %a, i64 %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64.i64(i64 %a, i64 %b, i64 2, i1 0)
+ %0 = call <vscale x 8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
ret <vscale x 8 x i1> %0
}
-define <vscale x 4 x i1> @whilerw_32(i64 %a, i64 %b) {
+define <vscale x 4 x i1> @whilerw_32(ptr %a, ptr %b) {
; CHECK-SVE2-LABEL: whilerw_32:
; CHECK-SVE2: // %bb.0: // %entry
; CHECK-SVE2-NEXT: whilerw p0.s, x0, x1
@@ -295,11 +295,11 @@ define <vscale x 4 x i1> @whilerw_32(i64 %a, i64 %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64.i64(i64 %a, i64 %b, i64 4, i1 0)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
ret <vscale x 4 x i1> %0
}
-define <vscale x 2 x i1> @whilerw_64(i64 %a, i64 %b) {
+define <vscale x 2 x i1> @whilerw_64(ptr %a, ptr %b) {
; CHECK-SVE2-LABEL: whilerw_64:
; CHECK-SVE2: // %bb.0: // %entry
; CHECK-SVE2-NEXT: whilerw p0.d, x0, x1
@@ -324,6 +324,6 @@ define <vscale x 2 x i1> @whilerw_64(i64 %a, i64 %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1.i64.i64(i64 %a, i64 %b, i64 8, i1 0)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
ret <vscale x 2 x i1> %0
}
>From db43b82b9a550577b65ef45cc7086378bb47e5b7 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Thu, 30 Jan 2025 15:16:57 +0000
Subject: [PATCH 12/20] Clarify elementSize and writeAfterRead = 0
---
llvm/docs/LangRef.rst | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
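Note (not part of the patch): the semantics this commit clarifies can be modelled in scalar C++. This is a minimal sketch under stated assumptions, not the legalizer's implementation. It assumes %diff is (%ptrB - %ptrA) / %elementSize when %writeAfterRead is true and abs(%ptrB - %ptrA) / %elementSize when it is false, which is consistent with the ABS/SDIV and SETLE/SETEQ sequence in ExpandEXPERIMENTAL_ALIAS_LANE_MASK but is not spelled out in this hunk. The name laneMask, the integer stand-ins for pointers, and the explicit VF parameter are hypothetical, and the poison (wrapping) case is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of the lane mask for VF lanes (i = 0 to VF - 1).
// Pointers are modelled as plain integers (an assumption for illustration).
std::vector<bool> laneMask(std::int64_t ptrA, std::int64_t ptrB,
                           std::int64_t elementSize, bool writeAfterRead,
                           std::size_t VF) {
  assert(elementSize > 0 && "elementSize is a positive size in bytes");
  std::int64_t diff = ptrB - ptrA;
  // Assumed: the read-after-write form uses the absolute distance.
  if (!writeAfterRead && diff < 0)
    diff = -diff;
  diff /= elementSize;
  // Write-after-read: every lane is safe when diff <= 0.
  // Read-after-write: every lane is safe only when diff == 0.
  const bool allLanes = writeAfterRead ? diff <= 0 : diff == 0;
  std::vector<bool> m(VF);
  for (std::size_t i = 0; i < VF; ++i)
    m[i] = allLanes || static_cast<std::int64_t>(i) < diff;
  return m;
}
```

With 4-byte elements and VF = 4, pointers 8 bytes apart give a distance of two elements, so only the first two lanes are enabled in the write-after-read case; swapping the pointers enables all four.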
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 83a6bd6eba3a9..a776c1ff62ecd 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23766,6 +23766,7 @@ The final two are immediates and the result is a vector with the i1 element type
Semantics:
""""""""""
+``%elementSize`` is the size of the accessed elements in bytes.
The intrinsic will return poison if ``%ptrA`` and ``%ptrB`` are within
VF * ``%elementSize`` of each other and ``%ptrA`` + VF * ``%elementSize`` wraps.
In other cases when ``%writeAfterRead`` is true, the
@@ -23790,7 +23791,8 @@ where ``%m`` is a vector (mask) of active/inactive lanes with its elements
indexed by ``i``, and ``%ptrA``, ``%ptrB`` are the two ptr arguments to
``llvm.experimental.get.alias.lane.mask.*`` and ``%elementSize`` is the first
immediate argument. The ``%writeAfterRead`` argument is expected to be true if
-``%ptrB`` is stored to after ``%ptrA`` is read from.
+``%ptrB`` is stored to after ``%ptrA`` is read from, otherwise it is false for
+a read after write.
The above is equivalent to:
::
>From 9386cc4c89c73a8c94051011e6b35319d4e90f96 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Thu, 30 Jan 2025 15:23:46 +0000
Subject: [PATCH 13/20] Add i=0 to VF-1
---
llvm/docs/LangRef.rst | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index a776c1ff62ecd..2bcf2bb547364 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23788,11 +23788,11 @@ equivalent to:
%m[i] = (icmp ult i, %diff) || (%diff == 0)
where ``%m`` is a vector (mask) of active/inactive lanes with its elements
-indexed by ``i``, and ``%ptrA``, ``%ptrB`` are the two ptr arguments to
-``llvm.experimental.get.alias.lane.mask.*`` and ``%elementSize`` is the first
-immediate argument. The ``%writeAfterRead`` argument is expected to be true if
-``%ptrB`` is stored to after ``%ptrA`` is read from, otherwise it is false for
-a read after write.
+indexed by ``i`` (i = 0 to VF - 1), and ``%ptrA``, ``%ptrB`` are the two ptr
+arguments to ``llvm.experimental.get.alias.lane.mask.*`` and ``%elementSize``
+is the first immediate argument. The ``%writeAfterRead`` argument is expected
+to be true if ``%ptrB`` is stored to after ``%ptrA`` is read from, otherwise
+it is false for a read after write.
The above is equivalent to:
::
>From ee959c8aca3449593fcf125cc55b79e93bf8bb69 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Thu, 30 Jan 2025 16:08:47 +0000
Subject: [PATCH 14/20] Rename to get.nonalias.lane.mask
---
llvm/docs/LangRef.rst | 28 +++++++++----------
llvm/include/llvm/CodeGen/ISDOpcodes.h | 2 +-
llvm/include/llvm/IR/Intrinsics.td | 4 +--
.../SelectionDAG/LegalizeIntegerTypes.cpp | 16 +++++------
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 5 ++--
.../SelectionDAG/LegalizeVectorOps.cpp | 10 +++----
.../SelectionDAG/SelectionDAGBuilder.cpp | 6 ++--
.../SelectionDAG/SelectionDAGDumper.cpp | 2 +-
llvm/lib/CodeGen/TargetLoweringBase.cpp | 4 +--
.../Target/AArch64/AArch64ISelLowering.cpp | 18 ++++++------
llvm/lib/Target/AArch64/AArch64ISelLowering.h | 2 +-
llvm/test/CodeGen/AArch64/alias_mask.ll | 16 +++++------
.../CodeGen/AArch64/alias_mask_scalable.ll | 16 +++++------
13 files changed, 65 insertions(+), 64 deletions(-)
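Note (not part of the patch): the new name reflects that enabled lanes are the ones that do not alias. The sketch below emulates, in scalar C++, how a vectorised loop might consume such a mask for a write-after-read hazard: masked load from the source, an operation, then a masked store to the destination. All names here (nonAliasMaskWAR, scalarLoop, maskedLoop) are hypothetical, indices stand in for pointers, and advancing by the number of enabled lanes per iteration is an assumption about the consumer rather than something this patch defines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical write-after-read mask over element indices:
// lane i is enabled when i < diff or diff <= 0, with diff = dst - src.
std::vector<bool> nonAliasMaskWAR(std::ptrdiff_t src, std::ptrdiff_t dst,
                                  std::size_t VF) {
  const std::ptrdiff_t diff = dst - src;
  std::vector<bool> m(VF);
  for (std::size_t i = 0; i < VF; ++i)
    m[i] = diff <= 0 || static_cast<std::ptrdiff_t>(i) < diff;
  return m;
}

// Reference: buf[dst + k] = buf[src + k] + 1, one element at a time.
void scalarLoop(std::vector<int> &buf, std::size_t src, std::size_t dst,
                std::size_t n) {
  for (std::size_t k = 0; k < n; ++k)
    buf[dst + k] = buf[src + k] + 1;
}

// Same loop in VF-wide steps, touching only the enabled lanes.
void maskedLoop(std::vector<int> &buf, std::size_t src, std::size_t dst,
                std::size_t n, std::size_t VF) {
  const std::vector<bool> mask =
      nonAliasMaskWAR(static_cast<std::ptrdiff_t>(src),
                      static_cast<std::ptrdiff_t>(dst), VF);
  std::size_t active = 0; // enabled lanes; at least 1 whenever VF >= 1
  for (std::size_t i = 0; i < VF; ++i)
    active += mask[i] ? 1 : 0;
  for (std::size_t k = 0; k < n; k += active) {
    std::vector<int> lanes(VF, 0);
    for (std::size_t i = 0; i < VF; ++i) // masked load
      if (mask[i] && k + i < n)
        lanes[i] = buf[src + k + i];
    for (std::size_t i = 0; i < VF; ++i) // masked store
      if (mask[i] && k + i < n)
        buf[dst + k + i] = lanes[i] + 1;
  }
}
```

When the destination starts two elements after the source and VF is 4, only two lanes are enabled per step, so each step reads values the previous step has already written, matching the element-by-element loop.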
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 2bcf2bb547364..2dfa06ef37252 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23733,9 +23733,9 @@ Examples:
%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
-.. _int_experimental_get_alias_lane_mask:
+.. _int_experimental_get_nonalias_lane_mask:
-'``llvm.experimental.get.alias.lane.mask.*``' Intrinsics
+'``llvm.experimental.get.nonalias.lane.mask.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
@@ -23744,16 +23744,16 @@ This is an overloaded intrinsic.
::
- declare <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <vscale x 16 x i1> @llvm.experimental.get.nonalias.lane.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
Overview:
"""""""""
-Create a mask representing lanes that do or not overlap between two pointers
+Create a mask enabling lanes that do not overlap between two pointers
across one vector loop iteration.
@@ -23770,7 +23770,7 @@ Semantics:
The intrinsic will return poison if ``%ptrA`` and ``%ptrB`` are within
VF * ``%elementSize`` of each other and ``%ptrA`` + VF * ``%elementSize`` wraps.
In other cases when ``%writeAfterRead`` is true, the
-'``llvm.experimental.get.alias.lane.mask.*``' intrinsics are semantically
+'``llvm.experimental.get.nonalias.lane.mask.*``' intrinsics are semantically
equivalent to:
::
@@ -23779,7 +23779,7 @@ equivalent to:
%m[i] = (icmp ult i, %diff) || (%diff <= 0)
When the return value is not poison and ``%writeAfterRead`` is false, the
-'``llvm.experimental.get.alias.lane.mask.*``' intrinsics are semantically
+'``llvm.experimental.get.nonalias.lane.mask.*``' intrinsics are semantically
equivalent to:
::
@@ -23789,7 +23789,7 @@ equivalent to:
where ``%m`` is a vector (mask) of active/inactive lanes with its elements
indexed by ``i`` (i = 0 to VF - 1), and ``%ptrA``, ``%ptrB`` are the two ptr
-arguments to ``llvm.experimental.get.alias.lane.mask.*`` and ``%elementSize``
+arguments to ``llvm.experimental.get.nonalias.lane.mask.*`` and ``%elementSize``
is the first immediate argument. The ``%writeAfterRead`` argument is expected
to be true if ``%ptrB`` is stored to after ``%ptrA`` is read from, otherwise
it is false for a read after write.
@@ -23797,7 +23797,7 @@ The above is equivalent to:
::
- %m = @llvm.experimental.get.alias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead)
+ %m = @llvm.experimental.get.nonalias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead)
This can, for example, be emitted by the loop vectorizer in which case
``%ptrA`` is a pointer that is read from within the loop, and ``%ptrB`` is a
@@ -23815,10 +23815,10 @@ Examples:
.. code-block:: llvm
- %alias.lane.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4, i1 1)
- %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
+ %nonalias.lane.mask = call <4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4, i1 1)
+ %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %nonalias.lane.mask, <4 x i32> poison)
[...]
- call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
+ call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %nonalias.lane.mask)
.. _int_experimental_vp_splice:
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index 6737e97b09384..7ed3c747c3ec9 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1512,7 +1512,7 @@ enum NodeType {
// The `llvm.experimental.get.alias.lane.mask.*` intrinsics
// Operands: Load pointer, Store pointer, Element size, Write after read
// Output: Mask
- EXPERIMENTAL_ALIAS_LANE_MASK,
+ EXPERIMENTAL_NONALIAS_LANE_MASK,
// llvm.clear_cache intrinsic
// Operands: Input Chain, Start Addres, End Address
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 7bf86bd70802b..8f5d9bb1be8b6 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2379,9 +2379,9 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<1>>
llvm_i32_ty]>;
}
-def int_experimental_get_alias_lane_mask:
+def int_experimental_get_nonalias_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
- [llvm_anyptr_ty, LLVMMatchType<1>, llvm_i64_ty, llvm_i1_ty],
+ [llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty, llvm_i1_ty],
[IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
def int_get_active_lane_mask:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index b581414e3273f..602971bcdb9ee 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -317,8 +317,8 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
Res = PromoteIntRes_VP_REDUCE(N);
break;
- case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
- Res = PromoteIntRes_EXPERIMENTAL_ALIAS_LANE_MASK(N);
+ case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
+ Res = PromoteIntRes_EXPERIMENTAL_NONALIAS_LANE_MASK(N);
break;
case ISD::FREEZE:
@@ -369,10 +369,10 @@ SDValue DAGTypeLegalizer::PromoteIntRes_MERGE_VALUES(SDNode *N,
}
SDValue
-DAGTypeLegalizer::PromoteIntRes_EXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N) {
+DAGTypeLegalizer::PromoteIntRes_EXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N) {
EVT VT = N->getValueType(0);
EVT NewVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
- return DAG.getNode(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, SDLoc(N), NewVT,
+ return DAG.getNode(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, SDLoc(N), NewVT,
N->ops());
}
@@ -2120,8 +2120,8 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::PARTIAL_REDUCE_SMLA:
Res = PromoteIntOp_PARTIAL_REDUCE_MLA(N);
break;
- case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
- Res = PromoteIntOp_EXPERIMENTAL_ALIAS_LANE_MASK(N, OpNo);
+ case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
+ Res = PromoteIntOp_EXPERIMENTAL_NONALIAS_LANE_MASK(N, OpNo);
break;
}
@@ -2918,8 +2918,8 @@ SDValue DAGTypeLegalizer::PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N) {
}
SDValue
-DAGTypeLegalizer::PromoteIntOp_EXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N,
- unsigned OpNo) {
+DAGTypeLegalizer::PromoteIntOp_EXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N,
+ unsigned OpNo) {
SmallVector<SDValue, 4> NewOps(N->ops());
NewOps[OpNo] = GetPromotedInteger(N->getOperand(OpNo));
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index ca54936c3fa0b..1069b2d279c44 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -380,7 +380,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntRes_PATCHPOINT(SDNode *N);
SDValue PromoteIntRes_VECTOR_FIND_LAST_ACTIVE(SDNode *N);
SDValue PromoteIntRes_PARTIAL_REDUCE_MLA(SDNode *N);
- SDValue PromoteIntRes_EXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N);
+ SDValue PromoteIntRes_EXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N);
// Integer Operand Promotion.
bool PromoteIntegerOperand(SDNode *N, unsigned OpNo);
@@ -433,7 +433,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntOp_VECTOR_HISTOGRAM(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_VECTOR_FIND_LAST_ACTIVE(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N);
- SDValue PromoteIntOp_EXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N, unsigned OpNo);
+ SDValue PromoteIntOp_EXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N,
+ unsigned OpNo);
void SExtOrZExtPromotedOperands(SDValue &LHS, SDValue &RHS);
void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index 763084be23520..1384912a08e13 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -138,7 +138,7 @@ class VectorLegalizer {
SDValue ExpandVP_FNEG(SDNode *Node);
SDValue ExpandVP_FABS(SDNode *Node);
SDValue ExpandVP_FCOPYSIGN(SDNode *Node);
- SDValue ExpandEXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N);
+ SDValue ExpandEXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N);
SDValue ExpandSELECT(SDNode *Node);
std::pair<SDValue, SDValue> ExpandLoad(SDNode *N);
SDValue ExpandStore(SDNode *N);
@@ -472,7 +472,7 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::UCMP:
case ISD::PARTIAL_REDUCE_UMLA:
case ISD::PARTIAL_REDUCE_SMLA:
- case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
+ case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
break;
case ISD::SMULFIX:
@@ -1254,8 +1254,8 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
case ISD::UCMP:
Results.push_back(TLI.expandCMP(Node, DAG));
return;
- case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
- Results.push_back(ExpandEXPERIMENTAL_ALIAS_LANE_MASK(Node));
+ case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
+ Results.push_back(ExpandEXPERIMENTAL_NONALIAS_LANE_MASK(Node));
return;
case ISD::FADD:
@@ -1764,7 +1764,7 @@ SDValue VectorLegalizer::ExpandVP_FCOPYSIGN(SDNode *Node) {
return DAG.getNode(ISD::BITCAST, DL, VT, CopiedSign);
}
-SDValue VectorLegalizer::ExpandEXPERIMENTAL_ALIAS_LANE_MASK(SDNode *N) {
+SDValue VectorLegalizer::ExpandEXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N) {
SDLoc DL(N);
SDValue SourceValue = N->getOperand(0);
SDValue SinkValue = N->getOperand(1);
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index ea97cb2652217..bac4ef4f9fd85 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8291,13 +8291,13 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
visitVectorExtractLastActive(I, Intrinsic);
return;
}
- case Intrinsic::experimental_get_alias_lane_mask: {
+ case Intrinsic::experimental_get_nonalias_lane_mask: {
auto IntrinsicVT = EVT::getEVT(I.getType());
SmallVector<SDValue, 4> Ops;
for (auto &Op : I.operands())
Ops.push_back(getValue(Op));
- SDValue Mask =
- DAG.getNode(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, sdl, IntrinsicVT, Ops);
+ SDValue Mask = DAG.getNode(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, sdl,
+ IntrinsicVT, Ops);
setValue(&I, Mask);
}
}
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 1a325c388dc2e..0ae3039535ef4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -573,7 +573,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
return "partial_reduce_umla";
case ISD::PARTIAL_REDUCE_SMLA:
return "partial_reduce_smla";
- case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
+ case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
return "alias_lane_mask";
// Vector Predication
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 407eddfd9c756..f9154d61871b7 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -831,8 +831,8 @@ void TargetLoweringBase::initActions() {
// Masked vector extracts default to expand.
setOperationAction(ISD::VECTOR_FIND_LAST_ACTIVE, VT, Expand);
- // Aliasing lanes mask default to expand
- setOperationAction(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, VT, Expand);
+ // Non-aliasing lanes mask default to expand
+ setOperationAction(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, VT, Expand);
// FP environment operations default to expand.
setOperationAction(ISD::GET_FPENV, VT, Expand);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index ab26ae8543ac0..fc8473e559a69 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1826,7 +1826,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
(Subtarget->hasSME() && Subtarget->isStreaming())) {
for (auto VT : {MVT::v2i32, MVT::v4i16, MVT::v8i8, MVT::v16i8, MVT::nxv2i1,
MVT::nxv4i1, MVT::nxv8i1, MVT::nxv16i1}) {
- setOperationAction(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, VT, Custom);
+ setOperationAction(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, VT, Custom);
}
}
@@ -5290,8 +5290,9 @@ SDValue AArch64TargetLowering::LowerFSINCOS(SDValue Op,
static MVT getSVEContainerType(EVT ContentTy);
-SDValue AArch64TargetLowering::LowerALIAS_LANE_MASK(SDValue Op,
- SelectionDAG &DAG) const {
+SDValue
+AArch64TargetLowering::LowerNONALIAS_LANE_MASK(SDValue Op,
+ SelectionDAG &DAG) const {
SDLoc DL(Op);
unsigned IntrinsicID = 0;
uint64_t EltSize = Op.getConstantOperandVal(2);
@@ -6573,7 +6574,6 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
return DAG.getNode(AArch64ISD::USDOT, dl, Op.getValueType(),
Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
}
- case Intrinsic::experimental_get_alias_lane_mask:
case Intrinsic::get_active_lane_mask: {
unsigned IntrinsicID = Intrinsic::aarch64_sve_whilelo;
SDValue ID = DAG.getTargetConstant(IntrinsicID, dl, MVT::i64);
@@ -7446,8 +7446,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
default:
llvm_unreachable("unimplemented operand");
return SDValue();
- case ISD::EXPERIMENTAL_ALIAS_LANE_MASK:
- return LowerALIAS_LANE_MASK(Op, DAG);
+ case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
+ return LowerNONALIAS_LANE_MASK(Op, DAG);
case ISD::BITCAST:
return LowerBITCAST(Op, DAG);
case ISD::GlobalAddress:
@@ -27638,7 +27638,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
// CONCAT_VECTORS -- but delegate to common code for result type
// legalisation
return;
- case ISD::EXPERIMENTAL_ALIAS_LANE_MASK: {
+ case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK: {
EVT VT = N->getValueType(0);
if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
return;
@@ -27650,7 +27650,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
SDLoc DL(N);
auto V =
- DAG.getNode(ISD::EXPERIMENTAL_ALIAS_LANE_MASK, DL, NewVT, N->ops());
+ DAG.getNode(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, DL, NewVT, N->ops());
Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, VT, V));
return;
}
@@ -27710,7 +27710,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
return;
}
case Intrinsic::experimental_vector_match:
- case Intrinsic::experimental_get_alias_lane_mask:
+ case Intrinsic::experimental_get_nonalias_lane_mask:
case Intrinsic::get_active_lane_mask: {
if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
return;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 814fb095cfc7f..299d4b8bf81b6 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -1221,7 +1221,7 @@ class AArch64TargetLowering : public TargetLowering {
SDValue LowerXOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFSINCOS(SDValue Op, SelectionDAG &DAG) const;
- SDValue LowerALIAS_LANE_MASK(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerNONALIAS_LANE_MASK(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVSCALE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/AArch64/alias_mask.ll b/llvm/test/CodeGen/AArch64/alias_mask.ll
index f88baeece0356..5ef6b588fe767 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask.ll
@@ -53,7 +53,7 @@ define <16 x i1> @whilewr_8(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
+ %0 = call <16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
ret <16 x i1> %0
}
@@ -95,7 +95,7 @@ define <8 x i1> @whilewr_16(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
+ %0 = call <8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
ret <8 x i1> %0
}
@@ -129,7 +129,7 @@ define <4 x i1> @whilewr_32(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
+ %0 = call <4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
ret <4 x i1> %0
}
@@ -159,7 +159,7 @@ define <2 x i1> @whilewr_64(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
+ %0 = call <2 x i1> @llvm.experimental.get.nonalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
ret <2 x i1> %0
}
@@ -215,7 +215,7 @@ define <16 x i1> @whilerw_8(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
+ %0 = call <16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
ret <16 x i1> %0
}
@@ -258,7 +258,7 @@ define <8 x i1> @whilerw_16(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
+ %0 = call <8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
ret <8 x i1> %0
}
@@ -293,7 +293,7 @@ define <4 x i1> @whilerw_32(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
+ %0 = call <4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
ret <4 x i1> %0
}
@@ -324,6 +324,6 @@ define <2 x i1> @whilerw_64(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
+ %0 = call <2 x i1> @llvm.experimental.get.nonalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
ret <2 x i1> %0
}
diff --git a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
index 3d0f293b4687a..6884f14d685b5 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
@@ -60,7 +60,7 @@ define <vscale x 16 x i1> @whilewr_8(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
ret <vscale x 16 x i1> %0
}
@@ -98,7 +98,7 @@ define <vscale x 8 x i1> @whilewr_16(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
+ %0 = call <vscale x 8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
ret <vscale x 8 x i1> %0
}
@@ -130,7 +130,7 @@ define <vscale x 4 x i1> @whilewr_32(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
ret <vscale x 4 x i1> %0
}
@@ -158,7 +158,7 @@ define <vscale x 2 x i1> @whilewr_64(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.nonalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
ret <vscale x 2 x i1> %0
}
@@ -223,7 +223,7 @@ define <vscale x 16 x i1> @whilerw_8(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
ret <vscale x 16 x i1> %0
}
@@ -262,7 +262,7 @@ define <vscale x 8 x i1> @whilerw_16(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
+ %0 = call <vscale x 8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
ret <vscale x 8 x i1> %0
}
@@ -295,7 +295,7 @@ define <vscale x 4 x i1> @whilerw_32(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
ret <vscale x 4 x i1> %0
}
@@ -324,6 +324,6 @@ define <vscale x 2 x i1> @whilerw_64(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.alias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.nonalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
ret <vscale x 2 x i1> %0
}
>From 8787880c5fda3a5fa84ae4c7cde8dc6b0e9fe024 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Thu, 30 Jan 2025 16:15:26 +0000
Subject: [PATCH 15/20] Fix pointer types in example
---
llvm/docs/LangRef.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 2dfa06ef37252..7d09cb20a4cd6 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23816,9 +23816,9 @@ Examples:
.. code-block:: llvm
%nonalias.lane.mask = call <4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4, i1 1)
- %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %nonalias.lane.mask, <4 x i32> poison)
+ %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %nonalias.lane.mask, <4 x i32> poison)
[...]
- call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %nonalias.lane.mask)
+ call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %nonalias.lane.mask)
.. _int_experimental_vp_splice:
>From 2291a6dc88b6f43178d4c05d2ef2b7194d77aec2 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Thu, 30 Jan 2025 16:15:35 +0000
Subject: [PATCH 16/20] Remove shouldExpandGetAliasLaneMask
---
llvm/include/llvm/CodeGen/TargetLowering.h | 7 -------
1 file changed, 7 deletions(-)
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 7bb04593d34e5..a4c3d042fe3a4 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -469,13 +469,6 @@ class TargetLoweringBase {
return true;
}
- /// Return true if the @llvm.experimental.get.alias.lane.mask intrinsic should
- /// be expanded using generic code in SelectionDAGBuilder.
- virtual bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT,
- unsigned EltSize) const {
- return true;
- }
-
virtual bool shouldExpandGetVectorLength(EVT CountVT, unsigned VF,
bool IsScalable) const {
return true;
>From a86d95fd7a59e497f07b0d565c15d8c89efa6162 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Thu, 30 Jan 2025 16:25:35 +0000
Subject: [PATCH 17/20] Lower to ISD node rather than intrinsic
---
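Note for reviewers: with this change the element size no longer selects a per-width intrinsic ID. It only drives the assertions on the promoted mask type, while the hazard flag alone picks the node. The following is a standalone sketch of that selection, not code from the patch: `WhileChoice` and `chooseWhile` are hypothetical names, the opcodes are modelled as plain strings rather than `AArch64ISD` values, and the lane count assumes the 128-bit granule implied by the asserted types (v16i8/nxv16i1 down to v2i32/nxv2i1).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the (opcode, mask width) pair that
// LowerNONALIAS_LANE_MASK works with; not an LLVM type.
struct WhileChoice {
  std::string opcode; // "WHILEWR" or "WHILERW"; empty if unsupported
  unsigned lanes;     // mask lanes per 128-bit granule; 0 if unsupported
};

// Model of the selection: the hazard kind picks the node, the element
// size only determines how many lanes the mask is expected to have.
WhileChoice chooseWhile(std::uint64_t eltSize, bool isWriteAfterRead) {
  switch (eltSize) {
  case 1:
  case 2:
  case 4:
  case 8:
    break;
  default:
    // The patch reaches llvm_unreachable here; the model reports
    // "unsupported" instead so that it can be exercised safely.
    return WhileChoice{"", 0};
  }
  unsigned lanes = static_cast<unsigned>(16 / eltSize);
  // Mirrors the "Unexpected mask or element size" assertions.
  assert(lanes * eltSize == 16 && "Unexpected mask or element size");
  return WhileChoice{isWriteAfterRead ? "WHILEWR" : "WHILERW", lanes};
}
```

For example, `chooseWhile(4, true)` models the `whilewr_32` tests (4 lanes), and `chooseWhile(1, false)` models `whilerw_8` (16 lanes).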
.../Target/AArch64/AArch64ISelLowering.cpp | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index fc8473e559a69..709999e8206a5 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -5294,34 +5294,27 @@ SDValue
AArch64TargetLowering::LowerNONALIAS_LANE_MASK(SDValue Op,
SelectionDAG &DAG) const {
SDLoc DL(Op);
- unsigned IntrinsicID = 0;
uint64_t EltSize = Op.getConstantOperandVal(2);
bool IsWriteAfterRead = Op.getConstantOperandVal(3) == 1;
+ unsigned Opcode =
+ IsWriteAfterRead ? AArch64ISD::WHILEWR : AArch64ISD::WHILERW;
EVT VT = Op.getValueType();
MVT SimpleVT = VT.getSimpleVT();
// Make sure that the promoted mask size and element size match
switch (EltSize) {
case 1:
- IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_b
- : Intrinsic::aarch64_sve_whilerw_b;
assert((SimpleVT == MVT::v16i8 || SimpleVT == MVT::nxv16i1) &&
"Unexpected mask or element size");
break;
case 2:
- IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_h
- : Intrinsic::aarch64_sve_whilerw_h;
assert((SimpleVT == MVT::v8i8 || SimpleVT == MVT::nxv8i1) &&
"Unexpected mask or element size");
break;
case 4:
- IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_s
- : Intrinsic::aarch64_sve_whilerw_s;
assert((SimpleVT == MVT::v4i16 || SimpleVT == MVT::nxv4i1) &&
"Unexpected mask or element size");
break;
case 8:
- IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_d
- : Intrinsic::aarch64_sve_whilerw_d;
assert((SimpleVT == MVT::v2i32 || SimpleVT == MVT::nxv2i1) &&
"Unexpected mask or element size");
break;
@@ -5329,11 +5322,9 @@ AArch64TargetLowering::LowerNONALIAS_LANE_MASK(SDValue Op,
llvm_unreachable("Unexpected element size for get.alias.lane.mask");
break;
}
- SDValue ID = DAG.getTargetConstant(IntrinsicID, DL, MVT::i64);
if (VT.isScalableVector())
- return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT, ID, Op.getOperand(0),
- Op.getOperand(1));
+ return DAG.getNode(Opcode, DL, VT, Op.getOperand(0), Op.getOperand(1));
// We can use the SVE whilewr/whilerw instruction to lower this
// intrinsic by creating the appropriate sequence of scalable vector
@@ -5343,8 +5334,8 @@ AArch64TargetLowering::LowerNONALIAS_LANE_MASK(SDValue Op,
EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
EVT WhileVT = ContainerVT.changeElementType(MVT::i1);
- SDValue Mask = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, WhileVT, ID,
- Op.getOperand(0), Op.getOperand(1));
+ SDValue Mask =
+ DAG.getNode(Opcode, DL, WhileVT, Op.getOperand(0), Op.getOperand(1));
SDValue MaskAsInt = DAG.getNode(ISD::SIGN_EXTEND, DL, ContainerVT, Mask);
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, MaskAsInt,
DAG.getVectorIdxConstant(0, DL));
>From 246e075b3ef5ee8904b462226adaf0ed81d72e14 Mon Sep 17 00:00:00 2001
From: Sam Tebbs <samuel.tebbs at arm.com>
Date: Fri, 31 Jan 2025 14:24:11 +0000
Subject: [PATCH 18/20] Rename to noalias
---
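Note for reviewers: the LangRef text renamed in this patch describes the write-after-read case as `%m[i] = (icmp ult i, %diff) || (%diff <= 0)`. A scalar model of that formula may help when checking the tests by hand. This is only a sketch under stated assumptions: `noaliasMaskWAR` is a hypothetical helper, pointers are modelled as plain 64-bit integers, and `%diff` is assumed to be the signed element distance `(%ptrB - %ptrA) / %elementSize`, since the line defining it is not quoted in these hunks. The poison case (address wrap) is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of the write-after-read mask (%writeAfterRead == true).
// Assumption: diff is the signed distance from ptrA to ptrB in elements.
std::vector<bool> noaliasMaskWAR(std::uint64_t ptrA, std::uint64_t ptrB,
                                 std::uint64_t elementSize, std::size_t vf) {
  assert(elementSize != 0 && "element size must be non-zero");
  std::int64_t diff = static_cast<std::int64_t>(ptrB - ptrA) /
                      static_cast<std::int64_t>(elementSize);
  std::vector<bool> mask(vf);
  for (std::size_t i = 0; i < vf; ++i) {
    // Lane i is enabled if it lies before the store pointer, or if the
    // store pointer is not ahead of the load pointer at all.
    mask[i] = diff <= 0 || i < static_cast<std::size_t>(diff);
  }
  return mask;
}
```

With `%elementSize` 4 and VF 4, a store pointer 8 bytes ahead of the load pointer gives `diff == 2`, so only the first two lanes stay enabled. Equal pointers, or a store pointer behind the load pointer, enable every lane.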
llvm/docs/LangRef.rst | 28 +++++++++----------
llvm/include/llvm/CodeGen/ISDOpcodes.h | 2 +-
llvm/include/llvm/IR/Intrinsics.td | 2 +-
.../SelectionDAG/LegalizeIntegerTypes.cpp | 16 +++++------
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 4 +--
.../SelectionDAG/LegalizeVectorOps.cpp | 10 +++----
.../SelectionDAG/SelectionDAGBuilder.cpp | 6 ++--
.../SelectionDAG/SelectionDAGDumper.cpp | 2 +-
llvm/lib/CodeGen/TargetLoweringBase.cpp | 2 +-
.../Target/AArch64/AArch64ISelLowering.cpp | 17 ++++++-----
llvm/lib/Target/AArch64/AArch64ISelLowering.h | 2 +-
llvm/test/CodeGen/AArch64/alias_mask.ll | 16 +++++------
.../CodeGen/AArch64/alias_mask_scalable.ll | 16 +++++------
13 files changed, 61 insertions(+), 62 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 7d09cb20a4cd6..91c8bc4ea588c 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23733,10 +23733,10 @@ Examples:
%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
-.. _int_experimental_get_nonalias_lane_mask:
+.. _int_experimental_get_noalias_lane_mask:
-'``llvm.experimental.get.nonalias.lane.mask.*``' Intrinsics
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.experimental.get.noalias.lane.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
"""""""
@@ -23744,10 +23744,10 @@ This is an overloaded intrinsic.
::
- declare <4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <vscale x 16 x i1> @llvm.experimental.get.nonalias.lane.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <vscale x 16 x i1> @llvm.experimental.get.noalias.lane.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
Overview:
@@ -23770,7 +23770,7 @@ Semantics:
The intrinsic will return poison if ``%ptrA`` and ``%ptrB`` are within
VF * ``%elementSize`` of each other and ``%ptrA`` + VF * ``%elementSize`` wraps.
In other cases when ``%writeAfterRead`` is true, the
-'``llvm.experimental.get.nonalias.lane.mask.*``' intrinsics are semantically
+'``llvm.experimental.get.noalias.lane.mask.*``' intrinsics are semantically
equivalent to:
::
@@ -23779,7 +23779,7 @@ equivalent to:
%m[i] = (icmp ult i, %diff) || (%diff <= 0)
When the return value is not poison and ``%writeAfterRead`` is false, the
-'``llvm.experimental.get.nonalias.lane.mask.*``' intrinsics are semantically
+'``llvm.experimental.get.noalias.lane.mask.*``' intrinsics are semantically
equivalent to:
::
@@ -23789,7 +23789,7 @@ equivalent to:
where ``%m`` is a vector (mask) of active/inactive lanes with its elements
indexed by ``i`` (i = 0 to VF - 1), and ``%ptrA``, ``%ptrB`` are the two ptr
-arguments to ``llvm.experimental.get.nonalias.lane.mask.*`` and ``%elementSize``
+arguments to ``llvm.experimental.get.noalias.lane.mask.*`` and ``%elementSize``
is the first immediate argument. The ``%writeAfterRead`` argument is expected
to be true if ``%ptrB`` is stored to after ``%ptrA`` is read from, otherwise
it is false for a read after write.
@@ -23797,7 +23797,7 @@ The above is equivalent to:
::
- %m = @llvm.experimental.get.nonalias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead)
+ %m = @llvm.experimental.get.noalias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead)
This can, for example, be emitted by the loop vectorizer in which case
``%ptrA`` is a pointer that is read from within the loop, and ``%ptrB`` is a
@@ -23815,10 +23815,10 @@ Examples:
.. code-block:: llvm
- %nonalias.lane.mask = call <4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4, i1 1)
- %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %nonalias.lane.mask, <4 x i32> poison)
+ %noalias.lane.mask = call <4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4, i1 1)
+ %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %noalias.lane.mask, <4 x i32> poison)
[...]
- call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %nonalias.lane.mask)
+ call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %noalias.lane.mask)
.. _int_experimental_vp_splice:
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index 7ed3c747c3ec9..e7527fef1476a 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1512,7 +1512,7 @@ enum NodeType {
// The `llvm.experimental.get.alias.lane.mask.*` intrinsics
// Operands: Load pointer, Store pointer, Element size, Write after read
// Output: Mask
- EXPERIMENTAL_NONALIAS_LANE_MASK,
+ EXPERIMENTAL_NOALIAS_LANE_MASK,
// llvm.clear_cache intrinsic
// Operands: Input Chain, Start Addres, End Address
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 8f5d9bb1be8b6..4d5ed3118d824 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2379,7 +2379,7 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<1>>
llvm_i32_ty]>;
}
-def int_experimental_get_nonalias_lane_mask:
+def int_experimental_get_noalias_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty, llvm_i1_ty],
[IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 602971bcdb9ee..5182efc2be137 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -317,8 +317,8 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
Res = PromoteIntRes_VP_REDUCE(N);
break;
- case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
- Res = PromoteIntRes_EXPERIMENTAL_NONALIAS_LANE_MASK(N);
+ case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
+ Res = PromoteIntRes_EXPERIMENTAL_NOALIAS_LANE_MASK(N);
break;
case ISD::FREEZE:
@@ -369,10 +369,10 @@ SDValue DAGTypeLegalizer::PromoteIntRes_MERGE_VALUES(SDNode *N,
}
SDValue
-DAGTypeLegalizer::PromoteIntRes_EXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N) {
+DAGTypeLegalizer::PromoteIntRes_EXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N) {
EVT VT = N->getValueType(0);
EVT NewVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
- return DAG.getNode(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, SDLoc(N), NewVT,
+ return DAG.getNode(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, SDLoc(N), NewVT,
N->ops());
}
@@ -2120,8 +2120,8 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::PARTIAL_REDUCE_SMLA:
Res = PromoteIntOp_PARTIAL_REDUCE_MLA(N);
break;
- case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
- Res = PromoteIntOp_EXPERIMENTAL_NONALIAS_LANE_MASK(N, OpNo);
+ case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
+ Res = PromoteIntOp_EXPERIMENTAL_NOALIAS_LANE_MASK(N, OpNo);
break;
}
@@ -2918,8 +2918,8 @@ SDValue DAGTypeLegalizer::PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N) {
}
SDValue
-DAGTypeLegalizer::PromoteIntOp_EXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N,
- unsigned OpNo) {
+DAGTypeLegalizer::PromoteIntOp_EXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N,
+ unsigned OpNo) {
SmallVector<SDValue, 4> NewOps(N->ops());
NewOps[OpNo] = GetPromotedInteger(N->getOperand(OpNo));
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 1069b2d279c44..1bca5589247f6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -380,7 +380,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntRes_PATCHPOINT(SDNode *N);
SDValue PromoteIntRes_VECTOR_FIND_LAST_ACTIVE(SDNode *N);
SDValue PromoteIntRes_PARTIAL_REDUCE_MLA(SDNode *N);
- SDValue PromoteIntRes_EXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N);
+ SDValue PromoteIntRes_EXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N);
// Integer Operand Promotion.
bool PromoteIntegerOperand(SDNode *N, unsigned OpNo);
@@ -433,8 +433,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntOp_VECTOR_HISTOGRAM(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_VECTOR_FIND_LAST_ACTIVE(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N);
- SDValue PromoteIntOp_EXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N,
unsigned OpNo);
+ SDValue PromoteIntOp_EXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N, unsigned OpNo);
void SExtOrZExtPromotedOperands(SDValue &LHS, SDValue &RHS);
void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index 1384912a08e13..b2f320502f684 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -138,7 +138,7 @@ class VectorLegalizer {
SDValue ExpandVP_FNEG(SDNode *Node);
SDValue ExpandVP_FABS(SDNode *Node);
SDValue ExpandVP_FCOPYSIGN(SDNode *Node);
- SDValue ExpandEXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N);
+ SDValue ExpandEXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N);
SDValue ExpandSELECT(SDNode *Node);
std::pair<SDValue, SDValue> ExpandLoad(SDNode *N);
SDValue ExpandStore(SDNode *N);
@@ -472,7 +472,7 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::UCMP:
case ISD::PARTIAL_REDUCE_UMLA:
case ISD::PARTIAL_REDUCE_SMLA:
- case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
+ case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
break;
case ISD::SMULFIX:
@@ -1254,8 +1254,8 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
case ISD::UCMP:
Results.push_back(TLI.expandCMP(Node, DAG));
return;
- case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
- Results.push_back(ExpandEXPERIMENTAL_NONALIAS_LANE_MASK(Node));
+ case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
+ Results.push_back(ExpandEXPERIMENTAL_NOALIAS_LANE_MASK(Node));
return;
case ISD::FADD:
@@ -1764,7 +1764,7 @@ SDValue VectorLegalizer::ExpandVP_FCOPYSIGN(SDNode *Node) {
return DAG.getNode(ISD::BITCAST, DL, VT, CopiedSign);
}
-SDValue VectorLegalizer::ExpandEXPERIMENTAL_NONALIAS_LANE_MASK(SDNode *N) {
+SDValue VectorLegalizer::ExpandEXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N) {
SDLoc DL(N);
SDValue SourceValue = N->getOperand(0);
SDValue SinkValue = N->getOperand(1);
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index bac4ef4f9fd85..7127c25932884 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8291,13 +8291,13 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
visitVectorExtractLastActive(I, Intrinsic);
return;
}
- case Intrinsic::experimental_get_nonalias_lane_mask: {
+ case Intrinsic::experimental_get_noalias_lane_mask: {
auto IntrinsicVT = EVT::getEVT(I.getType());
SmallVector<SDValue, 4> Ops;
for (auto &Op : I.operands())
Ops.push_back(getValue(Op));
- SDValue Mask = DAG.getNode(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, sdl,
- IntrinsicVT, Ops);
+ SDValue Mask =
+ DAG.getNode(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, sdl, IntrinsicVT, Ops);
setValue(&I, Mask);
}
}
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 0ae3039535ef4..a4e3d03cf9ef2 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -573,7 +573,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
return "partial_reduce_umla";
case ISD::PARTIAL_REDUCE_SMLA:
return "partial_reduce_smla";
- case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
+ case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
return "alias_lane_mask";
// Vector Predication
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index f9154d61871b7..d065b4e39d8f8 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -832,7 +832,7 @@ void TargetLoweringBase::initActions() {
setOperationAction(ISD::VECTOR_FIND_LAST_ACTIVE, VT, Expand);
// Non-aliasing lanes mask default to expand
- setOperationAction(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, VT, Expand);
+ setOperationAction(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, VT, Expand);
// FP environment operations default to expand.
setOperationAction(ISD::GET_FPENV, VT, Expand);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 709999e8206a5..250808e757e83 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1826,7 +1826,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
(Subtarget->hasSME() && Subtarget->isStreaming())) {
for (auto VT : {MVT::v2i32, MVT::v4i16, MVT::v8i8, MVT::v16i8, MVT::nxv2i1,
MVT::nxv4i1, MVT::nxv8i1, MVT::nxv16i1}) {
- setOperationAction(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, VT, Custom);
+ setOperationAction(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, VT, Custom);
}
}
@@ -5290,9 +5290,8 @@ SDValue AArch64TargetLowering::LowerFSINCOS(SDValue Op,
static MVT getSVEContainerType(EVT ContentTy);
-SDValue
-AArch64TargetLowering::LowerNONALIAS_LANE_MASK(SDValue Op,
- SelectionDAG &DAG) const {
+SDValue AArch64TargetLowering::LowerNOALIAS_LANE_MASK(SDValue Op,
+ SelectionDAG &DAG) const {
SDLoc DL(Op);
uint64_t EltSize = Op.getConstantOperandVal(2);
bool IsWriteAfterRead = Op.getConstantOperandVal(3) == 1;
@@ -7437,8 +7436,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
default:
llvm_unreachable("unimplemented operand");
return SDValue();
- case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK:
- return LowerNONALIAS_LANE_MASK(Op, DAG);
+ case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
+ return LowerNOALIAS_LANE_MASK(Op, DAG);
case ISD::BITCAST:
return LowerBITCAST(Op, DAG);
case ISD::GlobalAddress:
@@ -27629,7 +27628,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
// CONCAT_VECTORS -- but delegate to common code for result type
// legalisation
return;
- case ISD::EXPERIMENTAL_NONALIAS_LANE_MASK: {
+ case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK: {
EVT VT = N->getValueType(0);
if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
return;
@@ -27641,7 +27640,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
SDLoc DL(N);
auto V =
- DAG.getNode(ISD::EXPERIMENTAL_NONALIAS_LANE_MASK, DL, NewVT, N->ops());
+ DAG.getNode(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, DL, NewVT, N->ops());
Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, VT, V));
return;
}
@@ -27701,7 +27700,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
return;
}
case Intrinsic::experimental_vector_match:
- case Intrinsic::experimental_get_nonalias_lane_mask:
+ case Intrinsic::experimental_get_noalias_lane_mask:
case Intrinsic::get_active_lane_mask: {
if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
return;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 299d4b8bf81b6..bbc44343cbe2e 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -1221,7 +1221,7 @@ class AArch64TargetLowering : public TargetLowering {
SDValue LowerXOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFSINCOS(SDValue Op, SelectionDAG &DAG) const;
- SDValue LowerNONALIAS_LANE_MASK(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerNOALIAS_LANE_MASK(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVSCALE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/AArch64/alias_mask.ll b/llvm/test/CodeGen/AArch64/alias_mask.ll
index 5ef6b588fe767..21eff3b11c001 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask.ll
@@ -53,7 +53,7 @@ define <16 x i1> @whilewr_8(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
+ %0 = call <16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
ret <16 x i1> %0
}
@@ -95,7 +95,7 @@ define <8 x i1> @whilewr_16(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
+ %0 = call <8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
ret <8 x i1> %0
}
@@ -129,7 +129,7 @@ define <4 x i1> @whilewr_32(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
+ %0 = call <4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
ret <4 x i1> %0
}
@@ -159,7 +159,7 @@ define <2 x i1> @whilewr_64(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.nonalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
+ %0 = call <2 x i1> @llvm.experimental.get.noalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
ret <2 x i1> %0
}
@@ -215,7 +215,7 @@ define <16 x i1> @whilerw_8(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
+ %0 = call <16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
ret <16 x i1> %0
}
@@ -258,7 +258,7 @@ define <8 x i1> @whilerw_16(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
+ %0 = call <8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
ret <8 x i1> %0
}
@@ -293,7 +293,7 @@ define <4 x i1> @whilerw_32(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
+ %0 = call <4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
ret <4 x i1> %0
}
@@ -324,6 +324,6 @@ define <2 x i1> @whilerw_64(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.nonalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
+ %0 = call <2 x i1> @llvm.experimental.get.noalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
ret <2 x i1> %0
}
diff --git a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
index 6884f14d685b5..b29619c7f397d 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
@@ -60,7 +60,7 @@ define <vscale x 16 x i1> @whilewr_8(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
ret <vscale x 16 x i1> %0
}
@@ -98,7 +98,7 @@ define <vscale x 8 x i1> @whilewr_16(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
+ %0 = call <vscale x 8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
ret <vscale x 8 x i1> %0
}
@@ -130,7 +130,7 @@ define <vscale x 4 x i1> @whilewr_32(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
ret <vscale x 4 x i1> %0
}
@@ -158,7 +158,7 @@ define <vscale x 2 x i1> @whilewr_64(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.nonalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.noalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
ret <vscale x 2 x i1> %0
}
@@ -223,7 +223,7 @@ define <vscale x 16 x i1> @whilerw_8(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.nonalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
ret <vscale x 16 x i1> %0
}
@@ -262,7 +262,7 @@ define <vscale x 8 x i1> @whilerw_16(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 8 x i1> @llvm.experimental.get.nonalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
+ %0 = call <vscale x 8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
ret <vscale x 8 x i1> %0
}
@@ -295,7 +295,7 @@ define <vscale x 4 x i1> @whilerw_32(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.nonalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
ret <vscale x 4 x i1> %0
}
@@ -324,6 +324,6 @@ define <vscale x 2 x i1> @whilerw_64(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.nonalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.get.noalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
ret <vscale x 2 x i1> %0
}
>From 72d646f8ecf586fccc669f2d0274089c7e388743 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Wed, 26 Feb 2025 23:39:53 +0000
Subject: [PATCH 19/20] Rename to loop.dependence.raw/war.mask
---
llvm/include/llvm/CodeGen/ISDOpcodes.h | 7 ++---
llvm/include/llvm/IR/Intrinsics.td | 11 +++++---
.../SelectionDAG/LegalizeIntegerTypes.cpp | 20 +++++++-------
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h | 6 ++---
.../SelectionDAG/LegalizeVectorOps.cpp | 15 ++++++-----
.../SelectionDAG/SelectionDAGBuilder.cpp | 9 ++++---
.../SelectionDAG/SelectionDAGDumper.cpp | 6 +++--
llvm/lib/CodeGen/TargetLoweringBase.cpp | 5 ++--
.../Target/AArch64/AArch64ISelLowering.cpp | 27 ++++++++++++-------
llvm/lib/Target/AArch64/AArch64ISelLowering.h | 2 +-
llvm/test/CodeGen/AArch64/alias_mask.ll | 16 +++++------
.../CodeGen/AArch64/alias_mask_scalable.ll | 16 +++++------
12 files changed, 81 insertions(+), 59 deletions(-)
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index e7527fef1476a..63d9044e45ca3 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1509,10 +1509,11 @@ enum NodeType {
// Operands: Mask
VECTOR_FIND_LAST_ACTIVE,
- // The `llvm.experimental.get.alias.lane.mask.*` intrinsics
- // Operands: Load pointer, Store pointer, Element size, Write after read
+ // The `llvm.experimental.loop.dependence.{war, raw}.mask` intrinsics
+ // Operands: Load pointer, Store pointer, Element size
// Output: Mask
- EXPERIMENTAL_NOALIAS_LANE_MASK,
+ EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK,
+ EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK,
// llvm.clear_cache intrinsic
// Operands: Input Chain, Start Addres, End Address
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 4d5ed3118d824..f5122fc884b80 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2379,10 +2379,15 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<1>>
llvm_i32_ty]>;
}
-def int_experimental_get_noalias_lane_mask:
+def int_experimental_loop_dependence_raw_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
- [llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty, llvm_i1_ty],
- [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
+ [llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty],
+ [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>]>;
+
+def int_experimental_loop_dependence_war_mask:
+ DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+ [llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty],
+ [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>]>;
def int_get_active_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 5182efc2be137..3176ed3ef4a88 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -317,8 +317,9 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
Res = PromoteIntRes_VP_REDUCE(N);
break;
- case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
- Res = PromoteIntRes_EXPERIMENTAL_NOALIAS_LANE_MASK(N);
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK:
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK:
+ Res = PromoteIntRes_EXPERIMENTAL_LOOP_DEPENDENCE_MASK(N);
break;
case ISD::FREEZE:
@@ -369,11 +370,10 @@ SDValue DAGTypeLegalizer::PromoteIntRes_MERGE_VALUES(SDNode *N,
}
SDValue
-DAGTypeLegalizer::PromoteIntRes_EXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N) {
+DAGTypeLegalizer::PromoteIntRes_EXPERIMENTAL_LOOP_DEPENDENCE_MASK(SDNode *N) {
EVT VT = N->getValueType(0);
EVT NewVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
- return DAG.getNode(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, SDLoc(N), NewVT,
- N->ops());
+ return DAG.getNode(N->getOpcode(), SDLoc(N), NewVT, N->ops());
}
SDValue DAGTypeLegalizer::PromoteIntRes_AssertSext(SDNode *N) {
@@ -2120,8 +2120,9 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::PARTIAL_REDUCE_SMLA:
Res = PromoteIntOp_PARTIAL_REDUCE_MLA(N);
break;
- case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
- Res = PromoteIntOp_EXPERIMENTAL_NOALIAS_LANE_MASK(N, OpNo);
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK:
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK:
+ Res = PromoteIntOp_EXPERIMENTAL_LOOP_DEPENDENCE_MASK(N, OpNo);
break;
}
@@ -2917,9 +2918,8 @@ SDValue DAGTypeLegalizer::PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N) {
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
}
-SDValue
-DAGTypeLegalizer::PromoteIntOp_EXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N,
- unsigned OpNo) {
+SDValue DAGTypeLegalizer::PromoteIntOp_EXPERIMENTAL_LOOP_DEPENDENCE_MASK(
+ SDNode *N, unsigned OpNo) {
SmallVector<SDValue, 4> NewOps(N->ops());
NewOps[OpNo] = GetPromotedInteger(N->getOperand(OpNo));
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 1bca5589247f6..d680f9cd49109 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -380,7 +380,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntRes_PATCHPOINT(SDNode *N);
SDValue PromoteIntRes_VECTOR_FIND_LAST_ACTIVE(SDNode *N);
SDValue PromoteIntRes_PARTIAL_REDUCE_MLA(SDNode *N);
- SDValue PromoteIntRes_EXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N);
+ SDValue PromoteIntRes_EXPERIMENTAL_LOOP_DEPENDENCE_MASK(SDNode *N);
// Integer Operand Promotion.
bool PromoteIntegerOperand(SDNode *N, unsigned OpNo);
@@ -433,8 +433,8 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntOp_VECTOR_HISTOGRAM(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_VECTOR_FIND_LAST_ACTIVE(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N);
- unsigned OpNo);
- SDValue PromoteIntOp_EXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N, unsigned OpNo);
+ SDValue PromoteIntOp_EXPERIMENTAL_LOOP_DEPENDENCE_MASK(SDNode *N,
+ unsigned OpNo);
void SExtOrZExtPromotedOperands(SDValue &LHS, SDValue &RHS);
void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index b2f320502f684..6438dd07e142d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -138,7 +138,7 @@ class VectorLegalizer {
SDValue ExpandVP_FNEG(SDNode *Node);
SDValue ExpandVP_FABS(SDNode *Node);
SDValue ExpandVP_FCOPYSIGN(SDNode *Node);
- SDValue ExpandEXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N);
+ SDValue ExpandLOOP_DEPENDENCE_MASK(SDNode *N);
SDValue ExpandSELECT(SDNode *Node);
std::pair<SDValue, SDValue> ExpandLoad(SDNode *N);
SDValue ExpandStore(SDNode *N);
@@ -472,7 +472,8 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::UCMP:
case ISD::PARTIAL_REDUCE_UMLA:
case ISD::PARTIAL_REDUCE_SMLA:
- case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK:
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK:
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
break;
case ISD::SMULFIX:
@@ -1254,8 +1255,9 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
case ISD::UCMP:
Results.push_back(TLI.expandCMP(Node, DAG));
return;
- case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
- Results.push_back(ExpandEXPERIMENTAL_NOALIAS_LANE_MASK(Node));
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK:
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK:
+ Results.push_back(ExpandLOOP_DEPENDENCE_MASK(Node));
return;
case ISD::FADD:
@@ -1764,13 +1766,14 @@ SDValue VectorLegalizer::ExpandVP_FCOPYSIGN(SDNode *Node) {
return DAG.getNode(ISD::BITCAST, DL, VT, CopiedSign);
}
-SDValue VectorLegalizer::ExpandEXPERIMENTAL_NOALIAS_LANE_MASK(SDNode *N) {
+SDValue VectorLegalizer::ExpandLOOP_DEPENDENCE_MASK(SDNode *N) {
SDLoc DL(N);
SDValue SourceValue = N->getOperand(0);
SDValue SinkValue = N->getOperand(1);
SDValue EltSize = N->getOperand(2);
- bool IsWriteAfterRead = N->getConstantOperandVal(3) != 0;
+ bool IsWriteAfterRead =
+ N->getOpcode() == ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK;
auto VT = N->getValueType(0);
auto PtrVT = SourceValue->getValueType(0);
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 7127c25932884..577abcf57e940 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8291,13 +8291,16 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
visitVectorExtractLastActive(I, Intrinsic);
return;
}
- case Intrinsic::experimental_get_noalias_lane_mask: {
+ case Intrinsic::experimental_loop_dependence_war_mask:
+ case Intrinsic::experimental_loop_dependence_raw_mask: {
auto IntrinsicVT = EVT::getEVT(I.getType());
SmallVector<SDValue, 4> Ops;
for (auto &Op : I.operands())
Ops.push_back(getValue(Op));
- SDValue Mask =
- DAG.getNode(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, sdl, IntrinsicVT, Ops);
+ unsigned ID = Intrinsic == Intrinsic::experimental_loop_dependence_war_mask
+ ? ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK
+ : ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK;
+ SDValue Mask = DAG.getNode(ID, sdl, IntrinsicVT, Ops);
setValue(&I, Mask);
}
}
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index a4e3d03cf9ef2..001d7179a2638 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -573,8 +573,10 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
return "partial_reduce_umla";
case ISD::PARTIAL_REDUCE_SMLA:
return "partial_reduce_smla";
- case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
- return "alias_lane_mask";
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK:
+ return "loop_dep_war";
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK:
+ return "loop_dep_raw";
// Vector Predication
#define BEGIN_REGISTER_VP_SDNODE(SDID, LEGALARG, NAME, ...) \
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index d065b4e39d8f8..1b66f2b8a5b28 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -831,8 +831,9 @@ void TargetLoweringBase::initActions() {
// Masked vector extracts default to expand.
setOperationAction(ISD::VECTOR_FIND_LAST_ACTIVE, VT, Expand);
- // Non-aliasing lanes mask default to expand
- setOperationAction(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, VT, Expand);
+ // Lane masks with non-aliasing lanes enabled default to expand.
+ setOperationAction(ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK, VT, Expand);
+ setOperationAction(ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK, VT, Expand);
// FP environment operations default to expand.
setOperationAction(ISD::GET_FPENV, VT, Expand);
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 250808e757e83..1b05738d9920a 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1826,7 +1826,10 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
(Subtarget->hasSME() && Subtarget->isStreaming())) {
for (auto VT : {MVT::v2i32, MVT::v4i16, MVT::v8i8, MVT::v16i8, MVT::nxv2i1,
MVT::nxv4i1, MVT::nxv8i1, MVT::nxv16i1}) {
- setOperationAction(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, VT, Custom);
+ setOperationAction(ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK, VT,
+ Custom);
+ setOperationAction(ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK, VT,
+ Custom);
}
}
@@ -5290,11 +5293,13 @@ SDValue AArch64TargetLowering::LowerFSINCOS(SDValue Op,
static MVT getSVEContainerType(EVT ContentTy);
-SDValue AArch64TargetLowering::LowerNOALIAS_LANE_MASK(SDValue Op,
- SelectionDAG &DAG) const {
+SDValue
+AArch64TargetLowering::LowerLOOP_DEPENDENCE_MASK(SDValue Op,
+ SelectionDAG &DAG) const {
SDLoc DL(Op);
uint64_t EltSize = Op.getConstantOperandVal(2);
- bool IsWriteAfterRead = Op.getConstantOperandVal(3) == 1;
+ bool IsWriteAfterRead =
+ Op.getOpcode() == ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK;
unsigned Opcode =
IsWriteAfterRead ? AArch64ISD::WHILEWR : AArch64ISD::WHILERW;
EVT VT = Op.getValueType();
@@ -7436,8 +7441,9 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
default:
llvm_unreachable("unimplemented operand");
return SDValue();
- case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK:
- return LowerNOALIAS_LANE_MASK(Op, DAG);
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK:
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK:
+ return LowerLOOP_DEPENDENCE_MASK(Op, DAG);
case ISD::BITCAST:
return LowerBITCAST(Op, DAG);
case ISD::GlobalAddress:
@@ -27628,7 +27634,8 @@ void AArch64TargetLowering::ReplaceNodeResults(
// CONCAT_VECTORS -- but delegate to common code for result type
// legalisation
return;
- case ISD::EXPERIMENTAL_NOALIAS_LANE_MASK: {
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_WAR_MASK:
+ case ISD::EXPERIMENTAL_LOOP_DEPENDENCE_RAW_MASK: {
EVT VT = N->getValueType(0);
if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
return;
@@ -27639,8 +27646,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
return;
SDLoc DL(N);
- auto V =
- DAG.getNode(ISD::EXPERIMENTAL_NOALIAS_LANE_MASK, DL, NewVT, N->ops());
+ auto V = DAG.getNode(N->getOpcode(), DL, NewVT, N->ops());
Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, VT, V));
return;
}
@@ -27700,7 +27706,8 @@ void AArch64TargetLowering::ReplaceNodeResults(
return;
}
case Intrinsic::experimental_vector_match:
- case Intrinsic::experimental_get_noalias_lane_mask:
+ case Intrinsic::experimental_loop_dependence_raw_mask:
+ case Intrinsic::experimental_loop_dependence_war_mask:
case Intrinsic::get_active_lane_mask: {
if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
return;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index bbc44343cbe2e..0ba9298d26a60 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -1221,7 +1221,7 @@ class AArch64TargetLowering : public TargetLowering {
SDValue LowerXOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFSINCOS(SDValue Op, SelectionDAG &DAG) const;
- SDValue LowerNOALIAS_LANE_MASK(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerLOOP_DEPENDENCE_MASK(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVSCALE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/AArch64/alias_mask.ll b/llvm/test/CodeGen/AArch64/alias_mask.ll
index 21eff3b11c001..3248cb2de2644 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask.ll
@@ -53,7 +53,7 @@ define <16 x i1> @whilewr_8(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
+ %0 = call <16 x i1> @llvm.experimental.loop.dependence.war.mask.v16i1(ptr %a, ptr %b, i64 1)
ret <16 x i1> %0
}
@@ -95,7 +95,7 @@ define <8 x i1> @whilewr_16(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
+ %0 = call <8 x i1> @llvm.experimental.loop.dependence.war.mask.v8i1(ptr %a, ptr %b, i64 2)
ret <8 x i1> %0
}
@@ -129,7 +129,7 @@ define <4 x i1> @whilewr_32(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
+ %0 = call <4 x i1> @llvm.experimental.loop.dependence.war.mask.v4i1(ptr %a, ptr %b, i64 4)
ret <4 x i1> %0
}
@@ -159,7 +159,7 @@ define <2 x i1> @whilewr_64(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.noalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
+ %0 = call <2 x i1> @llvm.experimental.loop.dependence.war.mask.v2i1(ptr %a, ptr %b, i64 8)
ret <2 x i1> %0
}
@@ -215,7 +215,7 @@ define <16 x i1> @whilerw_8(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.16b, v0.16b, v1.16b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
+ %0 = call <16 x i1> @llvm.experimental.loop.dependence.raw.mask.v16i1(ptr %a, ptr %b, i64 1)
ret <16 x i1> %0
}
@@ -258,7 +258,7 @@ define <8 x i1> @whilerw_16(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
+ %0 = call <8 x i1> @llvm.experimental.loop.dependence.raw.mask.v8i1(ptr %a, ptr %b, i64 2)
ret <8 x i1> %0
}
@@ -293,7 +293,7 @@ define <4 x i1> @whilerw_32(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
+ %0 = call <4 x i1> @llvm.experimental.loop.dependence.raw.mask.v4i1(ptr %a, ptr %b, i64 4)
ret <4 x i1> %0
}
@@ -324,6 +324,6 @@ define <2 x i1> @whilerw_64(ptr %a, ptr %b) {
; CHECK-NOSVE-NEXT: orr v0.8b, v0.8b, v1.8b
; CHECK-NOSVE-NEXT: ret
entry:
- %0 = call <2 x i1> @llvm.experimental.get.noalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
+ %0 = call <2 x i1> @llvm.experimental.loop.dependence.raw.mask.v2i1(ptr %a, ptr %b, i64 8)
ret <2 x i1> %0
}
diff --git a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
index b29619c7f397d..5a7c3180e2807 100644
--- a/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
+++ b/llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
@@ -60,7 +60,7 @@ define <vscale x 16 x i1> @whilewr_8(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 1)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.loop.dependence.war.mask.v16i1(ptr %a, ptr %b, i64 1)
ret <vscale x 16 x i1> %0
}
@@ -98,7 +98,7 @@ define <vscale x 8 x i1> @whilewr_16(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 1)
+ %0 = call <vscale x 8 x i1> @llvm.experimental.loop.dependence.war.mask.v8i1(ptr %a, ptr %b, i64 2)
ret <vscale x 8 x i1> %0
}
@@ -130,7 +130,7 @@ define <vscale x 4 x i1> @whilewr_32(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 1)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.loop.dependence.war.mask.v4i1(ptr %a, ptr %b, i64 4)
ret <vscale x 4 x i1> %0
}
@@ -158,7 +158,7 @@ define <vscale x 2 x i1> @whilewr_64(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.noalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 1)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.loop.dependence.war.mask.v2i1(ptr %a, ptr %b, i64 8)
ret <vscale x 2 x i1> %0
}
@@ -223,7 +223,7 @@ define <vscale x 16 x i1> @whilerw_8(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %a, ptr %b, i64 1, i1 0)
+ %0 = call <vscale x 16 x i1> @llvm.experimental.loop.dependence.raw.mask.v16i1(ptr %a, ptr %b, i64 1)
ret <vscale x 16 x i1> %0
}
@@ -262,7 +262,7 @@ define <vscale x 8 x i1> @whilerw_16(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %a, ptr %b, i64 2, i1 0)
+ %0 = call <vscale x 8 x i1> @llvm.experimental.loop.dependence.raw.mask.v8i1(ptr %a, ptr %b, i64 2)
ret <vscale x 8 x i1> %0
}
@@ -295,7 +295,7 @@ define <vscale x 4 x i1> @whilerw_32(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %a, ptr %b, i64 4, i1 0)
+ %0 = call <vscale x 4 x i1> @llvm.experimental.loop.dependence.raw.mask.v4i1(ptr %a, ptr %b, i64 4)
ret <vscale x 4 x i1> %0
}
@@ -324,6 +324,6 @@ define <vscale x 2 x i1> @whilerw_64(ptr %a, ptr %b) {
; CHECK-SVE-NEXT: sel p0.b, p0, p0.b, p1.b
; CHECK-SVE-NEXT: ret
entry:
- %0 = call <vscale x 2 x i1> @llvm.experimental.get.noalias.lane.mask.v2i1(ptr %a, ptr %b, i64 8, i1 0)
+ %0 = call <vscale x 2 x i1> @llvm.experimental.loop.dependence.raw.mask.v2i1(ptr %a, ptr %b, i64 8)
ret <vscale x 2 x i1> %0
}
>From c508f034a3b4c4c80d652a77aaeec235a442be52 Mon Sep 17 00:00:00 2001
From: Samuel Tebbs <samuel.tebbs at arm.com>
Date: Mon, 10 Mar 2025 13:29:48 +0000
Subject: [PATCH 20/20] Rename in langref
---
llvm/docs/LangRef.rst | 47 +++++++++++++++++++------------------------
1 file changed, 21 insertions(+), 26 deletions(-)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 91c8bc4ea588c..9a5038b622041 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23733,10 +23733,12 @@ Examples:
%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
-.. _int_experimental_get_noalias_lane_mask:
-'``llvm.experimental.get.noalias.lane.mask.*``' Intrinsics
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. _int_experimental_loop_dependence_war_mask:
+.. _int_experimental_loop_dependence_raw_mask:
+
+'``llvm.experimental.loop.dependence.raw.mask.*``' and '``llvm.experimental.loop.dependence.war.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
"""""""
@@ -23744,10 +23746,10 @@ This is an overloaded intrinsic.
::
- declare <4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <8 x i1> @llvm.experimental.get.noalias.lane.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <16 x i1> @llvm.experimental.get.noalias.lane.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
- declare <vscale x 16 x i1> @llvm.experimental.get.noalias.lane.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <4 x i1> @llvm.experimental.loop.dependence.raw.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+ declare <8 x i1> @llvm.experimental.loop.dependence.war.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+ declare <16 x i1> @llvm.experimental.loop.dependence.raw.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
+ declare <vscale x 16 x i1> @llvm.experimental.loop.dependence.war.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
Overview:
@@ -23760,8 +23762,8 @@ across one vector loop iteration.
Arguments:
""""""""""
-The first two arguments have the same scalar integer type.
-The final two are immediates and the result is a vector with the i1 element type.
+The first two arguments have the same pointer type.
+The final one is an immediate and the result is a vector with the i1 element type.
Semantics:
""""""""""
@@ -23769,8 +23771,7 @@ Semantics:
``%elementSize`` is the size of the accessed elements in bytes.
The intrinsic will return poison if ``%ptrA`` and ``%ptrB`` are within
VF * ``%elementSize`` of each other and ``%ptrA`` + VF * ``%elementSize`` wraps.
-In other cases when ``%writeAfterRead`` is true, the
-'``llvm.experimental.get.noalias.lane.mask.*``' intrinsics are semantically
+The '``llvm.experimental.loop.dependence.war.mask.*``' intrinsics are semantically
equivalent to:
::
@@ -23778,9 +23779,8 @@ equivalent to:
%diff = (%ptrB - %ptrA) / %elementSize
%m[i] = (icmp ult i, %diff) || (%diff <= 0)
-When the return value is not poison and ``%writeAfterRead`` is false, the
-'``llvm.experimental.get.noalias.lane.mask.*``' intrinsics are semantically
-equivalent to:
+When the return value is not poison, the '``llvm.experimental.loop.dependence.raw.mask.*``'
+intrinsics are semantically equivalent to:
::
@@ -23789,15 +23789,10 @@ equivalent to:
where ``%m`` is a vector (mask) of active/inactive lanes with its elements
indexed by ``i`` (i = 0 to VF - 1), and ``%ptrA``, ``%ptrB`` are the two ptr
-arguments to ``llvm.experimental.get.noalias.lane.mask.*`` and ``%elementSize``
-is the first immediate argument. The ``%writeAfterRead`` argument is expected
-to be true if ``%ptrB`` is stored to after ``%ptrA`` is read from, otherwise
-it is false for a read after write.
-The above is equivalent to:
-
-::
-
- %m = @llvm.experimental.get.noalias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead)
+arguments to ``llvm.experimental.loop.dependence.{raw,war}.mask.*`` and ``%elementSize``
+is the immediate argument. The ``war`` variant is expected to be used when
+``%ptrB`` is stored to after ``%ptrA`` is read from, otherwise the ``raw`` variant is
+expected to be used.
This can, for example, be emitted by the loop vectorizer in which case
``%ptrA`` is a pointer that is read from within the loop, and ``%ptrB`` is a
@@ -23815,10 +23810,10 @@ Examples:
.. code-block:: llvm
- %noalias.lane.mask = call <4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4, i1 1)
- %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %noalias.lane.mask, <4 x i32> poison)
+ %loop.dependence.mask = call <4 x i1> @llvm.experimental.loop.dependence.war.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4)
+ %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %loop.dependence.mask, <4 x i32> poison)
[...]
+ call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %loop.dependence.mask)
+ call @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %loop.dependence.mask)
.. _int_experimental_vp_splice:
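
For readers following the LangRef wording in PATCH 20/20, the write-after-read pseudocode (``%diff = (%ptrB - %ptrA) / %elementSize``, ``%m[i] = (icmp ult i, %diff) || (%diff <= 0)``) can be modelled with a small scalar routine. This is only a sketch for illustration: the function name ``warMaskModel``, the use of plain integers for the pointers and the ``std::vector<bool>`` result are assumptions made here and are not part of the patch or of LLVM. The poison case described in the Semantics section is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar reference model of the WAR mask, following the LangRef pseudocode
// quoted in the patch. Hypothetical helper: the name, the integer pointer
// representation and the return type are illustrative, not LLVM API.
std::vector<bool> warMaskModel(std::uintptr_t ptrA, std::uintptr_t ptrB,
                               std::int64_t elementSize, unsigned vf) {
  assert(elementSize > 0 && "element size is a positive byte count");
  // %diff = (%ptrB - %ptrA) / %elementSize, treated as a signed quantity.
  std::int64_t diff =
      (static_cast<std::int64_t>(ptrB) - static_cast<std::int64_t>(ptrA)) /
      elementSize;
  std::vector<bool> mask(vf);
  for (unsigned i = 0; i < vf; ++i) {
    // %m[i] = (icmp ult i, %diff) || (%diff <= 0)
    // When diff > 0 the signed and unsigned comparisons agree.
    mask[i] = diff <= 0 || static_cast<std::int64_t>(i) < diff;
  }
  return mask;
}
```

Under this model, a store pointer 8 bytes ahead of the load pointer with 4-byte elements and VF = 4 enables only the first two lanes, while a store pointer at or behind the load pointer enables every lane.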