[llvm] [Intrinsics][AArch64] Add intrinsic to mask off aliasing vector lanes (PR #117007)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 20 08:54:53 PST 2024
llvmbot wrote:
@llvm/pr-subscribers-backend-aarch64
Author: Sam Tebbs (SamTebbs33)
Within a vectorised loop iteration, it can be unsafe to load a vector from one address and store a vector to another address if the two accesses overlap.
This PR adds an intrinsic designed to create a mask with a lane disabled for each lane that overlaps between the two pointer arguments, so that only the safe lanes are loaded, operated on and stored.
Along with the two pointer parameters, the intrinsic takes an immediate representing the size in bytes of the vector element type, as well as an immediate i1 that is true if there is a write-after-read hazard and false if there is a read-after-write hazard.
This will be used by #100579 and replaces the existing lowering for whilewr, which is no longer needed now that we have the intrinsic.
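
For illustration, here is a sketch of the intended use from a vectorised loop body (the value names are hypothetical): the mask guards a masked load and store of i32 elements under a write-after-read hazard.

```llvm
; hypothetical loop-body snippet: %ptrA is read, %ptrB is written
%addrA = ptrtoint ptr %ptrA to i64
%addrB = ptrtoint ptr %ptrB to i64
%alias.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %addrA, i64 %addrB, i32 4, i1 1)
%vecA = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %ptrA, i32 4, <4 x i1> %alias.mask, <4 x i32> poison)
call void @llvm.masked.store.v4i32.p0(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %alias.mask)
```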
---
Patch is 93.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117007.diff
11 Files Affected:
- (modified) llvm/docs/LangRef.rst (+80)
- (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+5)
- (modified) llvm/include/llvm/IR/Intrinsics.td (+5)
- (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+44)
- (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+59-123)
- (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+6)
- (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+7-2)
- (modified) llvm/lib/Target/AArch64/SVEInstrFormats.td (+5-5)
- (added) llvm/test/CodeGen/AArch64/alias_mask.ll (+421)
- (added) llvm/test/CodeGen/AArch64/alias_mask_scalable.ll (+82)
- (removed) llvm/test/CodeGen/AArch64/whilewr.ll (-1086)
``````````diff
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 9f4c90ba82a419..c9589d5af8ebbe 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23475,6 +23475,86 @@ Examples:
%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
+.. _int_experimental_get_alias_lane_mask:
+
+'``llvm.experimental.get.alias.lane.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+ declare <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <16 x i1> @llvm.experimental.get.alias.lane.mask.v16i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+ declare <vscale x 16 x i1> @llvm.experimental.get.alias.lane.mask.nxv16i1.i64(i64 %ptrA, i64 %ptrB, i32 immarg %elementSize, i1 immarg %writeAfterRead)
+
+
+Overview:
+"""""""""
+
+Create a mask representing lanes that do or do not overlap between two pointers across one vector loop iteration.
+
+
+Arguments:
+""""""""""
+
+The first two arguments are the two pointers, given as integers of the same scalar type.
+The third argument is an immediate giving the element size in bytes, the fourth is an immediate i1 selecting the hazard kind, and the result is a vector with the i1 element type.
+
+Semantics:
+""""""""""
+
+In the case that ``%writeAfterRead`` is true, the '``llvm.experimental.get.alias.lane.mask.*``' intrinsics are semantically equivalent
+to:
+
+::
+
+ %diff = (%ptrB - %ptrA) / %elementSize
+ %m[i] = (icmp ult i, %diff) || (%diff <= 0)
+
+Otherwise they are semantically equivalent to:
+
+::
+
+ %diff = abs(%ptrB - %ptrA) / %elementSize
+ %m[i] = (icmp ult i, %diff) || (%diff == 0)
+
+where ``%m`` is a vector (mask) of active/inactive lanes with its elements
+indexed by ``i``, and ``%ptrA``, ``%ptrB`` are the two integer arguments to
+``llvm.experimental.get.alias.lane.mask.*``, ``%elementSize`` is the i32 argument, ``abs`` is the absolute value operation, ``icmp`` is an integer compare and ``ult``
+the unsigned less-than comparison operator. The difference ``%ptrB - %ptrA`` may be negative. The ``%writeAfterRead`` argument is expected to be true if ``%ptrB`` is stored to after ``%ptrA`` is read from.
+The above is equivalent to:
+
+::
+
+ %m = @llvm.experimental.get.alias.lane.mask(%ptrA, %ptrB, %elementSize, %writeAfterRead)
+
+This can, for example, be emitted by the loop vectorizer, in which case
+``%ptrA`` is a pointer that is read from within the loop, and ``%ptrB`` is a pointer that is stored to within the loop.
+If the difference between these pointers is less than the vector factor, then they overlap (alias) within a loop iteration.
+For example, if ``%ptrA`` is 20 and ``%ptrB`` is 23 with i8 elements and a vector factor of 8, then lanes 3, 4, 5, 6 and 7 of the vector loaded from ``%ptrA``
+share addresses with lanes 0, 1, 2, 3 and 4 of the vector stored to at ``%ptrB``.
+An alias mask of these two pointers should be <1, 1, 1, 0, 0, 0, 0, 0> so that only the non-overlapping lanes are loaded and stored.
+This operation allows many loops to be vectorised when it would otherwise be unsafe to do so.
+
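+For the example above, the mask would be produced by the following call (i8
+elements, so an element size of 1, with a write-after-read hazard):
+
+.. code-block:: llvm
+
+ ; diff = (23 - 20) / 1 = 3, so the first three lanes are active
+ %m = call <8 x i1> @llvm.experimental.get.alias.lane.mask.v8i1.i64(i64 20, i64 23, i32 1, i1 1)
+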
+To account for the fact that only a subset of lanes have been operated on in an iteration,
+the loop's induction variable should be incremented by the popcount of the mask rather than the vector factor.
+
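+For example, with a ``<4 x i1>`` mask, the increment can be derived from the
+mask itself (a sketch; the vector reduction computes the popcount):
+
+.. code-block:: llvm
+
+ ; hypothetical induction-variable update
+ %mask.zext = zext <4 x i1> %alias.lane.mask to <4 x i64>
+ %popcount = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %mask.zext)
+ %index.next = add i64 %index, %popcount
+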
+This mask ``%m`` can e.g. be used in masked load/store instructions.
+
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+ %alias.lane.mask = call <4 x i1> @llvm.experimental.get.alias.lane.mask.v4i1.i64(i64 %ptrA, i64 %ptrB, i32 4, i1 1)
+ %vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptrA, i32 4, <4 x i1> %alias.lane.mask, <4 x i32> poison)
+ [...]
+ call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, <4 x i32>* %ptrB, i32 4, <4 x i1> %alias.lane.mask)
.. _int_experimental_vp_splice:
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 6a41094ff933b0..0338310fd936df 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -468,6 +468,11 @@ class TargetLoweringBase {
return true;
}
+ /// Return true if the @llvm.experimental.get.alias.lane.mask intrinsic
+ /// should be expanded using generic code in SelectionDAGBuilder.
+ virtual bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT,
+ unsigned EltSize) const {
+ return true;
+ }
+
virtual bool shouldExpandGetVectorLength(EVT CountVT, unsigned VF,
bool IsScalable) const {
return true;
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 1ca8c2565ab0b6..5f7073a531283e 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -2363,6 +2363,11 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<1>>
llvm_i32_ty]>;
}
+def int_experimental_get_alias_lane_mask:
+ DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+ [llvm_anyint_ty, LLVMMatchType<1>, llvm_anyint_ty, llvm_i1_ty],
+ [IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
+
def int_get_active_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyint_ty, LLVMMatchType<1>],
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 9d729d448502d8..39e84e06a8de60 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8284,6 +8284,50 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
visitVectorExtractLastActive(I, Intrinsic);
return;
}
+ case Intrinsic::experimental_get_alias_lane_mask: {
+ SDValue SourceValue = getValue(I.getOperand(0));
+ SDValue SinkValue = getValue(I.getOperand(1));
+ SDValue EltSize = getValue(I.getOperand(2));
+ bool IsWriteAfterRead = cast<ConstantSDNode>(getValue(I.getOperand(3)))->getZExtValue() != 0;
+ auto IntrinsicVT = EVT::getEVT(I.getType());
+ auto PtrVT = SourceValue->getValueType(0);
+
+ if (!TLI.shouldExpandGetAliasLaneMask(IntrinsicVT, PtrVT, cast<ConstantSDNode>(EltSize)->getSExtValue())) {
+ visitTargetIntrinsic(I, Intrinsic);
+ return;
+ }
+
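+ // The number of elements that may alias is the pointer difference divided
+ // by the element size; a signed division, since the difference can be
+ // negative.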
+ SDValue Diff = DAG.getNode(ISD::SUB, sdl,
+ PtrVT, SinkValue, SourceValue);
+ if (!IsWriteAfterRead)
+ Diff = DAG.getNode(ISD::ABS, sdl, PtrVT, Diff);
+
+ Diff = DAG.getNode(ISD::SDIV, sdl, PtrVT, Diff, EltSize);
+ SDValue Zero = DAG.getTargetConstant(0, sdl, PtrVT);
+
+ // If the difference is positive then some elements may alias
+ auto CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
+ PtrVT);
+ SDValue Cmp = DAG.getSetCC(sdl, CmpVT, Diff, Zero, IsWriteAfterRead ? ISD::SETLE : ISD::SETEQ);
+
+ // Splat the compare result then OR it with a lane mask
+ SDValue Splat = DAG.getSplat(IntrinsicVT, sdl, Cmp);
+
+ SDValue DiffMask;
+ // If the target would expand the active lane mask intrinsic anyway, build
+ // the equivalent step-vector compare directly
+ if (TLI.shouldExpandGetActiveLaneMask(IntrinsicVT, PtrVT)) {
+ EVT VecTy = EVT::getVectorVT(*DAG.getContext(), PtrVT,
+ IntrinsicVT.getVectorElementCount());
+ SDValue DiffSplat = DAG.getSplat(VecTy, sdl, Diff);
+ SDValue VectorStep = DAG.getStepVector(sdl, VecTy);
+ DiffMask = DAG.getSetCC(sdl, IntrinsicVT, VectorStep,
+ DiffSplat, ISD::CondCode::SETULT);
+ } else {
+ DiffMask = DAG.getNode(
+ ISD::INTRINSIC_WO_CHAIN, sdl, IntrinsicVT,
+ DAG.getTargetConstant(Intrinsic::get_active_lane_mask, sdl, MVT::i64),
+ Zero, Diff);
+ }
+ SDValue Or = DAG.getNode(ISD::OR, sdl, IntrinsicVT, DiffMask, Splat);
+ setValue(&I, Or);
+ return;
+ }
}
}
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7ab3fc06715ec8..66eaec0d5ae6c9 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2033,6 +2033,24 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
return false;
}
+bool AArch64TargetLowering::shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const {
+ if (!Subtarget->hasSVE2())
+ return true;
+
+ if (PtrVT != MVT::i64)
+ return true;
+
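+ // WHILEWR/WHILERW only exist for element sizes of 1, 2, 4 and 8 bytes,
+ // each tied to a particular predicate type, so expand everything else.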
+ if (VT == MVT::v2i1 || VT == MVT::nxv2i1)
+ return EltSize != 8;
+ if (VT == MVT::v4i1 || VT == MVT::nxv4i1)
+ return EltSize != 4;
+ if (VT == MVT::v8i1 || VT == MVT::nxv8i1)
+ return EltSize != 2;
+ if (VT == MVT::v16i1 || VT == MVT::nxv16i1)
+ return EltSize != 1;
+ return true;
+}
+
bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic(
const IntrinsicInst *I) const {
if (I->getIntrinsicID() != Intrinsic::experimental_vector_partial_reduce_add)
@@ -2796,6 +2814,8 @@ const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
MAKE_CASE(AArch64ISD::LS64_BUILD)
MAKE_CASE(AArch64ISD::LS64_EXTRACT)
MAKE_CASE(AArch64ISD::TBL)
+ MAKE_CASE(AArch64ISD::WHILEWR)
+ MAKE_CASE(AArch64ISD::WHILERW)
MAKE_CASE(AArch64ISD::FADD_PRED)
MAKE_CASE(AArch64ISD::FADDA_PRED)
MAKE_CASE(AArch64ISD::FADDV_PRED)
@@ -5881,6 +5901,16 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
EVT PtrVT = getPointerTy(DAG.getDataLayout());
return DAG.getNode(AArch64ISD::THREAD_POINTER, dl, PtrVT);
}
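+ // Lower the SVE whilewr/whilerw intrinsics to the corresponding
+ // AArch64ISD nodes so they can be selected directly.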
+ case Intrinsic::aarch64_sve_whilewr_b:
+ case Intrinsic::aarch64_sve_whilewr_h:
+ case Intrinsic::aarch64_sve_whilewr_s:
+ case Intrinsic::aarch64_sve_whilewr_d:
+ return DAG.getNode(AArch64ISD::WHILEWR, dl, Op.getValueType(), Op.getOperand(1), Op.getOperand(2));
+ case Intrinsic::aarch64_sve_whilerw_b:
+ case Intrinsic::aarch64_sve_whilerw_h:
+ case Intrinsic::aarch64_sve_whilerw_s:
+ case Intrinsic::aarch64_sve_whilerw_d:
+ return DAG.getNode(AArch64ISD::WHILERW, dl, Op.getValueType(), Op.getOperand(1), Op.getOperand(2));
case Intrinsic::aarch64_neon_abs: {
EVT Ty = Op.getValueType();
if (Ty == MVT::i64) {
@@ -6340,16 +6370,39 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
return DAG.getNode(AArch64ISD::USDOT, dl, Op.getValueType(),
Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
}
+ case Intrinsic::experimental_get_alias_lane_mask:
case Intrinsic::get_active_lane_mask: {
+ unsigned IntrinsicID = Intrinsic::aarch64_sve_whilelo;
+ if (IntNo == Intrinsic::experimental_get_alias_lane_mask) {
+ uint64_t EltSize = Op.getOperand(3)->getAsZExtVal();
+ bool IsWriteAfterRead = Op.getOperand(4)->getAsZExtVal() == 1;
+ switch (EltSize) {
+ case 1:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_b : Intrinsic::aarch64_sve_whilerw_b;
+ break;
+ case 2:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_h : Intrinsic::aarch64_sve_whilerw_h;
+ break;
+ case 4:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_s : Intrinsic::aarch64_sve_whilerw_s;
+ break;
+ case 8:
+ IntrinsicID = IsWriteAfterRead ? Intrinsic::aarch64_sve_whilewr_d : Intrinsic::aarch64_sve_whilerw_d;
+ break;
+ default:
+ llvm_unreachable("Unexpected element size for get.alias.lane.mask");
+ }
+ }
SDValue ID =
- DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, dl, MVT::i64);
+ DAG.getTargetConstant(IntrinsicID, dl, MVT::i64);
EVT VT = Op.getValueType();
if (VT.isScalableVector())
return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, VT, ID, Op.getOperand(1),
Op.getOperand(2));
- // We can use the SVE whilelo instruction to lower this intrinsic by
+ // We can use the SVE whilelo/whilewr/whilerw instructions to lower this intrinsic by
// creating the appropriate sequence of scalable vector operations and
// then extracting a fixed-width subvector from the scalable vector.
@@ -14241,128 +14294,8 @@ static SDValue tryLowerToSLI(SDNode *N, SelectionDAG &DAG) {
return ResultSLI;
}
-/// Try to lower the construction of a pointer alias mask to a WHILEWR.
-/// The mask's enabled lanes represent the elements that will not overlap across
-/// one loop iteration. This tries to match:
-/// or (splat (setcc_lt (sub ptrA, ptrB), -(element_size - 1))),
-/// (get_active_lane_mask 0, (div (sub ptrA, ptrB), element_size))
-SDValue tryWhileWRFromOR(SDValue Op, SelectionDAG &DAG,
- const AArch64Subtarget &Subtarget) {
- if (!Subtarget.hasSVE2())
- return SDValue();
- SDValue LaneMask = Op.getOperand(0);
- SDValue Splat = Op.getOperand(1);
-
- if (Splat.getOpcode() != ISD::SPLAT_VECTOR)
- std::swap(LaneMask, Splat);
-
- if (LaneMask.getOpcode() != ISD::INTRINSIC_WO_CHAIN ||
- LaneMask.getConstantOperandVal(0) != Intrinsic::get_active_lane_mask ||
- Splat.getOpcode() != ISD::SPLAT_VECTOR)
- return SDValue();
-
- SDValue Cmp = Splat.getOperand(0);
- if (Cmp.getOpcode() != ISD::SETCC)
- return SDValue();
-
- CondCodeSDNode *Cond = cast<CondCodeSDNode>(Cmp.getOperand(2));
-
- auto ComparatorConst = dyn_cast<ConstantSDNode>(Cmp.getOperand(1));
- if (!ComparatorConst || ComparatorConst->getSExtValue() > 0 ||
- Cond->get() != ISD::CondCode::SETLT)
- return SDValue();
- unsigned CompValue = std::abs(ComparatorConst->getSExtValue());
- unsigned EltSize = CompValue + 1;
- if (!isPowerOf2_64(EltSize) || EltSize > 8)
- return SDValue();
-
- SDValue Diff = Cmp.getOperand(0);
- if (Diff.getOpcode() != ISD::SUB || Diff.getValueType() != MVT::i64)
- return SDValue();
-
- if (!isNullConstant(LaneMask.getOperand(1)) ||
- (EltSize != 1 && LaneMask.getOperand(2).getOpcode() != ISD::SRA))
- return SDValue();
-
- // The number of elements that alias is calculated by dividing the positive
- // difference between the pointers by the element size. An alias mask for i8
- // elements omits the division because it would just divide by 1
- if (EltSize > 1) {
- SDValue DiffDiv = LaneMask.getOperand(2);
- auto DiffDivConst = dyn_cast<ConstantSDNode>(DiffDiv.getOperand(1));
- if (!DiffDivConst || DiffDivConst->getZExtValue() != Log2_64(EltSize))
- return SDValue();
- if (EltSize > 2) {
- // When masking i32 or i64 elements, the positive value of the
- // possibly-negative difference comes from a select of the difference if
- // it's positive, otherwise the difference plus the element size if it's
- // negative: pos_diff = diff < 0 ? (diff + 7) : diff
- SDValue Select = DiffDiv.getOperand(0);
- // Make sure the difference is being compared by the select
- if (Select.getOpcode() != ISD::SELECT_CC || Select.getOperand(3) != Diff)
- return SDValue();
- // Make sure it's checking if the difference is less than 0
- if (!isNullConstant(Select.getOperand(1)) ||
- cast<CondCodeSDNode>(Select.getOperand(4))->get() !=
- ISD::CondCode::SETLT)
- return SDValue();
- // An add creates a positive value from the negative difference
- SDValue Add = Select.getOperand(2);
- if (Add.getOpcode() != ISD::ADD || Add.getOperand(0) != Diff)
- return SDValue();
- if (auto *AddConst = dyn_cast<ConstantSDNode>(Add.getOperand(1));
- !AddConst || AddConst->getZExtValue() != EltSize - 1)
- return SDValue();
- } else {
- // When masking i16 elements, this positive value comes from adding the
- // difference's sign bit to the difference itself. This is equivalent to
- // the 32 bit and 64 bit case: pos_diff = diff + sign_bit (diff)
- SDValue Add = DiffDiv.getOperand(0);
- if (Add.getOpcode() != ISD::ADD || Add.getOperand(0) != Diff)
- return SDValue();
- // A logical right shift by 63 extracts the sign bit from the difference
- SDValue Shift = Add.getOperand(1);
- if (Shift.getOpcode() != ISD::SRL || Shift.getOperand(0) != Diff)
- return SDValue();
- if (auto *ShiftConst = dyn_cast<ConstantSDNode>(Shift.getOperand(1));
- !ShiftConst || ShiftConst->getZExtValue() != 63)
- return SDValue();
- }
- } else if (LaneMask.getOperand(2) != Diff)
- return SDValue();
-
- SDValue StorePtr = Diff.getOperand(0);
- SDValue ReadPtr = Diff.getOperand(1);
-
- unsigned IntrinsicID = 0;
- switch (EltSize) {
- case 1:
- IntrinsicID = Intrinsic::aarch64_sve_whilewr_b;
- break;
- case 2:
- IntrinsicID = Intrinsic::aarch64_sve_whilewr_h;
- break;
- case 4:
- IntrinsicID = Intrinsic::aarch64_sve_whilewr_s;
- break;
- case 8:
- IntrinsicID = Intrinsic::aarch64_sve_whilewr_d;
- break;
- default:
- return SDValue();
- }
- SDLoc DL(Op);
- SDValue ID = DAG.getConstant(IntrinsicID, DL, MVT::i32);
- return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, Op.getValueType(), ID,
- StorePtr, ReadPtr);
-}
-
SDValue AArch64TargetLowering::LowerVectorOR(SDValue Op,
SelectionDAG &DAG) const {
- if (SDValue SV =
- tryWhileWRFromOR(Op, DAG, DAG.getSubtarget<AArch64Subtarget>()))
- return SV;
-
if (useSVEForFixedLengthVectorVT(Op.getValueType(),
!Subtarget->isNeonAvailable()))
return LowerToScalableOp(Op, DAG);
@@ -19609,7 +19542,9 @@ static bool isPredicateCCSettingOp(SDValue N) {
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilels ||
N.getConstantOperandVal(0) == Intrinsic::aarch64_sve_whilelt ||
// get_active_lane_mask is lowered to a whilelo instruction.
- N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask)))
+ N.getConstantOperandVal(0) == Intrinsic::get_active_lane_mask ||
+ // get_alias_lane_mask is lowered to a whilewr/rw instruction.
+ N.getConstantOperandVal(0) == Intrinsic::experimental_get_alias_lane_mask)))
return true;
return false;
@@ -27175,6 +27110,7 @@ void AArch64TargetLowering::ReplaceNodeResults(
return;
}
case Intrinsic::experimental_vector_match:
+ case Intrinsic::experimental_get_alias_lane_mask:
case Intrinsic::get_active_lane_mask: {
if (!VT.isFixedLengthVector() || VT.getVectorElementType() != MVT::i1)
return;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index cb0b9e965277aa..b2f766b22911ff 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -297,6 +297,10 @@ enum NodeType : unsigned {
SMAXV,
UMAXV,
+ // Alias lane masks
+ WHILEWR,
+ WHILERW,
+
SADDV_PRED,
UADDV_PRED,
SMAXV_PRED,
@@ -980,6 +984,8 @@ class AArch64TargetLowering : public TargetLowering {
bool shouldExpandGetActiveLaneMask(EVT VT, EVT OpVT) const override;
+ bool shouldExpandGetAliasLaneMask(EVT VT, EVT PtrVT, unsigned EltSize) const override;
+
bool
shouldExpandPartialReductionIntrinsic(const IntrinsicInst *I) const override;
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 564fb33758ad57..99b1e0618ab34b 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -140,6 +140,11 @@ def AArch64st1q_scatter : SDNode<"AArch64ISD::SST1Q_PRED", SDT_AArch64_SCATTER_V
// AArch64 SVE/SVE2 - the remaining node definitions
//
+// Alias masks
+def SDT_AArch64Mask : SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisInt<1>, SDTCisSameAs<2, 1>, SDTCVecEltisVT<0,i1>]>;
+def AArch64whilewr : SDNode<"AArch64ISD::WHILEWR", SDT_AArch64Mask>;
+def AArch64whilerw : SDNode<"AArch64ISD::WHILERW", SDT_AArch64Mask>;
+
// SVE CNT/INC/RDVL
def sve_rdvl_imm : Co...
[truncated]
``````````
https://github.com/llvm/llvm-project/pull/117007
More information about the llvm-commits mailing list