[llvm] [AArch64] Allow lowering of more types to GET_ACTIVE_LANE_MASK (PR #140062)

Thu May 15 08:06:02 PDT 2025

================
@@ -3248,6 +3251,22 @@ void DAGTypeLegalizer::SplitVecRes_PARTIAL_REDUCE_MLA(SDNode *N, SDValue &Lo,
   Hi = DAG.getNode(Opcode, DL, ResultVT, AccHi, Input1Hi, Input2Hi);
 }
 
+void DAGTypeLegalizer::SplitVecRes_GET_ACTIVE_LANE_MASK(SDNode *N, SDValue &Lo,
+                                                        SDValue &Hi) {
+  SDLoc DL(N);
+  SDValue Op0 = N->getOperand(0);
+  SDValue Op1 = N->getOperand(1);
+  EVT OpVT = Op0.getValueType();
+
+  EVT LoVT, HiVT;
+  std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(N->getValueType(0));
+
+  Lo = DAG.getNode(ISD::GET_ACTIVE_LANE_MASK, DL, LoVT, Op0, Op1);
+  SDValue LoElts = DAG.getElementCount(DL, OpVT, LoVT.getVectorElementCount());
+  SDValue HiStartVal = DAG.getNode(ISD::UADDSAT, DL, OpVT, Op0, LoElts);
----------------
david-arm wrote:

According to the documentation for the node:

```
  // GET_ACTIVE_LANE_MASK - this corresponds to the llvm.get.active.lane.mask
  // intrinsic. It creates a mask representing active and inactive vector
  // lanes, active while Base + index < Trip Count. As with the intrinsic,
  // the operands Base and Trip Count have the same scalar integer type and
  // the internal addition of Base + index cannot overflow. However, the ISD
  // node supports result types which are wider than i1, where the high
  // bits conform to getBooleanContents similar to the SETCC operator.
```

I assume that `the internal addition of Base + index cannot overflow` is a requirement on us, i.e. we have to generate appropriate code to ensure the addition does not overflow because the operation requires it, rather than a guarantee that the inputs can never make it overflow? If it's the former then UADDSAT makes sense, but if it's the latter then presumably we don't need the UADDSAT.
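
To make the two readings concrete, here is a minimal scalar sketch of the split, not the actual legalizer code. It assumes the first reading, where `Base + index` is evaluated as if in a wider type, and it uses 8-bit operands only so that the wrap is easy to trigger. The names `laneMask`, `uaddsat8` and `splitLaneMask` are hypothetical helpers made up for this illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the node: lane I is active while
// Base + I < TripCount. The sum is computed in 32 bits so it cannot wrap
// (this is the assumed reading of "cannot overflow").
std::vector<bool> laneMask(uint8_t Base, uint8_t TripCount, unsigned NumLanes) {
  std::vector<bool> Mask(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Mask[I] = (uint32_t(Base) + I) < uint32_t(TripCount);
  return Mask;
}

// 8-bit unsigned saturating add, standing in for ISD::UADDSAT.
uint8_t uaddsat8(uint8_t A, uint8_t B) {
  unsigned Sum = unsigned(A) + unsigned(B);
  return Sum > 0xFF ? uint8_t(0xFF) : uint8_t(Sum);
}

// Split the mask into two halves, as the patch does. The high half starts at
// Base + Half, computed either with a saturating or a wrapping add.
std::vector<bool> splitLaneMask(uint8_t Base, uint8_t TripCount,
                                unsigned NumLanes, bool Saturate) {
  unsigned Half = NumLanes / 2;
  uint8_t HiBase = Saturate ? uaddsat8(Base, uint8_t(Half))
                            : uint8_t(Base + Half); // wraps modulo 256
  std::vector<bool> Lo = laneMask(Base, TripCount, Half);
  std::vector<bool> Hi = laneMask(HiBase, TripCount, Half);
  Lo.insert(Lo.end(), Hi.begin(), Hi.end());
  return Lo;
}
```

With `Base = 250`, `TripCount = 255` and 16 lanes, the wrapping add gives a high-half start of 2, so all eight high lanes come out active, whereas the saturating add gives 255 and the split result matches the unsplit mask. If the inputs were instead guaranteed never to reach that range, both variants would agree and the saturation would be redundant.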

https://github.com/llvm/llvm-project/pull/140062