[llvm] [AArch64][SVE2] Generate urshr rounding shift rights (PR #78374)

Tue Feb 6 06:02:25 PST 2024

================
@@ -20192,6 +20248,9 @@ static SDValue performIntrinsicCombine(SDNode *N,
   case Intrinsic::aarch64_sve_uqsub_x:
     return DAG.getNode(ISD::USUBSAT, SDLoc(N), N->getValueType(0),
                        N->getOperand(1), N->getOperand(2));
+  case Intrinsic::aarch64_sve_urshr:
+    return DAG.getNode(AArch64ISD::URSHR_I_PRED, SDLoc(N), N->getValueType(0),
+                       N->getOperand(1), N->getOperand(2), N->getOperand(3));
----------------
paulwalker-arm wrote:

Can you point to an example of where a `_PRED` node expects the results of inactive lanes to take a known value? Because that really shouldn't be the case (there's a comment at the top of AArch64SVEInstrInfo.td that details the naming strategy). The intent of the `_PRED` nodes is to allow predication to be represented at the DAG level rather than waiting until instruction selection. They place no requirement on the results of inactive lanes, which frees up instruction selection to make the best use of unpredicated and/or reversed instructions.

The naming is important because people will assume the documented rules when implementing DAG combines or making changes to instruction selection, and thus if they're not followed it's very likely to introduce bugs. If it's important for the ISD node to model the results of the inactive lanes in accordance with the underlying SVE instruction then it should be named as such (e.g. `URSHR_I_MERGE_OP1`).

This is generally not the case: typically at the ISD level the result of inactive lanes is not important (often because an all-active predicate is passed in), and thus the `_PRED` suffix is used. When this is the case we still want to minimise the number of ISel patterns, so a PatFrags is created to match both the ISD node and the intrinsic to the same instruction (e.g. `AArch64mla_m1`).
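
To make the distinction concrete, here is a rough scalar model of the two naming conventions. It is only a sketch and not LLVM code: the fixed four-lane vector, the helper names (`urshrLane`, `urshrMergeOp1`, `urshrPred`) and the use of `std::optional` to stand for "no defined value" are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical fixed-width stand-ins for a scalable vector and its predicate.
constexpr std::size_t kLanes = 4;
using Vec = std::array<std::uint32_t, kLanes>;
using Pred = std::array<bool, kLanes>;
using MaybeVec = std::array<std::optional<std::uint32_t>, kLanes>;

// Unsigned rounding shift right of one 32-bit lane, shift in [1, 32].
// The add is done in 64 bits so the rounding constant cannot overflow.
inline std::uint32_t urshrLane(std::uint32_t x, unsigned shift) {
  assert(shift >= 1 && shift <= 32);
  const std::uint64_t rounded =
      static_cast<std::uint64_t>(x) + (std::uint64_t{1} << (shift - 1));
  return static_cast<std::uint32_t>(rounded >> shift);
}

// "_MERGE_OP1" style: inactive lanes are defined to take the value of op1.
inline Vec urshrMergeOp1(const Pred &pg, const Vec &op1, unsigned shift) {
  Vec result = op1; // inactive lanes keep op1
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i])
      result[i] = urshrLane(op1[i], shift);
  return result;
}

// "_PRED" style: inactive lanes carry no defined value, modelled here as
// std::nullopt, so a consumer must not rely on them.
inline MaybeVec urshrPred(const Pred &pg, const Vec &op1, unsigned shift) {
  MaybeVec result{}; // every lane starts as nullopt
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i])
      result[i] = urshrLane(op1[i], shift);
  return result;
}
```

In this model a combine that reads an inactive lane of `urshrPred` has nothing to read, which is the property that lets instruction selection pick an unpredicated or reversed form; a combine that needs the op1 passthrough has to use the `urshrMergeOp1` form instead.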

https://github.com/llvm/llvm-project/pull/78374