[llvm] [AArch64][SVE2] Generate urshr rounding shift rights (PR #78374)

Tue Jan 30 11:19:26 PST 2024

================
@@ -20895,12 +20983,24 @@ static SDValue performUzpCombine(SDNode *N, SelectionDAG &DAG,
     }
   }
 
+  if (SDValue Urshr = tryCombineExtendRShTrunc(N, DAG))
+    return Urshr;
+
   if (SDValue Rshrnb = trySimplifySrlAddToRshrnb(Op0, DAG, Subtarget))
     return DAG.getNode(AArch64ISD::UZP1, DL, ResVT, Rshrnb, Op1);
 
   if (SDValue Rshrnb = trySimplifySrlAddToRshrnb(Op1, DAG, Subtarget))
     return DAG.getNode(AArch64ISD::UZP1, DL, ResVT, Op0, Rshrnb);
 
+  // uzp1(bitcast(x), bitcast(y)) -> uzp1(x, y)
+  if (isHalvingTruncateAndConcatOfLegalIntScalableType(N) &&
+      Op0.getOpcode() == ISD::BITCAST && Op1.getOpcode() == ISD::BITCAST) {
----------------
davemgreen wrote:

It is not obvious to me why it is valid to remove the BITCASTs in all cases. Is it because UZP1 is entirely defined by its output type, so the input types do not matter? So we can just remove the bitcasts, and doing so leads to simpler code?

Under big-endian, a BITCAST will actually swap the order of certain lanes (bitcasts are defined in terms of storing as one type and reloading as another, so they are lowered to a REV). SVE doesn't support BE yet for some reason, but we should still limit this combine to LE.

https://github.com/llvm/llvm-project/pull/78374