[llvm] [AArch64] Generate rev16 for certain uses of __builtin_bswap16 (PR #105375)

Wed Sep 4 07:49:28 PDT 2024

================
@@ -22137,6 +22137,22 @@ static SDValue performExtendCombine(SDNode *N,
       N->getOperand(0)->getOpcode() == ISD::SETCC)
     return performSignExtendSetCCCombine(N, DCI, DAG);
 
+  // If we see (any_extend (bswap ...)) with bswap returning an i16, we know
+  // that the top half of the result register must be unused, due to the
+  // any_extend. This means that we can replace this pattern with (rev16
+  // (any_extend ...)). This saves a machine instruction compared to (lsr (rev
+  // ...)), which is what this pattern would otherwise be lowered to.
+  if (N->getOpcode() == ISD::ANY_EXTEND &&
+      N->getOperand(0).getOpcode() == ISD::BSWAP &&
+      N->getOperand(0).getValueType().isScalarInteger() &&
+      N->getOperand(0).getValueType().getFixedSizeInBits() == 16) {
----------------
adprasad-nvidia wrote:

I'm not sure I understand. `bswap`'s output type is always the same as its input type, so `bswap->getOperand(0).getValueType()` is always i16, not i32. I verified this by adding a check that `N->getOperand(0)->getOperand(0).getValueType() == MVT::i32`: the check evaluates to false, so the test that expects `rev16` fails, i.e. we generate the old `rev` and `lsr` instead of `rev16`.
We guarantee that the REV16 only ever operates on an i32 because we insert an `any_extend` before the REV16, and that `any_extend` is guaranteed to extend to i32.
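
To illustrate why the rewrite is safe, here is a small standalone sketch. It is not LLVM code: `rev32`, `rev16`, `oldLowering`, `newLowering`, `low16` and `checkAllInputs` are hypothetical names that only model the 32-bit `rev`, `rev16` and `lsr #16` instructions on plain integers. It shows that both sequences agree on the low 16 bits, which are the only bits an `any_extend` from i16 defines.

```cpp
#include <cassert>
#include <cstdint>

// Model of `rev w0, w0`: reverse all four bytes of a 32-bit register.
constexpr uint32_t rev32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Model of `rev16 w0, w0`: swap the two bytes inside each 16-bit halfword.
constexpr uint32_t rev16(uint32_t x) {
  return ((x & 0x00FF00FFu) << 8) | ((x & 0xFF00FF00u) >> 8);
}

// Old sequence: rev followed by lsr #16.
constexpr uint32_t oldLowering(uint32_t x) { return rev32(x) >> 16; }

// New sequence: a single rev16.
constexpr uint32_t newLowering(uint32_t x) { return rev16(x); }

// Only the low half is meaningful after an any_extend from i16.
constexpr uint16_t low16(uint32_t x) { return static_cast<uint16_t>(x); }

// The low halves agree even with junk in the top half of the input...
static_assert(low16(oldLowering(0xAABB1234u)) == 0x3412, "rev + lsr");
static_assert(low16(newLowering(0xAABB1234u)) == 0x3412, "rev16");
// ...but the full registers differ, so the top half must be unused.
static_assert(oldLowering(0xAABB1234u) != newLowering(0xAABB1234u),
              "top halves differ");

// Exhaustive runtime check over every 16-bit value, with an arbitrary
// (assumed) junk pattern in the top half of the register.
inline void checkAllInputs() {
  for (uint32_t lo = 0; lo <= 0xFFFFu; ++lo) {
    const uint32_t x = 0xDEAD0000u | lo;
    assert(low16(oldLowering(x)) == low16(newLowering(x)));
  }
}
```

The third `static_assert` is the reason the combine is restricted to `any_extend`: with a `zero_extend` the top half would have to be zero, and a bare `rev16` does not guarantee that.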

https://github.com/llvm/llvm-project/pull/105375