[llvm] [AArch64] Generate rev16 for certain uses of __builtin_bswap16 (PR #105375)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 4 09:00:35 PDT 2024
================
@@ -22137,6 +22137,22 @@ static SDValue performExtendCombine(SDNode *N,
N->getOperand(0)->getOpcode() == ISD::SETCC)
return performSignExtendSetCCCombine(N, DCI, DAG);
+ // If we see (any_extend (bswap ...)) with bswap returning an i16, we know
+ // that the top half of the result register must be unused, due to the
+ // any_extend. This means that we can replace this pattern with (rev16
+ // (any_extend ...)). This saves a machine instruction compared to (lsr (rev
+ // ...)), which is what this pattern would otherwise be lowered to.
+ if (N->getOpcode() == ISD::ANY_EXTEND &&
+ N->getOperand(0).getOpcode() == ISD::BSWAP &&
+ N->getOperand(0).getValueType().isScalarInteger() &&
+ N->getOperand(0).getValueType().getFixedSizeInBits() == 16) {
----------------
adprasad-nvidia wrote:
OK, got it. I didn't realise that we can also get an `any_extend` from an optimised `zero_extend`.
I can, as suggested, avoid this by checking that the old `any_extend` output type (which is also the `rev16` input type) is i32, i.e. by adding a check that `N->getValueType(0) == MVT::i32`.
But it might also be relatively simple to handle the i64 and i128 cases and still get the optimised `rev16` codegen. We could insert two `any_extend`s instead of one: one immediately before the `rev16` that extends i16 to i32, and one immediately after the `rev16` that extends i32 to whatever the value type of the old `any_extend` was (i32, i64, i128...).
Would you be happy with the second option too?
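
To illustrate why the second option should be sound, here is a minimal bit-level sketch in plain C++. It is not LLVM code: the helper names (`rev16_w`, `rev_w`, `bswap16_via_rev16`, `bswap16_via_rev_lsr`) are hypothetical, and the `junk` parameters are an assumption standing in for the bits that `any_extend` leaves undefined. It models `rev16` and `rev` on a 32-bit register and checks that the low 16 bits match a 16-bit byte swap whichever way the result is produced.

```cpp
#include <cstdint>

// Models AArch64 `rev16 Wd, Wn`: swap the two bytes inside each 16-bit halfword.
constexpr uint32_t rev16_w(uint32_t x) {
    return ((x & 0x00FF00FFu) << 8) | ((x & 0xFF00FF00u) >> 8);
}

// Models AArch64 `rev Wd, Wn`: reverse all four bytes of a 32-bit register.
constexpr uint32_t rev_w(uint32_t x) {
    return (x << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8) | (x >> 24);
}

// Reference semantics of a 16-bit byte swap.
constexpr uint16_t bswap16(uint16_t x) {
    return static_cast<uint16_t>((x << 8) | (x >> 8));
}

// Hypothetical model of the proposed lowering:
//   any_extend i16 -> i32, then rev16, then any_extend i32 -> i64.
// junk32 / junk64 fill the upper bits that any_extend leaves undefined.
constexpr uint64_t bswap16_via_rev16(uint16_t v, uint16_t junk32,
                                     uint32_t junk64) {
    uint32_t w = (static_cast<uint32_t>(junk32) << 16) | v;  // first any_extend
    uint32_t r = rev16_w(w);                                 // rev16 on i32
    return (static_cast<uint64_t>(junk64) << 32) | r;        // second any_extend
}

// Model of the existing two-instruction sequence: rev followed by lsr #16.
constexpr uint32_t bswap16_via_rev_lsr(uint16_t v, uint16_t junk32) {
    uint32_t w = (static_cast<uint32_t>(junk32) << 16) | v;
    return rev_w(w) >> 16;
}

// Only the low 16 bits are meaningful to users of the any_extend result.
static_assert((bswap16_via_rev16(0x1234, 0xABCD, 0xDEADBEEFu) & 0xFFFF) ==
                  bswap16(0x1234),
              "rev16 path must agree with bswap16 in the low half");
static_assert(bswap16_via_rev_lsr(0x1234, 0xABCD) == bswap16(0x1234),
              "rev + lsr path must agree with bswap16");
```

In this model the upper bits of the `rev16` result differ from those of the `rev` + `lsr` result, which is only acceptable because every user of an `any_extend` must ignore them.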
https://github.com/llvm/llvm-project/pull/105375