[PATCH] D111221: [AArch64][SVE] Improve code generation for VLS i1 masks
Paul Walker via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 17 07:32:05 PST 2021
paulwalker-arm added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15838-15840
+ SDValue Sext =
+ DAG.getNode(ISD::SIGN_EXTEND, SDLoc(N), N->getValueType(0), Unpk);
+ return Sext;
----------------
This can be just `return DAG.getNode(ISD::SIGN_EXTEND...`
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16526
+ // (sign_extend (punpklo (setcc_merge_zero ... pred ...))), 0, ne
+ // => extract_subvector (inner setcc_merge_zero)
+ SDValue Pred = N->getOperand(0);
----------------
To be consistent, either the above `punpklo` needs to be `extract_subvector` or this need should be `punpklo`.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16536
+
+ SDValue Extract = LHS->getOperand(0);
+ if (Extract->getOpcode() != ISD::EXTRACT_SUBVECTOR ||
----------------
If you go with `punpklo` above then can you please add something like `// punpklo == extract_subvector` here.
================
Comment at: llvm/test/CodeGen/AArch64/sve-punpklo-combine.ll:19-20
+ %cmp1 = call <vscale x 8 x i1> @llvm.aarch64.sve.cmpne.nxv8i16(<vscale x 8 x i1> %p1, <vscale x 8 x i16> %ext1, <vscale x 8 x i16> zeroinitializer)
+ %a = call <vscale x 8 x i8> @llvm.aarch64.sve.ld1.nxv8i8(<vscale x 8 x i1> %cmp1, i8* %ap)
+ %ext = sext <vscale x 8 x i8> %a to <vscale x 8 x i16>
+ ret <vscale x 8 x i16> %ext
----------------
I think you can simplify all the tests in this file by removing the extending load part and returning the comparison result directly. I say this because the new DAG combines don't care about them and so they shouldn't be necessary to exercise them.
================
Comment at: llvm/test/CodeGen/AArch64/sve-punpklo-combine.ll:242
+
+define <vscale x 2 x i64> @masked_load_sext_i8i64_ptrue_all(i8* %ap, <vscale x 16 x i8> %b) #0 {
+; CHECK-LABEL: masked_load_sext_i8i64_ptrue_all:
----------------
>From the function name alone it's not obvious this is a negative test. Please add a small comment that highlights this is a negative test and why the optimisation is not applicable.
I've put the comment here but it's a general one applicable to the other functions.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D111221/new/
https://reviews.llvm.org/D111221
More information about the llvm-commits
mailing list