[PATCH] D130246: [AArch64] Use neon instructions for i64/i128 ISD::PARITY calculation
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 22 08:14:01 PDT 2022
dmgreen accepted this revision.
dmgreen added a comment.
I've been trying to add up latencies to see which is better between the two sequences. I think you are right about the i32 case - it is better to avoid the fpr register moves.
The code changes look good to me. I was just not sure which is better between the i64 eors and moving to float regs to use a cnt. It will depend on the CPU - an eor is either a quick 1-cycle instruction, which is hard to beat with neon instructions, or it is a 2-cycle instruction and the cnt, addv and fmovs will have longer latencies.
I ended up having to get a simulator out to measure the differences. Whilst it is slower on some CPUs, it seems to be quicker in more cases and by more of a margin. So it looks OK to me.
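For readers following the comparison above, here is a rough, portable C++ sketch of the two ways of computing i64 parity being weighed. It is not the code from the patch and not the generated AArch64 assembly: the function names are hypothetical, the shift-and-XOR version only stands in for the scalar eor chain, and the popcount version only stands in for the cnt/addv route through the float registers.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical name. Scalar-style sequence: fold the value onto itself
// with shifts and XORs, then keep the low bit. Each step depends on the
// previous one, so the latency is a chain of XORs.
inline unsigned parityXorFold(std::uint64_t x) {
  x ^= x >> 32;
  x ^= x >> 16;
  x ^= x >> 8;
  x ^= x >> 4;
  x ^= x >> 2;
  x ^= x >> 1;
  return static_cast<unsigned>(x & 1);
}

// Hypothetical name. Popcount-style sequence: count the set bits and keep
// the low bit of the count. On AArch64 the count would come from vector
// instructions, which is where the register moves enter the picture.
inline unsigned parityPopcount(std::uint64_t x) {
  return static_cast<unsigned>(std::bitset<64>(x).count() & 1);
}

// Sanity helper: both sequences must agree for any input.
inline void checkSameParity(std::uint64_t x) {
  assert(parityXorFold(x) == parityPopcount(x));
}
```

Both functions return the same value for every input. The trade-off discussed above is purely about latency, and this sketch does not model it.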
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:7790
-SDValue AArch64TargetLowering::LowerCTPOP(SDValue Op, SelectionDAG &DAG) const {
+SDValue AArch64TargetLowering::LowerCTPOP_PARITY(SDValue Op, SelectionDAG &DAG) const {
if (DAG.getMachineFunction().getFunction().hasFnAttribute(
----------------
Formatting.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130246/new/
https://reviews.llvm.org/D130246