[PATCH] D130246: [AArch64] Use neon instructions for i64/i128 ISD::PARITY calculation

Fri Jul 22 08:14:01 PDT 2022

dmgreen accepted this revision.
dmgreen added a comment.

I've been trying to add up latencies to see which is better between the two sequences. I think you are right about the i32 case - it is better to avoid the fpr register moves.

The code changes look good to me. I was just not sure which is better between the i64 eors and moving to float regs to use a cnt. It will depend on the CPU - but an eor is either a quick 1 cycle instruction, which is hard to beat with neon instructions, or it is a 2 cycle instruction and the cnt, addv and fmovs will have longer latencies.
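
For reference, the two ways of computing i64 parity being compared can be sketched at the source level roughly as follows. This is only an illustration and not the code in the patch: the function names are hypothetical, and which instructions a compiler actually emits for either function depends on the target and optimisation level. The first form corresponds to the chain of dependent eors, the second to a popcount (the cnt/addv route on neon) followed by a mask.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helper: scalar parity via a chain of XOR folds.
// Each step depends on the previous one, so the latency is roughly
// six dependent eor-with-shift operations plus the final mask.
inline unsigned parity_xor_fold(uint64_t x) {
  x ^= x >> 32;
  x ^= x >> 16;
  x ^= x >> 8;
  x ^= x >> 4;
  x ^= x >> 2;
  x ^= x >> 1;
  return static_cast<unsigned>(x & 1);
}

// Hypothetical helper: parity as the low bit of the population count.
// On AArch64 a popcount is typically done in vector registers, which is
// where the moves between general-purpose and fp/simd registers come from.
inline unsigned parity_popcount(uint64_t x) {
  return static_cast<unsigned>(std::bitset<64>(x).count() & 1);
}
```

Both functions return the same result for every input; the question in this review is only which sequence has the lower latency on a given core.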

I ended up having to get a simulator out to measure the differences. Whilst the neon sequence is slower on some CPUs, it seems to be quicker in more cases and by more of a margin. So looks OK to me.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:7790

-SDValue AArch64TargetLowering::LowerCTPOP(SDValue Op, SelectionDAG &DAG) const {
+SDValue AArch64TargetLowering::LowerCTPOP_PARITY(SDValue Op, SelectionDAG &DAG) const {
   if (DAG.getMachineFunction().getFunction().hasFnAttribute(
----------------
Formatting.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130246/new/

https://reviews.llvm.org/D130246