[llvm] r233689 - [SystemZ] Use POPCNT instruction on z196
Jay Foad
jay.foad at gmail.com
Tue Mar 31 07:12:48 PDT 2015
On 31 March 2015 at 13:56, Ulrich Weigand <ulrich.weigand at de.ibm.com> wrote:
>
> Author: uweigand
> Date: Tue Mar 31 07:56:33 2015
> New Revision: 233689
>
> URL: http://llvm.org/viewvc/llvm-project?rev=233689&view=rev
> Log:
> [SystemZ] Use POPCNT instruction on z196
>
> We already exploit a number of instructions specific to z196,
> but not yet POPCNT. Add support for the population-count
> facility, MC support for the POPCNT instruction, CodeGen
> support for using POPCNT, and implement the getPopcntSupport
> TargetTransformInfo hook.
>
>
> Added:
> llvm/trunk/test/CodeGen/SystemZ/ctpop-01.ll
> Modified:
> llvm/trunk/lib/Target/SystemZ/SystemZISelLowering.cpp
> llvm/trunk/lib/Target/SystemZ/SystemZISelLowering.h
> llvm/trunk/lib/Target/SystemZ/SystemZInstrInfo.td
> llvm/trunk/lib/Target/SystemZ/SystemZOperators.td
> llvm/trunk/lib/Target/SystemZ/SystemZProcessors.td
> llvm/trunk/lib/Target/SystemZ/SystemZSubtarget.cpp
> llvm/trunk/lib/Target/SystemZ/SystemZSubtarget.h
> llvm/trunk/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
> llvm/trunk/lib/Target/SystemZ/SystemZTargetTransformInfo.h
> llvm/trunk/test/MC/Disassembler/SystemZ/insns.txt
> llvm/trunk/test/MC/SystemZ/insn-bad.s
> llvm/trunk/test/MC/SystemZ/insn-good-z196.s
>
> Modified: llvm/trunk/lib/Target/SystemZ/SystemZISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/SystemZ/SystemZISelLowering.cpp?rev=233689&r1=233688&r2=233689&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/SystemZ/SystemZISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/SystemZ/SystemZISelLowering.cpp Tue Mar 31 07:56:33 2015
> @@ -163,8 +163,13 @@ SystemZTargetLowering::SystemZTargetLowe
> // available, or if the operand is constant.
> setOperationAction(ISD::ATOMIC_LOAD_SUB, VT, Custom);
>
> + // Use POPCNT on z196 and above.
> + if (Subtarget.hasPopulationCount())
> + setOperationAction(ISD::CTPOP, VT, Custom);
> + else
> + setOperationAction(ISD::CTPOP, VT, Expand);
> +
> // No special instructions for these.
> - setOperationAction(ISD::CTPOP, VT, Expand);
> setOperationAction(ISD::CTTZ, VT, Expand);
> setOperationAction(ISD::CTTZ_ZERO_UNDEF, VT, Expand);
> setOperationAction(ISD::CTLZ_ZERO_UNDEF, VT, Expand);
> @@ -2304,6 +2309,45 @@ SDValue SystemZTargetLowering::lowerOR(S
> MVT::i64, HighOp, Low32);
> }
>
> +SDValue SystemZTargetLowering::lowerCTPOP(SDValue Op,
> + SelectionDAG &DAG) const {
> + EVT VT = Op.getValueType();
> + int64_t OrigBitSize = VT.getSizeInBits();
> + SDLoc DL(Op);
> +
> + // Get the known-zero mask for the operand.
> + Op = Op.getOperand(0);
> + APInt KnownZero, KnownOne;
> + DAG.computeKnownBits(Op, KnownZero, KnownOne);
> + uint64_t Mask = ~KnownZero.getZExtValue();
> +
> + // Skip known-zero high parts of the operand.
> + int64_t BitSize = OrigBitSize;
> + while ((Mask & ((((uint64_t)1 << (BitSize / 2)) - 1) << (BitSize / 2))) == 0)
> + BitSize = BitSize / 2;
This will loop forever if all bits are known to be zero, won't it?
To avoid looping, how about:
  unsigned NumSignificantBits = (~KnownZero).getActiveBits();
  unsigned BitSize = 1U << Log2_32_Ceil(NumSignificantBits);
(But you still need to defend against all bits being zero: NumSignificantBits is then 0, Log2_32_Ceil(0) returns 32, and 1U << 32 is undefined.)
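To make the termination question concrete, here is a standalone C++ sketch (not LLVM code). It replays the committed halving loop with an iteration cap, and computes the width the getActiveBits/Log2_32_Ceil suggestion would produce, with the all-zero guard added. `activeBits` and `log2Ceil` are hypothetical stand-ins written to mirror `APInt::getActiveBits()` and `Log2_32_Ceil()` on plain 64-bit masks; `committedBitSize` and `suggestedBitSize` are likewise illustrative names. Replaying the loop this way suggests it also fails to terminate when only bit 0 may be nonzero (Mask == 1): once BitSize reaches 1, the "high half" it tests is empty, so the masked test is always zero and BitSize gets stuck at 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for APInt::getActiveBits() on a 64-bit mask:
// index of the highest set bit plus one, or 0 if no bit is set.
unsigned activeBits(uint64_t V) {
  unsigned N = 0;
  for (; V != 0; V >>= 1)
    ++N;
  return N;
}

// Hypothetical stand-in for Log2_32_Ceil(): ceil(log2(V)), and 32 for V == 0,
// which is why the zero case must be handled before computing 1U << result.
unsigned log2Ceil(uint32_t V) {
  if (V == 0)
    return 32;
  unsigned L = 0;
  while ((uint64_t(1) << L) < V)
    ++L;
  return L;
}

// Replays the committed halving loop, giving up after MaxIters iterations.
// Returns the BitSize it stops at, or -1 if it is still looping.
int64_t committedBitSize(uint64_t Mask, int64_t OrigBitSize, int MaxIters) {
  int64_t BitSize = OrigBitSize;
  for (int Iter = 0; Iter < MaxIters; ++Iter) {
    uint64_t HighHalf = (((uint64_t)1 << (BitSize / 2)) - 1) << (BitSize / 2);
    if ((Mask & HighHalf) != 0)
      return BitSize;
    BitSize = BitSize / 2;
  }
  return -1;
}

// The suggested computation plus the all-zero guard. Returns 0 when every
// bit is known zero, so the caller can treat the popcount as constant 0.
unsigned suggestedBitSize(uint64_t KnownZero, unsigned OrigBitSize) {
  assert(OrigBitSize >= 1 && OrigBitSize <= 64);
  uint64_t Mask = ~KnownZero;
  if (OrigBitSize < 64)
    Mask &= (uint64_t(1) << OrigBitSize) - 1; // only the operand's own bits
  unsigned NumSignificantBits = activeBits(Mask);
  if (NumSignificantBits == 0)
    return 0;
  return 1U << log2Ceil(NumSignificantBits);
}
```

In this sketch, returning 0 for an all-known-zero operand is just one possible way to handle the case flagged above (folding the whole CTPOP to the constant 0); it is not necessarily what a follow-up commit does.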
> +
> + // The POPCNT instruction counts the number of bits in each byte.
> + Op = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, Op);
> + Op = DAG.getNode(SystemZISD::POPCNT, DL, MVT::i64, Op);
> + Op = DAG.getNode(ISD::TRUNCATE, DL, VT, Op);
> +
> + // Add up per-byte counts in a binary tree. All bits of Op at
> + // position larger than BitSize remain zero throughout.
> + for (int64_t I = BitSize / 2; I >= 8; I = I / 2) {
> + SDValue Tmp = DAG.getNode(ISD::SHL, DL, VT, Op, DAG.getConstant(I, VT));
> + if (BitSize != OrigBitSize)
> + Tmp = DAG.getNode(ISD::AND, DL, VT, Tmp,
> + DAG.getConstant(((uint64_t)1 << BitSize) - 1, VT));
> + Op = DAG.getNode(ISD::ADD, DL, VT, Op, Tmp);
> + }
> +
> + // Extract overall result from high byte.
> + if (BitSize > 8)
> + Op = DAG.getNode(ISD::SRL, DL, VT, Op, DAG.getConstant(BitSize - 8, VT));
For a 64-bit value where the high 32 bits are known to be zero, you'll generate:
  Op = POPCNT(Op);
  Tmp = Op << 16;
  Tmp &= 0xFFFFFFFF;
  Op += Tmp;
  Tmp = Op << 8;
  Tmp &= 0xFFFFFFFF;
  Op += Tmp;
  Op >>= 24;
Instead of doing an AND at every loop iteration, how about generating:
  Op = POPCNT(Op);
  Tmp = Op >> 16;
  Op += Tmp;
  Tmp = Op >> 8;
  Op += Tmp;
  Op &= 0xFF;
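I.e. SRL instead of SHL inside the loop, and AND instead of SRL to
extract the overall result. Because SRL shifts zeros in from the top, no
per-iteration mask is needed; the partial sums left in the upper bytes
are simply discarded by the final AND.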
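The two reduction orders can be checked against each other with a small standalone C++ emulation (again a sketch, not the SelectionDAG code). `popcntPerByte` is a hypothetical helper modelling the per-byte counting that the commit's comment describes for POPCNT; `reduceWithShl` follows the committed loop for a 64-bit operand, and `reduceWithSrl` follows the alternative above. Both assume every input bit at or above BitSize is zero, which is the condition under which the code narrows BitSize.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the POPCNT result as described in the commit:
// byte k of the result holds the number of set bits in byte k of X.
uint64_t popcntPerByte(uint64_t X) {
  uint64_t R = 0;
  for (int Byte = 0; Byte < 8; ++Byte) {
    uint64_t B = (X >> (8 * Byte)) & 0xFF;
    uint64_t Count = 0;
    for (; B != 0; B &= B - 1) // clear the lowest set bit each step
      ++Count;
    R |= Count << (8 * Byte);
  }
  return R;
}

// Committed scheme for a 64-bit operand: SHL, AND to BitSize bits when
// BitSize was narrowed, then SRL to pull the total out of the top byte.
uint64_t reduceWithShl(uint64_t X, unsigned BitSize) {
  assert(BitSize == 64 || (X >> BitSize) == 0); // bits >= BitSize known zero
  uint64_t Op = popcntPerByte(X);
  for (unsigned I = BitSize / 2; I >= 8; I /= 2) {
    uint64_t Tmp = Op << I;
    if (BitSize != 64)
      Tmp &= (uint64_t(1) << BitSize) - 1;
    Op += Tmp;
  }
  if (BitSize > 8)
    Op >>= BitSize - 8;
  return Op;
}

// Suggested alternative: SRL with no per-step AND, then one AND with 0xFF
// to take the total from the low byte.
uint64_t reduceWithSrl(uint64_t X, unsigned BitSize) {
  assert(BitSize == 64 || (X >> BitSize) == 0); // bits >= BitSize known zero
  uint64_t Op = popcntPerByte(X);
  for (unsigned I = BitSize / 2; I >= 8; I /= 2)
    Op += Op >> I; // upper bytes keep partial sums; the AND below drops them
  return Op & 0xFF;
}
```

Each per-byte count is at most 8, and the total for a 64-bit operand is at most 64, so no byte addition ever carries into its neighbour. That is why both schemes agree on the result while the SRL form drops the per-iteration AND.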
Jay.
More information about the llvm-commits mailing list