[PATCH] [x86] Implement a faster vector population count based on the PSHUFB in-register LUT technique.

Bruno Cardoso Lopes bruno.cardoso at gmail.com
Fri May 29 08:20:19 PDT 2015


Since we used a very specific x86 idiom and carefully tweaked it to get the best out of it (we handle vXi16, vXi32 and vXi64 differently), my feeling is that we should measure what's best for ARM64 and custom lower it independently. Right now on ARM64 we handle EltTy != i8 very poorly because of the current scalar expansion (which is pretty horrible):

  // CNT supports only B element sizes.
  if (VT != MVT::v8i8 && VT != MVT::v16i8)
    setOperationAction(ISD::CTPOP, VT.getSimpleVT(), Expand);

My patch to improve vector legalization for pop count (http://reviews.llvm.org/D10002) is certainly a win here, but it certainly won't beat using ARM64's native popcnt on vXi8 and building the results for wider types on top of that!
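
To make the "build wider results on top of vXi8" idea concrete, here is a minimal scalar C++ sketch of one possible scheme for a v4i32 input: count bits per byte, then fold adjacent lanes with widening pairwise adds (the shape that a CNT followed by UADDLP-style steps would take). This is only an illustration under my own assumptions, not what either patch implements; the names popcount8 and ctpopV4i32 are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Per-byte population count; stands in for what a byte-wise vector
// popcount produces in each i8 lane.
static std::uint8_t popcount8(std::uint8_t b) {
  std::uint8_t n = 0;
  for (; b != 0; b &= static_cast<std::uint8_t>(b - 1))
    ++n; // clear the lowest set bit on each iteration
  return n;
}

// Hypothetical helper: popcount of each 32-bit lane, built from per-byte
// counts instead of expanding every lane to scalar code.
std::array<std::uint32_t, 4>
ctpopV4i32(const std::array<std::uint32_t, 4> &v) {
  // Step 1: reinterpret the vector as 16 bytes and count bits per byte.
  std::array<std::uint8_t, 16> bytes{};
  for (std::size_t i = 0; i < 4; ++i)
    for (std::size_t j = 0; j < 4; ++j)
      bytes[4 * i + j] =
          popcount8(static_cast<std::uint8_t>((v[i] >> (8 * j)) & 0xFF));

  // Step 2: pairwise widening add, i8 -> i16 (16 lanes become 8).
  std::array<std::uint16_t, 8> halves{};
  for (std::size_t k = 0; k < 8; ++k)
    halves[k] = static_cast<std::uint16_t>(bytes[2 * k] + bytes[2 * k + 1]);

  // Step 3: pairwise widening add, i16 -> i32 (8 lanes become 4).
  std::array<std::uint32_t, 4> out{};
  for (std::size_t k = 0; k < 4; ++k)
    out[k] = static_cast<std::uint32_t>(halves[2 * k]) + halves[2 * k + 1];
  return out;
}
```

A vXi16 result would stop after step 2, and a vXi64 result would need one more pairwise step. Whether this beats the alternatives on real hardware is exactly what would need measuring.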


http://reviews.llvm.org/D10084
