[PATCH] D91093: [X86 and PPC] Tune SelectionDAG CTPOP emulation via TLI hook

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 12 05:21:24 PST 2020


spatel added reviewers: nemanjai, pengfei.
spatel added a comment.

Adding more potential reviewers.
This is 2 independent patches in 1 proposal, so you may want to split it to avoid waiting on approval for both targets.
I made some general x86 comments, but someone that knows AVX512 better should have a look at those test diffs in particular.



================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:5347
+      (VT.is128BitVector() && Subtarget.hasSSE2())) {
+    return EltVT == MVT::i8 ? 4 : 6;
+  }
----------------
We should have a code comment here to describe approximately what the difference in x86 asm will be with these settings. As-is { 2, 4, 6, 10, 14 } is a set of magic numbers. Need to explain why a particular data type (i8, etc) is treated differently than other types.


================
Comment at: llvm/test/CodeGen/X86/vector-popcnt-128-ult-ugt.ll:473-484
+; AVX1-NEXT:    vpcmpeqd %xmm1, %xmm1, %xmm1
+; AVX1-NEXT:    vpaddb %xmm1, %xmm0, %xmm2
+; AVX1-NEXT:    vpand %xmm2, %xmm0, %xmm0
+; AVX1-NEXT:    vpaddb %xmm1, %xmm0, %xmm2
+; AVX1-NEXT:    vpand %xmm2, %xmm0, %xmm0
+; AVX1-NEXT:    vpaddb %xmm1, %xmm0, %xmm2
+; AVX1-NEXT:    vpand %xmm2, %xmm0, %xmm0
----------------
Are we saying that the longer sequence is better because it avoids the constant loads and/or potentially expensive vpshufb's?
Is there experimental perf data or llvm-mca to back that up? 
If not, we might opt for a more conservative setting as a first step, and then allow this case in a follow-up patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91093/new/

https://reviews.llvm.org/D91093



More information about the llvm-commits mailing list