[PATCH] D145221: [X86] Prefer `vpternlog` instead of `blendv` for `vselect` on masks.

Noah Goldstein via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 15 14:45:45 PDT 2023


goldstein.w.n added inline comments.


================
Comment at: llvm/test/CodeGen/X86/vselect-pcmp.ll:37
+; AVX512F-NEXT:    vpcmpgtw %xmm2, %xmm3, %xmm2
+; AVX512F-NEXT:    vpblendvb %xmm2, %xmm0, %xmm1, %xmm0
+; AVX512F-NEXT:    retq
----------------
RKSimon wrote:
> AVX512F basically means knights landing - and even though you'd have to use the zmm variant - vpternlogq is a LOT faster than vpblendvb on KNL

I could maybe see it for ymm->zmm, but xmm->zmm will then require a `vzeroupper`, incur a stall while the core prepares for `zmm` usage (if no zmm code is around), and raise the license level.

It could also be dangerous if there is SSE-encoded code around it.
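For context on why the two instructions are interchangeable here at all, below is a minimal scalar sketch in Python (not LLVM code). It assumes the documented `vpternlog` semantics, where each result bit is looked up in the 8-bit immediate, and models a `pblendvb`-style select that reads only the top bit of each mask byte. The helper names `ternlog` and `blendv_bytes`, and the sample values, are hypothetical and exist only for illustration.

```python
# Scalar model of the two select idioms; an illustration only.
# `ternlog` and `blendv_bytes` are hypothetical helper names.

def ternlog(a: int, b: int, c: int, imm: int, width: int = 64) -> int:
    """Bitwise model of vpternlog: result bit i is imm[(a_i << 2) | (b_i << 1) | c_i]."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm >> idx) & 1) << i
    return out

def blendv_bytes(mask: int, if_set: int, if_clear: int, nbytes: int = 8) -> int:
    """Byte-wise select: only the top bit of each mask byte decides the source."""
    out = 0
    for i in range(nbytes):
        shift = 8 * i
        src = if_set if (mask >> (shift + 7)) & 1 else if_clear
        out |= ((src >> shift) & 0xFF) << shift
    return out

SELECT_IMM = 0xCA  # truth table for the bitwise select "a ? b : c"

x = 0x1122334455667788
y = 0xAABBCCDDEEFF0011

# Shape of a vpcmpgtw result: every 16-bit lane is all-ones or all-zeros.
cmp_mask = 0xFFFF0000FFFF0000
same_on_cmp_mask = ternlog(cmp_mask, x, y, SELECT_IMM) == blendv_bytes(cmp_mask, x, y)

# A mask with only a sign bit set is NOT a valid input for the bitwise form.
sign_only_mask = 0x8000000000000000
same_on_sign_only = ternlog(sign_only_mask, x, y, SELECT_IMM) == blendv_bytes(sign_only_mask, x, y)

print(same_on_cmp_mask, same_on_sign_only)
```

The two forms agree only when every mask element is all-ones or all-zeros, which is the case for compare results such as the `vpcmpgtw` output in the test above. This says nothing about the cost of widening to `zmm`, which is the actual concern in this comment.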


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145221/new/

https://reviews.llvm.org/D145221


