[PATCH] D145221: [X86] Prefer `vpternlog` instead of `blendv` for `vselect` on masks.
Noah Goldstein via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 15 14:45:45 PDT 2023
goldstein.w.n added inline comments.
================
Comment at: llvm/test/CodeGen/X86/vselect-pcmp.ll:37
+; AVX512F-NEXT: vpcmpgtw %xmm2, %xmm3, %xmm2
+; AVX512F-NEXT: vpblendvb %xmm2, %xmm0, %xmm1, %xmm0
+; AVX512F-NEXT: retq
----------------
RKSimon wrote:
> AVX512F basically means knights landing - and even though you'd have to use the zmm variant - vpternlogq is a LOT faster than vpblendvb on KNL
I could maybe see it for ymm->zmm, but widening xmm->zmm would then require a `vzeroupper`, would stall while the core prepares for `zmm` usage (if no zmm code is around), and would raise the AVX-512 frequency license.
It could also be dangerous if there is SSE-encoded code around it.
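
For readers following the `vpternlog` vs. `blendv` trade-off, here is a minimal Python model of why the two agree on compare-generated masks. It is only a sketch: the helper names (`ternlog`, `blendvb`, `pcmpgtw`) are hypothetical, and the immediate `0xCA` is the generic "a ? b : c" truth table, not necessarily the operand order or immediate the patch emits.

```python
IMM_SELECT = 0xCA  # truth table for "a ? b : c" (assumed operand order: a, b, c)

def ternlog(a, b, c, imm8, width=128):
    """Bitwise ternary logic: bits of (a, b, c) index into imm8, a is the MSB."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out

def blendvb(mask, on_true, on_false, width=128):
    """Byte blend: each byte is taken from on_true if the mask byte's top bit is set."""
    out = 0
    for byte in range(width // 8):
        shift = 8 * byte
        src = on_true if (mask >> (shift + 7)) & 1 else on_false
        out |= ((src >> shift) & 0xFF) << shift
    return out

def pcmpgtw(x, y, width=128):
    """Signed 16-bit x > y per lane, producing all-ones or all-zeros lanes."""
    def signed(v):
        return v - 0x10000 if v & 0x8000 else v
    out = 0
    for lane in range(width // 16):
        shift = 16 * lane
        if signed((x >> shift) & 0xFFFF) > signed((y >> shift) & 0xFFFF):
            out |= 0xFFFF << shift
    return out

# A compare result is all-ones or all-zeros per lane, so a bitwise select
# and a sign-bit-driven byte blend pick the same bytes.
mask = pcmpgtw(0x0005FFFF, 0x00030001, width=32)
same = (ternlog(mask, 0xAAAABBBB, 0xCCCCDDDD, IMM_SELECT, 32)
        == blendvb(mask, 0xAAAABBBB, 0xCCCCDDDD, 32))
```

The equivalence only holds because the mask comes from a compare. For an arbitrary mask such as `0x80`, the byte blend takes the whole byte while the bitwise select takes only bit 7, which is why the transform is restricted to `vselect` on masks.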
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D145221/new/
https://reviews.llvm.org/D145221