[PATCH] D50074: [X86][AVX2] Prefer VPBLENDW+VPBLENDW+VPBLENDD to VPBLENDVB for v16i16 blend shuffles
Peter Cordes via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 27 09:05:50 PDT 2018
pcordes added a comment.
In https://reviews.llvm.org/D50074#1214328, @RKSimon wrote:
> TBH I reckon this could go in as it is and we improve VSELECT combines later on.
Sounds reasonable, as long as we aren't pessimizing Skylake inside a loop by turning `vpblendvb` (2 uops for any port) into 3 uops (including 2 for port 5).
`AVX2-FAST-LABEL: PR24935:` seems to be doing that still.
Especially in manually-vectorized code, I think it would be bad to compile `_mm256_blendv_epi8` with a constant mask into 2x `vpblendw` + `vpblendd`. That could easily cause a performance regression in some code.
Can we add a check that at most 2 immediate blends will be needed, as a conservative option to get the improvements in place for the cases where it is a win?
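As a rough sketch of what such a check could look like (a standalone model, not LLVM's lowering code; every function name here is hypothetical, and the mask encoding is an assumption: bit i is set when 16-bit element i of the v16i16 result comes from the second source). It only recognizes the single-blend cases and treats everything else as the full 3-instruction sequence, so it is more conservative than the "at most 2" rule and would reject some masks that really need only 2 blends:

```cpp
#include <cstdint>

// Hypothetical model, not LLVM code. Bit i of `mask` is set when 16-bit
// element i of the v16i16 result is taken from the second source.

// One VPBLENDW is enough when both 128-bit lanes select identically,
// because its 8-bit immediate is reused for each lane.
constexpr bool fitsSingleBlendW(std::uint16_t mask) {
  return (mask & 0xFFu) == (static_cast<unsigned>(mask) >> 8);
}

// One VPBLENDD is enough when each adjacent pair of 16-bit elements comes
// from the same source, because it selects whole 32-bit elements.
constexpr bool fitsSingleBlendD(std::uint16_t mask) {
  const unsigned even = mask & 0x5555u;        // elements 0, 2, 4, ...
  const unsigned odd = (mask >> 1) & 0x5555u;  // elements 1, 3, 5, ...
  return even == odd;
}

// Conservative upper bound on the number of immediate blends: 1 when a
// single VPBLENDW or VPBLENDD works, otherwise assume the full
// VPBLENDW + VPBLENDW + VPBLENDD sequence. This simplified model never
// returns 2; a real check would also need to recognize 2-blend cases.
constexpr int estimateImmediateBlends(std::uint16_t mask) {
  return (fitsSingleBlendW(mask) || fitsSingleBlendD(mask)) ? 1 : 3;
}

// The proposed gate: only use immediate blends when at most 2 are needed,
// otherwise keep the variable blend (VPBLENDVB).
constexpr bool preferImmediateBlends(std::uint16_t mask) {
  return estimateImmediateBlends(mask) <= 2;
}

// Same pattern in both lanes: a single VPBLENDW.
static_assert(preferImmediateBlends(0x5555), "lane-repeating mask");
// A single odd element from the second source: falls back to VPBLENDVB.
static_assert(!preferImmediateBlends(0x0001), "irregular mask");
```

With this model, a mask like `0x0F33` (dword-granular but different per lane) maps to one `vpblendd`, while a mask like `0xAA55` would stay as `vpblendvb`.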
Repository:
rL LLVM
https://reviews.llvm.org/D50074