[PATCH] D50074: [X86][AVX2] Prefer VPBLENDW+VPBLENDW+VPBLENDD to VPBLENDVB for v16i16 blend shuffles

Mon Aug 27 09:05:50 PDT 2018

pcordes added a comment.

In https://reviews.llvm.org/D50074#1214328, @RKSimon wrote:

> TBH I reckon this could go in as it is and we improve VSELECT combines later on.

Sounds reasonable, as long as we aren't pessimizing Skylake inside loops by turning a `vpblendvb` (2 uops that can run on any of p0/p1/p5) into 3 uops, 2 of which can only run on port 5.
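To make the uop argument concrete, here is a rough port-pressure bound, sketched in C. The per-instruction figures are assumptions taken from public Skylake measurement tables and should be double-checked: `vpblendvb ymm` = 2 uops on p0/p1/p5, `vpblendw ymm` = 1 uop on p5 only, `vpblendd ymm` = 1 uop on p0/p1/p5. `InsnCost`, `port_bound` and the other names are hypothetical, and the model ignores latency, the front end and any other work in the loop.

```c
#include <assert.h>

/* Port bits for Skylake's three vector ALU ports. */
enum { P0 = 1, P1 = 2, P5 = 4 };

/* Assumed Skylake costs (hypothetical; verify against published data). */
typedef struct {
    int nuops;       /* number of uops the instruction decodes to */
    unsigned ports;  /* set of ports each of those uops may run on */
} InsnCost;

static const InsnCost VPBLENDVB_YMM = {2, P0 | P1 | P5};
static const InsnCost VPBLENDW_YMM  = {1, P5};
static const InsnCost VPBLENDD_YMM  = {1, P0 | P1 | P5};

static int port_count(unsigned set) {
    return (int)((set & 1u) + ((set >> 1) & 1u) + ((set >> 2) & 1u));
}

/* Lower bound on cycles per iteration from execution-port pressure alone:
   for every non-empty subset S of {p0, p1, p5}, the uops that can only run
   on ports inside S need at least (their count / |S|) cycles. */
static double port_bound(const InsnCost *seq, int n) {
    assert(n >= 0 && (n == 0 || seq != 0));
    double best = 0.0;
    for (unsigned s = 1; s <= (unsigned)(P0 | P1 | P5); ++s) {
        int confined = 0;
        for (int i = 0; i < n; ++i)
            if ((seq[i].ports & ~s) == 0)  /* every allowed port is in S */
                confined += seq[i].nuops;
        double bound = (double)confined / port_count(s);
        if (bound > best)
            best = bound;
    }
    return best;
}
```

Under those assumptions, a loop body consisting only of this blend is bounded at about 0.67 cycles per iteration with `vpblendvb`, but at 2 cycles with `vpblendw` + `vpblendw` + `vpblendd`, because both `vpblendw` uops compete for port 5.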

The `AVX2-FAST-LABEL: PR24935:` checks still seem to show that.

Especially in manually vectorized code, I think it would be bad to compile `_mm256_blendv_epi8` with a constant mask into 2x `vpblendw` + `vpblendd`: that could easily cause a performance regression.
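For reference, a scalar model of why a constant word-granular blend can be split this way: one `vpblendw` per 128-bit lane pattern (the instruction reuses its 8-bit immediate in both lanes), then a `vpblendd` with immediate 0xF0 takes the low lane from the first result and the high lane from the second. This is a hypothetical C sketch, not LLVM code; `V16`, `blendw`, `blendd`, `select16` and `blend_via_immediates` are illustrative names, and the instruction semantics are modeled on 16 x 16-bit elements rather than real registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t w[16]; } V16;  /* one ymm register viewed as v16i16 */

/* Model of vpblendw ymm: the same 8-bit immediate applies to each 128-bit
   lane, so word i is selected by bit (i % 8). */
static V16 blendw(V16 a, V16 b, uint8_t imm) {
    V16 r;
    for (int i = 0; i < 16; ++i)
        r.w[i] = ((imm >> (i % 8)) & 1) ? b.w[i] : a.w[i];
    return r;
}

/* Model of vpblendd ymm: bit j selects dword j, i.e. words 2j and 2j+1. */
static V16 blendd(V16 a, V16 b, uint8_t imm) {
    V16 r;
    for (int i = 0; i < 16; ++i)
        r.w[i] = ((imm >> (i / 2)) & 1) ? b.w[i] : a.w[i];
    return r;
}

/* Reference: per-word select with a constant 16-bit mask (bit i set means
   take word i from b), which is what vpblendvb computes when every byte
   pair of its mask register agrees. */
static V16 select16(V16 a, V16 b, uint16_t mask) {
    V16 r;
    for (int i = 0; i < 16; ++i)
        r.w[i] = ((mask >> i) & 1) ? b.w[i] : a.w[i];
    return r;
}

/* The 3-instruction decomposition: two immediate word blends, one per lane
   pattern, merged lane-wise by a dword blend. */
static V16 blend_via_immediates(V16 a, V16 b, uint16_t mask) {
    V16 lo = blendw(a, b, (uint8_t)(mask & 0xFF));  /* correct low lane */
    V16 hi = blendw(a, b, (uint8_t)(mask >> 8));    /* correct high lane */
    return blendd(lo, hi, 0xF0);                    /* dwords 4..7 from hi */
}
```

Comparing `blend_via_immediates` with `select16` over all 65536 masks is a cheap way to convince yourself the split is exact; the concern raised above is only about its cost, not its correctness.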

Can we add a check that at most 2 immediate blends will be needed? As a conservative option, that would get the improvements in place for the cases where they are a win.
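A minimal sketch of the kind of check meant here, assuming a simplified lowering model rather than what this patch actually implements. Bit i of `mask` set means "take word i from the second operand". The model counts 0 blends for an identity mask, 1 for a single `vpblendd` (word pairs agree) or a single `vpblendw` (both lanes share a pattern), 2 when one lane is all-first or all-second so a `vpblendw` plus a `vpblendd` suffices, and 3 otherwise. `immediate_blend_count` and `prefer_immediate_blends` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified cost model (not LLVM's lowering) for a v16i16
   blend with a constant mask: returns how many immediate blends
   (vpblendw/vpblendd) the strategy below would emit. */
static int immediate_blend_count(uint16_t mask) {
    uint8_t lo = (uint8_t)(mask & 0xFF);
    uint8_t hi = (uint8_t)(mask >> 8);

    if (mask == 0x0000 || mask == 0xFFFF)
        return 0;  /* result is just one of the operands */
    if (((mask ^ (mask >> 1)) & 0x5555) == 0)
        return 1;  /* bits 2j and 2j+1 agree: a single vpblendd */
    if (lo == hi)
        return 1;  /* both lanes share one pattern: a single vpblendw */

    bool lo_uniform = (lo == 0x00 || lo == 0xFF);
    bool hi_uniform = (hi == 0x00 || hi == 0xFF);
    if (lo_uniform || hi_uniform)
        return 2;  /* vpblendw for the mixed lane, vpblendd for the uniform one */
    return 3;      /* vpblendw + vpblendw + vpblendd */
}

/* The conservative gate: keep vpblendvb unless 2 or fewer immediate
   blends are enough. */
static bool prefer_immediate_blends(uint16_t mask) {
    return immediate_blend_count(mask) <= 2;
}
```

For example, a mask of 0x0100 would be lowered as one `vpblendw` plus one `vpblendd` under this model, while 0x1234 needs all three and would stay a `vpblendvb`. The model may overcount for some masks that a smarter lowering could handle in fewer blends, which only makes the gate more conservative.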

Repository:
  rL LLVM

https://reviews.llvm.org/D50074