[PATCH] X86: sink splat-shuffle into block doing a shift.

Mon Feb 17 12:23:15 PST 2014

> This function answers the question: are vector shifts expensive? AFAIK 8bit shifts are expensive, so I don’t understand why there is a different handling for 8bit and 16bit shifts.  Everything else LGTM.

Ah, it should be answering a slightly different question so the
documentation is clearly iffy.

What it should be answering is: are vector shifts with a fully general
RHS more expensive than ones where each element is shifted by the same
amount?
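
As a rough sketch of that question written as a predicate (this is not
the code in the patch: the name "splatShiftIsCheaper", the plain
integer argument and the "hasAVX2" flag are all made up for
illustration, and the 32/64-bit case is my assumption based on AVX2
having per-element vpsllvd/vpsllvq; it is not covered in this mail):

```cpp
// Hypothetical helper, not the actual LLVM hook or its signature.
// Returns true when a shift by a splatted (uniform) amount is expected
// to be noticeably cheaper than a shift by a fully general vector.
bool splatShiftIsCheaper(unsigned elementBits, bool hasAVX2) {
  // 8-bit: there is no vector 8-bit shift at all, so both forms are
  // bad and keeping the splat nearby is not expected to help.
  if (elementBits == 8)
    return false;

  // Assumption (not from this mail): with AVX2, 32- and 64-bit lanes
  // have per-element variable shifts, so the uniform form gains little.
  if (hasAVX2 && (elementBits == 32 || elementBits == 64))
    return false;

  // 16-bit (and the remaining cases): the uniform form maps onto a
  // single shift-by-scalar instruction, the general form does not.
  return true;
}
```

So splatShiftIsCheaper(16, true) would be true, while
splatShiftIsCheaper(8, true) would be false.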

For 8-bit shifts, it doesn't matter because there are no vector 8-bit
shifts (no "psllb" at all): something horrific will be generated
whatever you do. My reasoning there is that it's probably best not to
put a shuffle into the mix as well.

For 16-bit shifts, if the RHS is a duplicated scalar we can use
"psllw" (or the equivalent right-shifts), which reads a single shift
amount from the low 64 bits of the right-hand %xmmN and has existed
since SSE2. If it's not, you end up doing nasty things
(vpmovzxwd/vpsllvd/"trunc" if you've got AVX2, far worse if not).
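
To make the difference concrete, here is a small portable model of the
two cases in plain C++ (no intrinsics). It only mimics the lane
semantics and is not what the backend emits; the type and function
names are invented for the example:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V8x16 = std::array<std::uint16_t, 8>; // eight 16-bit lanes

// Models the "psllw"-style case: one shift amount shared by every lane.
// Amounts above 15 produce all-zero lanes.
V8x16 shiftLeftUniform(const V8x16 &v, std::uint64_t amount) {
  V8x16 out{};
  if (amount > 15)
    return out;
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = static_cast<std::uint16_t>(std::uint32_t{v[i]} << amount);
  return out;
}

// Models the fully general case as widen / per-lane shift / truncate,
// mirroring the vpmovzxwd + vpsllvd + "trunc" sequence.
V8x16 shiftLeftPerLane(const V8x16 &v, const V8x16 &amounts) {
  V8x16 out{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    std::uint32_t wide = v[i];                         // zero-extend
    std::uint32_t shifted =
        amounts[i] > 31 ? 0u : wide << amounts[i];     // per-lane shift
    out[i] = static_cast<std::uint16_t>(shifted);      // truncate
  }
  return out;
}

// True when every lane holds the same amount, i.e. the RHS is a splat
// and the cheap uniform form can be used.
bool isSplat(const V8x16 &amounts) {
  for (std::uint16_t a : amounts)
    if (a != amounts[0])
      return false;
  return true;
}
```

When isSplat(amounts) holds, shiftLeftPerLane(v, amounts) and
shiftLeftUniform(v, amounts[0]) give the same lanes, which is why the
single-instruction form is usable in that case.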

I'll definitely try to improve the doxygen comment (& possibly
function name) tomorrow.

Cheers.

Tim.