[PATCH] X86: sink splat-shuffle into block doing a shift.
Tim Northover
t.p.northover at gmail.com
Mon Feb 17 12:23:15 PST 2014
> This function answers the question: are vector shifts expensive? AFAIK 8bit shifts are expensive, so I don’t understand why there is a different handling for 8bit and 16bit shifts. Everything else LGTM.
Ah, it should be answering a slightly different question so the
documentation is clearly iffy.
What it should be answering is: are vector shifts with a fully general
RHS more expensive than ones where each element is shifted by the same
amount?
For 8-bit shifts, it doesn't matter because there are no vector 8-bit
shifts (no "psllb" at all): something horrific will be generated
whatever you do. My reasoning there is that it's probably best not to
put a shuffle into the mix as well.
For 16-bit shifts, if the RHS is a duplicated scalar we can use
"psllw" (or equivalent right-shifts), which treats the right-hand
%xmmN as a 64-bit int and has existed since SSE2. If it's not, you end
up doing nasty things (vpmovzxwd/vpsllvd/"trunc" if you've got AVX2,
far worse if not).
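
To make the "duplicated scalar" versus "fully general RHS" distinction
concrete, here is a rough scalar model in portable C++. It is only a
sketch, not code from the patch: V8i16, isSplat, shlUniform and shlGeneral
are hypothetical names, and giving zero for out-of-range counts in the
general case is my assumption for the sake of a defined result.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for eight 16-bit lanes (one %xmm register).
using V8i16 = std::array<std::uint16_t, 8>;

// True when every lane of the shift-amount vector holds the same value,
// i.e. the RHS is a duplicated (splat) scalar.
bool isSplat(const V8i16 &amt) {
  for (std::size_t i = 1; i < amt.size(); ++i)
    if (amt[i] != amt[0])
      return false;
  return true;
}

// Uniform case: a single 64-bit count applied to all lanes, which is the
// shape "psllw" handles directly. Counts above 15 clear every lane.
V8i16 shlUniform(const V8i16 &v, std::uint64_t count) {
  V8i16 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = count > 15 ? 0 : static_cast<std::uint16_t>(v[i] << count);
  return r;
}

// Fully general case: each lane carries its own count. There is no single
// SSE2 instruction with this shape for 16-bit lanes.
// Assumption: out-of-range counts yield zero, to mirror shlUniform.
V8i16 shlGeneral(const V8i16 &v, const V8i16 &amt) {
  V8i16 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = amt[i] > 15 ? 0 : static_cast<std::uint16_t>(v[i] << amt[i]);
  return r;
}
```

When isSplat(amt) holds, shlGeneral(v, amt) and shlUniform(v, amt[0])
agree lane for lane, which is why it pays to have the splat visible in
the same block as the shift.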
I'll definitely try to improve the doxygen comment (& possibly
function name) tomorrow.
Cheers.
Tim.