[PATCH] [x86] Improve build_vector v8i16 codegen
chandlerc at gmail.com
Mon Jan 26 10:27:36 PST 2015
On Mon, Jan 26, 2015 at 9:38 AM, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi Bruno,
> I am not sure this is the right thing to do.
> Do you see any performance improvement with the new sequence?
> My concern here is that, with the new sequence, we have a completely linear
> sequence of instructions, whereas the old sequence can be partly
> parallelized. Running both the new and old sequences through IACA, I see the
> following throughputs:
> Old sequence:
> - Sandy Bridge: 6.15 cycles.
> - Ivy Bridge: 6.15 cycles.
> - Haswell: 12 cycles.
> New sequence:
> - Sandy Bridge: 13 cycles.
> - Ivy Bridge: 13 cycles.
> - Haswell: 13 cycles.
> This seems to confirm my hypothesis.
FWIW, this matches my experience. I have seen pinsrw and pextrw chains be
astonishingly slow to execute.
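
To illustrate the dependency-chain concern, here is a minimal sketch in C with
SSE2 intrinsics. It is not the sequence from the patch, nor the old lowering;
the function names (build_v8i16_chain, build_v8i16_tree, same_lanes) are
hypothetical and chosen only for this example. The first function builds the
vector with a serial pinsrw chain, where every insert must wait for the
previous one. The second uses more instructions, but the eight movd loads are
independent and the unpacks form a tree only three levels deep, so an
out-of-order core can overlap much of the work.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Serial build: each pinsrw reads the register written by the previous
 * one, so the eight steps form a single dependency chain. */
static __m128i build_v8i16_chain(const int16_t *v) {
    __m128i r = _mm_cvtsi32_si128((uint16_t)v[0]);  /* movd, lane 0 */
    r = _mm_insert_epi16(r, v[1], 1);               /* pinsrw */
    r = _mm_insert_epi16(r, v[2], 2);
    r = _mm_insert_epi16(r, v[3], 3);
    r = _mm_insert_epi16(r, v[4], 4);
    r = _mm_insert_epi16(r, v[5], 5);
    r = _mm_insert_epi16(r, v[6], 6);
    r = _mm_insert_epi16(r, v[7], 7);
    return r;
}

/* Tree build: eight independent movd loads, then three levels of
 * unpacks. Instructions within a level do not depend on each other. */
static __m128i build_v8i16_tree(const int16_t *v) {
    __m128i e0 = _mm_cvtsi32_si128((uint16_t)v[0]);
    __m128i e1 = _mm_cvtsi32_si128((uint16_t)v[1]);
    __m128i e2 = _mm_cvtsi32_si128((uint16_t)v[2]);
    __m128i e3 = _mm_cvtsi32_si128((uint16_t)v[3]);
    __m128i e4 = _mm_cvtsi32_si128((uint16_t)v[4]);
    __m128i e5 = _mm_cvtsi32_si128((uint16_t)v[5]);
    __m128i e6 = _mm_cvtsi32_si128((uint16_t)v[6]);
    __m128i e7 = _mm_cvtsi32_si128((uint16_t)v[7]);

    /* Level 1 (punpcklwd): pair up 16-bit lanes. */
    __m128i p01 = _mm_unpacklo_epi16(e0, e1);
    __m128i p23 = _mm_unpacklo_epi16(e2, e3);
    __m128i p45 = _mm_unpacklo_epi16(e4, e5);
    __m128i p67 = _mm_unpacklo_epi16(e6, e7);

    /* Level 2 (punpckldq): pair up 32-bit lanes. */
    __m128i q0123 = _mm_unpacklo_epi32(p01, p23);
    __m128i q4567 = _mm_unpacklo_epi32(p45, p67);

    /* Level 3 (punpcklqdq): join the two 64-bit halves. */
    return _mm_unpacklo_epi64(q0123, q4567);
}

/* Returns 1 if all eight 16-bit lanes of a and b are equal. */
static int same_lanes(__m128i a, __m128i b) {
    return _mm_movemask_epi8(_mm_cmpeq_epi16(a, b)) == 0xFFFF;
}
```

Both functions produce the same vector for the same input, which same_lanes
can check. Which one is actually faster depends on the microarchitecture and
the surrounding code, so a throughput tool such as IACA or a direct
measurement is still needed to compare them.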