[PATCH] [x86] Improve build_vector v8i16 codegen

Mon Jan 26 10:27:36 PST 2015

On Mon, Jan 26, 2015 at 9:38 AM, Quentin Colombet <qcolombet at apple.com>
wrote:

> Hi Bruno,
>
> I am not sure this is the right thing to do.
> Do you see any performance improvement with the new sequence?
>
> My concern here is, with the new sequence, we have a complete linear
> sequence of instructions whereas the old sequence can be partly
> parallelized. Running both the new and old sequence through IACA, I see the
> following throughputs:
> Old:
>
> - Sandy Bridge: 6.15 cycles.
> - Ivy Bridge: 6.15 cycles.
> - Haswell: 12 cycles.
>
> New:
>
> - Sandy Bridge: 13 cycles.
> - Ivy Bridge: 13 cycles.
> - Haswell: 13 cycles.
>
> This seems to confirm my hypothesis.
>

FWIW, this matches my experience. I have seen pinsrw and pextrw chains be
astonishingly slow to execute.
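
To illustrate the dependency concern discussed above, here is a minimal
sketch in C with SSE2 intrinsics. It is not the old or new sequence from the
patch. The helper names (pack2, build_v8i16_chain, build_v8i16_tree) are
hypothetical and exist only for this example. The first builder inserts each
element into the result of the previous insert, so the work forms one serial
chain. The second combines independent pairs and merges them with unpacks, so
part of the work has no mutual dependencies.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack two 16-bit elements into one 32-bit scalar,
 * with `lo` in the low half (little-endian lane order). */
static int pack2(int16_t lo, int16_t hi) {
    return (int)((uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16));
}

/* Serial form: every insert reads the vector produced by the previous
 * insert, so the eight steps form a single dependency chain. */
static __m128i build_v8i16_chain(const int16_t *e) {
    __m128i v = _mm_setzero_si128();
    v = _mm_insert_epi16(v, e[0], 0);
    v = _mm_insert_epi16(v, e[1], 1);
    v = _mm_insert_epi16(v, e[2], 2);
    v = _mm_insert_epi16(v, e[3], 3);
    v = _mm_insert_epi16(v, e[4], 4);
    v = _mm_insert_epi16(v, e[5], 5);
    v = _mm_insert_epi16(v, e[6], 6);
    v = _mm_insert_epi16(v, e[7], 7);
    return v;
}

/* Tree form: the four scalar-to-vector moves are independent of each
 * other, and so are the two 32-bit unpacks; only the final 64-bit unpack
 * needs both halves. */
static __m128i build_v8i16_tree(const int16_t *e) {
    __m128i p0 = _mm_cvtsi32_si128(pack2(e[0], e[1]));
    __m128i p1 = _mm_cvtsi32_si128(pack2(e[2], e[3]));
    __m128i p2 = _mm_cvtsi32_si128(pack2(e[4], e[5]));
    __m128i p3 = _mm_cvtsi32_si128(pack2(e[6], e[7]));
    __m128i lo = _mm_unpacklo_epi32(p0, p1);  /* e0 e1 e2 e3 0 0 0 0 */
    __m128i hi = _mm_unpacklo_epi32(p2, p3);  /* e4 e5 e6 e7 0 0 0 0 */
    return _mm_unpacklo_epi64(lo, hi);        /* e0 .. e7 */
}
```

Both helpers produce the same vector. An optimizing compiler is free to pick
its own lowering for either one, so this shows the shape of the data
dependencies rather than a guaranteed instruction sequence or a measured
speedup.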