[PATCH] [x86] Improve build_vector v8i16 codegen

Mon Jan 26 09:38:24 PST 2015

Hi Bruno,

I am not sure this is the right thing to do.
Do you see any performance improvement with the new sequence?

My concern is that the new sequence is completely linear, with each instruction depending on the previous one, whereas the old sequence can be partly parallelized. Running both the old and new sequences through IACA, I see the following throughputs:
Old:

- Sandy Bridge: 6.15 cycles.
- Ivy Bridge: 6.15 cycles.
- Haswell: 12 cycles.

New:

- Sandy Bridge: 13 cycles.
- Ivy Bridge: 13 cycles.
- Haswell: 13 cycles.

This seems to confirm my hypothesis.
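
To illustrate the concern, here is a rough sketch of how a fully serial chain compares with a partly parallel one. The operation counts and the one-cycle latency are assumptions made up for the example; they are not the actual instruction sequences from the patch, and the results are not IACA numbers.

```python
def critical_path(deps, latency=1):
    """Return the longest dependency chain, in cycles.

    deps maps each op to the ops it waits on and must be listed in
    topological order (producers before consumers).
    """
    finish = {}
    for op, inputs in deps.items():
        # An op can start only once all of its inputs have finished.
        start = max((finish[i] for i in inputs), default=0)
        finish[op] = start + latency
    return max(finish.values())


# Hypothetical linear sequence: eight ops, each waiting on the previous one.
serial = {f"op{i}": ([f"op{i - 1}"] if i else []) for i in range(8)}

# Hypothetical split sequence: two independent chains of four ops,
# combined by one final merge op.
split = {}
for chain in "ab":
    for i in range(4):
        split[f"{chain}{i}"] = [f"{chain}{i - 1}"] if i else []
split["merge"] = ["a3", "b3"]

print(critical_path(serial))  # 8
print(critical_path(split))   # 5
```

The split version executes one more op in total, yet its critical path is shorter because the two chains can overlap on an out-of-order core.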

Thanks,
-Quentin

http://reviews.llvm.org/D7177
