[PATCH] [x86] Improve build_vector v8i16 codegen
qcolombet at apple.com
Mon Jan 26 09:38:24 PST 2015
I am not sure this is the right thing to do.
Do you see any performance improvement with the new sequence?
My concern is that the new sequence is a completely linear chain of instructions, whereas the old sequence can be partly parallelized. Running both the new and old sequences through IACA, I see the following throughputs:
- Sandy Bridge: 6.15 cycles.
- Ivy Bridge: 6.15 cycles.
- Haswell: 12 cycles.
- Sandy Bridge: 13 cycles.
- Ivy Bridge: 13 cycles.
- Haswell: 13 cycles.
This seems to confirm my hypothesis.
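
To illustrate the concern about linear versus partly parallel sequences, here is a minimal Python sketch that computes the critical path of a dependency graph. It is not IACA and does not reproduce the numbers above. The two graphs are hypothetical stand-ins (a chain of eight dependent inserts versus a tree of pairwise merges), not the actual sequences from the patch, and the unit latency and unlimited issue width are assumptions.

```python
def critical_path(deps, latency=1):
    """Return the longest dependency chain in cycles.

    Assumes every instruction has the same latency and that the
    machine can issue any number of independent instructions at once.
    """
    finish = {}

    def done(node):
        # An instruction finishes `latency` cycles after its last input is ready.
        if node not in finish:
            finish[node] = latency + max((done(d) for d in deps[node]), default=0)
        return finish[node]

    return max(done(n) for n in deps)


# Hypothetical linear sequence: each of 8 inserts depends on the previous one.
linear = {i: ([i - 1] if i > 0 else []) for i in range(8)}

# Hypothetical tree: 4 independent operations, then 2 merges, then 1 merge.
tree = {
    "a0": [], "a1": [], "a2": [], "a3": [],
    "b0": ["a0", "a1"], "b1": ["a2", "a3"],
    "c0": ["b0", "b1"],
}

print(critical_path(linear))  # 8
print(critical_path(tree))    # 3
```

In this toy model the tree finishes in 3 cycles with 7 instructions, while the chain needs 8 cycles for 8 instructions, which is the kind of gap a fully serialized sequence can introduce even when its instruction count is similar.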