[PATCH] [x86] Improve build_vector v8i16 codegen
Quentin Colombet
qcolombet at apple.com
Mon Jan 26 09:38:24 PST 2015
Hi Bruno,
I am not sure this is the right thing to do.
Do you see any performance improvement with the new sequence?
My concern is that the new sequence is completely linear, whereas the old sequence can be partly parallelized. Running both the old and new sequences through IACA, I see the following throughput numbers:
Old:
- Sandy Bridge: 6.15 cycles.
- Ivy Bridge: 6.15 cycles.
- Haswell: 12 cycles.
New:
- Sandy Bridge: 13 cycles.
- Ivy Bridge: 13 cycles.
- Haswell: 13 cycles.
This seems to confirm my hypothesis.
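To make the dependency argument concrete, here is a rough sketch of how the longest dependency chain differs between a fully linear sequence and one split into two independent halves. This is not the actual code from the patch: the op names, the eight-op shape, and the one-cycle latency per op are all assumptions for illustration, and the model says nothing about port pressure, which IACA does account for.

```python
def critical_path(ops):
    """Return the length of the longest dependency chain.

    ops is a list of (name, [names it depends on]) in program order.
    Every op is assumed to take one cycle (illustrative assumption only).
    """
    depth = {}
    for name, deps in ops:
        # An op can start only after its deepest input has finished.
        depth[name] = 1 + max((depth[d] for d in deps), default=0)
    return max(depth.values())


# Hypothetical linear sequence: each op consumes the previous result.
linear = [("op0", [])] + [(f"op{i}", [f"op{i - 1}"]) for i in range(1, 8)]

# Hypothetical split sequence: two independent chains of four ops,
# joined by a single merge at the end.
split = (
    [("lo0", [])] + [(f"lo{i}", [f"lo{i - 1}"]) for i in range(1, 4)]
    + [("hi0", [])] + [(f"hi{i}", [f"hi{i - 1}"]) for i in range(1, 4)]
    + [("merge", ["lo3", "hi3"])]
)

print(critical_path(linear))  # 8: nothing can overlap
print(critical_path(split))   # 5: the two halves run side by side
```

The split form executes one more op in total (nine instead of eight), yet its critical path is shorter, which is the kind of trade-off I would want measured before switching sequences.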
Thanks,
-Quentin
http://reviews.llvm.org/D7177