[PATCH] [X86] transform insertps to blendps when possible for better performance
Sanjay Patel
spatel at rotateright.com
Thu Feb 26 15:19:36 PST 2015
I wrote a test program that executes 100 instructions of each type to confirm that blendps does achieve double the throughput of insertps on Sandy Bridge:
blendps : 5406907580 cycles for 150000000 iterations (36.05 cycles/iter).
insertps: 10869956010 cycles for 150000000 iterations (72.47 cycles/iter).
I also found that on AMD Jaguar (btver2), blendps has half the latency of insertps, so it's an even bigger win there.
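For readers who want to see when the two instructions are interchangeable, here is a minimal scalar sketch in C++. It models the register forms of insertps and blendps from their documented immediate encodings, and it assumes the simplest convertible case: source and destination element indices match and the zero mask is empty. The helper names (insertps_model, blendps_model, insertps_as_blendps, check_all_immediates) are hypothetical and are not taken from the patch, which may handle more or fewer cases.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<float, 4>;

// Scalar model of INSERTPS (register source).
// imm8[7:6] = source element, imm8[5:4] = destination element,
// imm8[3:0] = zero mask applied after the insertion.
Vec4 insertps_model(Vec4 dst, const Vec4& src, std::uint8_t imm) {
    const unsigned count_s = (imm >> 6) & 3;
    const unsigned count_d = (imm >> 4) & 3;
    dst[count_d] = src[count_s];
    for (unsigned i = 0; i < 4; ++i)
        if (imm & (1u << i)) dst[i] = 0.0f;
    return dst;
}

// Scalar model of BLENDPS: bit i of imm8[3:0] picks src[i], otherwise dst[i].
Vec4 blendps_model(Vec4 dst, const Vec4& src, std::uint8_t imm) {
    for (unsigned i = 0; i < 4; ++i)
        if (imm & (1u << i)) dst[i] = src[i];
    return dst;
}

// Hypothetical helper (assumed condition, not the patch's code):
// an insertps immediate maps to a blendps immediate when the element
// stays in the same lane and nothing is zeroed.
bool insertps_as_blendps(std::uint8_t insert_imm, std::uint8_t& blend_imm) {
    const unsigned count_s = (insert_imm >> 6) & 3;
    const unsigned count_d = (insert_imm >> 4) & 3;
    const unsigned zmask = insert_imm & 0xF;
    if (count_s != count_d || zmask != 0) return false;
    blend_imm = static_cast<std::uint8_t>(1u << count_d);
    return true;
}

// Checks every insertps immediate: whenever the helper reports a
// conversion, both models must produce the same vector.
void check_all_immediates() {
    const Vec4 dst{1.0f, 2.0f, 3.0f, 4.0f};
    const Vec4 src{5.0f, 6.0f, 7.0f, 8.0f};
    for (unsigned imm = 0; imm < 256; ++imm) {
        std::uint8_t blend_imm = 0;
        if (insertps_as_blendps(static_cast<std::uint8_t>(imm), blend_imm))
            assert(insertps_model(dst, src, static_cast<std::uint8_t>(imm)) ==
                   blendps_model(dst, src, blend_imm));
    }
}
```

Under this assumed condition only four of the 256 insertps immediates (0x00, 0x50, 0xA0, 0xF0) convert, one per lane. The memory-operand form of insertps loads a single float and ignores the source index, so it is outside this model.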
http://reviews.llvm.org/D7866