[PATCH] [X86] transform insertps to blendps when possible for better performance

Sanjay Patel spatel at rotateright.com
Thu Feb 26 15:19:36 PST 2015


I wrote a test program that executes 100 of each instruction type to confirm that blendps does achieve double the throughput of insertps on SandyBridge:
blendps : 5406907580 cycles for 150000000 iterations (36.05 cycles/iter).
insertps: 10869956010 cycles for 150000000 iterations (72.47 cycles/iter).
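The cycles/iter figures and the roughly 2x ratio follow directly from the raw counts above. A quick Python check, using only the numbers already quoted (the variable names are illustrative):

```python
# Raw counts copied from the measurements above.
ITERATIONS = 150_000_000
cycles = {"blendps": 5_406_907_580, "insertps": 10_869_956_010}

def cycles_per_iter(total_cycles: int, iterations: int) -> float:
    """Average number of cycles spent per loop iteration."""
    return total_cycles / iterations

per_iter = {name: cycles_per_iter(c, ITERATIONS) for name, c in cycles.items()}
for name, value in per_iter.items():
    print(f"{name}: {value:.2f} cycles/iter")

# How many times slower the insertps loop is than the blendps loop.
ratio = per_iter["insertps"] / per_iter["blendps"]
print(f"insertps/blendps ratio: {ratio:.2f}")
```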

I also found that on AMD Jaguar (btver2), blendps has half the latency of insertps, so it's an even bigger win there.
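Why the transform is legal in the simplest case can be shown with a small model of the two instructions. This is a Python sketch, not the LLVM implementation: it assumes register operands only (the memory form of insertps loads a 32-bit scalar and ignores the source-lane field), and it checks just one sufficient condition. If insertps copies source lane k into destination lane k and zeroes nothing, the result equals a blendps whose mask selects only lane k. `insertps_as_blendps_mask` is a hypothetical helper name.

```python
from typing import List, Optional

Vec4 = List[float]  # four 32-bit float lanes, lane 0 first

def insertps(dst: Vec4, src: Vec4, imm8: int) -> Vec4:
    """Register-form INSERTPS: copy src[count_s] into dst[count_d],
    then zero every lane whose bit is set in zmask."""
    count_s = (imm8 >> 6) & 0b11  # imm8[7:6]: source lane
    count_d = (imm8 >> 4) & 0b11  # imm8[5:4]: destination lane
    zmask = imm8 & 0b1111         # imm8[3:0]: lanes to zero
    out = list(dst)
    out[count_d] = src[count_s]
    return [0.0 if (zmask >> i) & 1 else v for i, v in enumerate(out)]

def blendps(dst: Vec4, src: Vec4, imm8: int) -> Vec4:
    """BLENDPS: lane i comes from src if bit i of imm8 is set, else from dst."""
    return [src[i] if (imm8 >> i) & 1 else dst[i] for i in range(4)]

def insertps_as_blendps_mask(imm8: int) -> Optional[int]:
    """Hypothetical helper: return an equivalent BLENDPS mask when the
    insert keeps its lane and zeroes nothing, otherwise None."""
    count_s = (imm8 >> 6) & 0b11
    count_d = (imm8 >> 4) & 0b11
    zmask = imm8 & 0b1111
    if count_s == count_d and zmask == 0:
        return 1 << count_d
    return None

# Exhaustively confirm the equivalence for every immediate the helper accepts.
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
matches = []
for imm in range(256):
    mask = insertps_as_blendps_mask(imm)
    if mask is not None:
        assert insertps(a, b, imm) == blendps(a, b, mask)
        matches.append(imm)
print([hex(m) for m in matches])
```

The check is deliberately conservative: other immediates (for example, ones that also zero lanes) might be expressible with extra instructions, but only this case maps onto a single blendps.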


http://reviews.llvm.org/D7866







More information about the llvm-commits mailing list