[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Robert Lougher
rob.lougher at gmail.com
Fri Sep 5 09:32:13 PDT 2014
Hi Chandler,
I've done some informal benchmarking on an AMD Jaguar core (amd16h)
with and without the experimental flag. The tests were a mixture of
FP and Integer tests. I didn't see any significant performance
regression, with most of the differences being in the noise (less than
1%). One test, however, did show a performance improvement of ~4%.
Unfortunately, another team, while doing internal testing, has seen the
new path generating illegal insertps masks. A sample here:
vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]
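To show why these masks look illegal, here is a minimal sketch that splits an insertps immediate into its fields. It assumes the usual imm8 layout from the Intel manual (bits 7:6 select the source element, bits 5:4 the destination element, bits 3:0 the zero mask). The helper name decode_insertps_imm is hypothetical and is not part of LLVM.

```python
def decode_insertps_imm(imm):
    """Split an INSERTPS immediate into its fields.

    Assumed layout of the 8-bit immediate:
      bits 7:6  COUNT_S  source element index
      bits 5:4  COUNT_D  destination element index
      bits 3:0  ZMASK    result elements to zero
    """
    return {
        # The encoding only has room for one byte of immediate.
        "fits_imm8": 0 <= imm <= 0xFF,
        "count_s": (imm >> 6) & 0x3,
        "count_d": (imm >> 4) & 0x3,
        "zmask": imm & 0xF,
    }


# The two immediates from the sample above.
for imm in (256, 416):
    print(imm, hex(imm), decode_insertps_imm(imm))
```

Both 256 (0x100) and 416 (0x1a0) have bit 8 set, so neither fits in the single immediate byte. The low eight bits (0x00 and 0xa0) decode to element 0 into element 0 and element 2 into element 2, which agrees with the shuffle comments in the listing.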
We'll continue to look into this and do additional testing.
Thanks,
Rob.
--
Robert Lougher
SN Systems - Sony Computer Entertainment Group