[PATCH] [X86][SSE] Keep 4i32 vector insertions in integer domain on pre-SSE4.1 targets
Simon Pilgrim
llvm-dev at redking.me.uk
Sun Dec 7 06:45:47 PST 2014
So, I've done some simple loop timing tests (doing PADDDs before and after the shuffle code to ensure we're using the integer domain) on the following older CPUs:
Intel Core 2 Duo 1.83 GHz (T5600) Merom
Intel Core 2 Duo 3.06 GHz (E7600) Wolfdale
Pentium M 1.60 GHz Dothan
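For anyone who wants to repeat this kind of test, here is a minimal sketch of such a timing kernel in C with SSE2 intrinsics. It is not the harness behind the results reported in this post. The function names (insert_low_movss, paddd_chain, time_chain) are hypothetical, and the XORPS+MOVSS sequence is only assumed to resemble the pre-patch codegen. Compilers may emit different instructions for these intrinsics, so check the disassembly before trusting any timings.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <time.h>       /* clock, CLOCKS_PER_SEC */

/* Float-domain "keep element 0, zero elements 1..3": XORPS + MOVSS.
   Assumption: resembles pre-patch codegen; not taken from LLVM itself. */
static inline __m128i insert_low_movss(__m128i v) {
    __m128 zero = _mm_setzero_ps();                     /* needs an extra zeroed register */
    return _mm_castps_si128(_mm_move_ss(zero, _mm_castsi128_ps(v)));
}

/* Wrap the shuffle in PADDDs so its input and output live in the integer
   domain; any domain-crossing (bypass) delay then sits on this path. */
__m128i paddd_chain(__m128i acc, __m128i inc, long iters) {
    for (long i = 0; i < iters; ++i) {
        acc = _mm_add_epi32(acc, inc);   /* PADDD before */
        acc = insert_low_movss(acc);     /* shuffle under test */
        acc = _mm_add_epi32(acc, inc);   /* PADDD after */
    }
    return acc;
}

/* Rough CPU-time measurement of the dependent chain. */
double time_chain(long iters) {
    __m128i acc = _mm_setzero_si128();
    __m128i inc = _mm_set1_epi32(1);
    clock_t start = clock();
    acc = paddd_chain(acc, inc, iters);
    clock_t end = clock();
    volatile int sink = _mm_cvtsi128_si32(acc);  /* keep the result live */
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Replacing insert_low_movss with an integer-domain sequence and comparing time_chain results on the same machine gives the kind of A/B comparison described here.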
And after all that, I'm seeing no discernible difference in performance between the two implementations. The MOVSS version can be made faster if we don't have to generate the zero vector (e.g. if it's already generated and this is the last use of that register), but that is it.
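One way to keep the insertion in the integer domain on SSE2-only hardware is a PSLLDQ/PSRLDQ byte-shift pair. It needs only the source register, whereas the MOVSS form first needs a zeroed register (typically via XORPS). The sketch below is illustrative only: the helper name insert_low_int_domain is hypothetical, and this is not necessarily the exact sequence that D6526 selects.

```c
#include <emmintrin.h>  /* SSE2 only: no SSE4.1 PINSRD/PBLENDW assumed */

/* Keep element 0 of a v4i32 and zero elements 1..3 without leaving the
   integer domain and without materializing a zero vector.
   Hypothetical helper, for illustration. */
__m128i insert_low_int_domain(__m128i v) {
    __m128i t = _mm_slli_si128(v, 12);  /* PSLLDQ 12: element 0 -> element 3, rest zero */
    return _mm_srli_si128(t, 12);       /* PSRLDQ 12: element 3 -> element 0, rest zero */
}
```

The trade-off is one extra shift against one fewer live register. If a zero vector is already available and dead afterwards, the MOVSS form loses that register disadvantage, which matches the exception noted above.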
With that in mind, I'm recommending that we do go ahead with this patch, primarily for its lower register usage and because it follows the general rule of avoiding domain swaps, but don't expect any big improvement on old hardware!
http://reviews.llvm.org/D6526
More information about the llvm-commits mailing list