[PATCH] Prefer blendps over insertps codegen for one special case [X86]
Sanjay Patel
spatel at rotateright.com
Fri Mar 13 14:49:42 PDT 2015
Hi chandlerc, qcolombet, mkuper, ab,
I had originally made this a FIXME in D7866, but we're attacking the problem from different angles now. If we don't have a target-specific combine on insertps, we need to generate the right code in the first place.
With this patch, for this one exact case, we'll generate:
blendps %xmm0, %xmm1, $1
instead of:
insertps %xmm0, %xmm1, $0
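To illustrate why the two encodings above are interchangeable, here is a minimal Python sketch that models the documented lane semantics of the register forms of both instructions on 4-element lists. The function names and the `dst`/`src` parameter names are assumptions made for illustration; this is not code from the patch.

```python
def blendps(dst, src, imm8):
    """Model of BLENDPS: bit i of imm8 selects src[i]; otherwise dst[i] is kept."""
    return [src[i] if (imm8 >> i) & 1 else dst[i] for i in range(4)]


def insertps(dst, src, imm8):
    """Model of the register form of INSERTPS."""
    count_s = (imm8 >> 6) & 3  # source lane to read
    count_d = (imm8 >> 4) & 3  # destination lane to write
    zmask = imm8 & 0xF         # lanes to zero after the insert
    out = list(dst)
    out[count_d] = src[count_s]
    return [0.0 if (zmask >> i) & 1 else out[i] for i in range(4)]


# Illustrative values only.
dst = [10.0, 11.0, 12.0, 13.0]
src = [20.0, 21.0, 22.0, 23.0]

# Immediate 1 for the blend and immediate 0 for the insert both copy
# lane 0 of src into lane 0 of dst and leave the other lanes alone.
assert blendps(dst, src, 1) == insertps(dst, src, 0)
```

Any other insertps immediate (a different source lane, a different destination lane, or a nonzero zero mask) does not reduce to this single blend.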
If there's a memory operand available for load folding and we're optimizing for size, we'll still generate the insertps.
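The selection rule described in this message can be sketched as a small decision function. This is a hedged restatement of the prose, not the logic in X86ISelLowering.cpp; the function name and both parameter names are hypothetical.

```python
def pick_instruction(can_fold_load, optimize_for_size):
    """Pick the instruction for the one special case discussed here.

    Hypothetical helper: restates the rule from the message, nothing more.
    """
    # Keep insertps only when a load can be folded AND we optimize for size.
    if can_fold_load and optimize_for_size:
        return "insertps"
    # Otherwise prefer the higher-throughput blend.
    return "blendps"


assert pick_instruction(True, True) == "insertps"
assert pick_instruction(True, False) == "blendps"
```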
The detailed performance data motivating this change is in D7866; in summary, blendps has 2-3x the throughput of insertps on widely used chips.
http://reviews.llvm.org/D8332
Files:
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/sse41.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D8332.21953.patch
Type: text/x-patch
Size: 5080 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150313/9ff85860/attachment.bin>