[PATCH] Prefer blendps over insertps codegen for one special case [X86]

Sanjay Patel spatel at rotateright.com
Fri Mar 13 14:49:42 PDT 2015


Hi chandlerc, qcolombet, mkuper, ab,

I had originally left this as a FIXME in D7866, but we're now attacking the problem from different angles: if we don't have a target-specific combine on insertps, we need to generate the right code in the first place.

With this patch, for this one exact case, we'll generate:
   blendps $1, %xmm0, %xmm1

instead of:
   insertps $0, %xmm0, %xmm1

If a memory operand is available for load folding and we're optimizing for size, we'll still generate insertps. Its memory form folds a 32-bit scalar load, whereas the memory operand of blendps is a full 128-bit vector.

The detailed performance data motivating this can be found in D7866; in summary, blendps has 2-3x the throughput of insertps on widely used chips.

http://reviews.llvm.org/D8332

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/sse41.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D8332.21953.patch
Type: text/x-patch
Size: 5080 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150313/9ff85860/attachment.bin>