[PATCH] [X86] transform insertps to blendps when possible for better performance

Tue Feb 24 13:41:38 PST 2015

Hi mkuper, chandlerc, RKSimon,

This patch adds a target-specific combine to transform insertps nodes into blendi nodes. We just have to check whether the insertps immediate mask can be translated into an equivalent blend mask.
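
As a rough illustration of that check (a standalone sketch, not the code in the patch), the translation could look like the following. The helper name translateInsertPSToBlendImm is hypothetical, and the sketch assumes the register-source form of insertps. It relies on the documented immediate layouts: insertps uses bits 7:6 for the source element, bits 5:4 for the destination element, and bits 3:0 as a zero mask, while blendps uses bit i of its immediate to select element i from the second operand.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; it is not taken from the patch.
// Returns true and sets BlendImm if an insertps with immediate InsertPSImm
// (register-source form assumed) has the same effect as a blendps.
bool translateInsertPSToBlendImm(uint8_t InsertPSImm, uint8_t &BlendImm) {
  unsigned SrcIdx = (InsertPSImm >> 6) & 0x3; // element read from the source
  unsigned DstIdx = (InsertPSImm >> 4) & 0x3; // element written in the result
  unsigned ZeroMask = InsertPSImm & 0xF;      // result elements forced to zero

  // A blend cannot zero elements, and it cannot move an element to a
  // different lane, so both conditions must hold.
  if (ZeroMask != 0 || SrcIdx != DstIdx)
    return false;

  // Take exactly one lane from the second operand.
  BlendImm = static_cast<uint8_t>(1u << DstIdx);
  return true;
}
```

Under these assumptions, an insertps immediate of 0x50 (source element 1, destination element 1, no zeroing) maps to a blend immediate of 0x2, while 0x10 (source element 0 into destination element 1) has no blend equivalent.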

Insertps has lower potential throughput than blendps on all x86 chips that I have surveyed. For example, on Haswell we can execute blendps on 3 different ports, but insertps is limited to 1. On Sandy Bridge, Piledriver, and Bulldozer, it's 2 vs. 1.

Doing this transform also reduces the number of patterns we have to match when optimizing scalar SSE code.

http://reviews.llvm.org/D7866

Files:
  lib/Target/X86/X86ISelLowering.cpp
  lib/Target/X86/X86InstrSSE.td
  test/CodeGen/X86/avx-load-store.ll
  test/CodeGen/X86/sse41.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D7866.20619.patch
Type: text/x-patch
Size: 8893 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150224/d3dd7a4c/attachment.bin>