[PATCH] [X86] transform insertps to blendps when possible for better performance
Sanjay Patel
spatel at rotateright.com
Tue Feb 24 13:41:38 PST 2015
Hi mkuper, chandlerc, RKSimon,
This patch adds a target-specific combine to transform insertps nodes into blendi nodes. We only need to check whether the insertps immediate mask can be translated into a blend mask.
Insertps has lower potential throughput than blendps on every x86 chip that I have surveyed. For example, on Haswell we can execute blendps on 3 different ports, but insertps is limited to 1. On Sandy Bridge, Piledriver, and Bulldozer, it's 2 vs. 1.
Doing this transform also reduces the number of patterns we have to match when optimizing scalar SSE code.
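To illustrate the immediate-mask check mentioned above, here is a minimal standalone sketch; it is not the code from the patch. It assumes the documented insertps immediate layout (bits 7:6 select the source element, bits 5:4 select the destination element, bits 3:0 are the zero mask) and a register source operand. The helper name insertpsToBlendMask is hypothetical, and the block needs C++17 for std::optional.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not from the patch): returns the blendps immediate
// equivalent to an insertps immediate, or std::nullopt if no translation
// exists. Assumes the source operand is a register, not a memory load.
std::optional<uint8_t> insertpsToBlendMask(uint8_t imm) {
  const uint8_t zeroMask = imm & 0x0F;      // bits 3:0: lanes forced to zero
  const uint8_t dstIdx = (imm >> 4) & 0x3;  // bits 5:4: destination lane
  const uint8_t srcIdx = (imm >> 6) & 0x3;  // bits 7:6: source lane

  // blendps cannot zero lanes, so any zeroing rules out the translation.
  if (zeroMask != 0)
    return std::nullopt;

  // blendps copies lane i of the source into lane i of the destination,
  // so the insert must not move the element to a different lane.
  if (srcIdx != dstIdx)
    return std::nullopt;

  // Select exactly one lane from the second operand.
  return static_cast<uint8_t>(1u << dstIdx);
}
```

For instance, under these assumptions an insertps immediate of 0x50 (source lane 1, destination lane 1, no zeroing) maps to a blend immediate of 0x2, while 0x10 (source lane 0, destination lane 1) has no blend equivalent.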
http://reviews.llvm.org/D7866
Files:
lib/Target/X86/X86ISelLowering.cpp
lib/Target/X86/X86InstrSSE.td
test/CodeGen/X86/avx-load-store.ll
test/CodeGen/X86/sse41.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D7866.20619.patch
Type: text/x-patch
Size: 8893 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150224/d3dd7a4c/attachment.bin>