[PATCH] D21148: [X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permute

Fri Jun 24 13:50:47 PDT 2016

ab added a comment.

In http://reviews.llvm.org/D21148#465731, @RKSimon wrote:

> Updated to prefer binary shuffle (unpck mainly) over permutes - although this will prevent some folding, it shouldn't affect register pressure.

The permutes are indeed better because they either enable folding, or have similar performance to the binary shuffles (at least on the recent Intel CPUs I look at).  If there's no other reason, revert to the previous patch?  That one LGTM.
Very sorry I wasn't clear; was just curious!

> We don't handle i32 unpcks as pshufd typically has similar performance. The other changes you see from unary shuffle (e.g. movddup to pshufd) are typically because the target shuffle combine is a bit more ruthless at looking through bitcasts, hopefully reducing domain stalls.

Sounds good, thanks!
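
For anyone following the unpck-vs-pshufd point, here is a rough sketch of how a 4-lane unary shuffle mask packs into the 8-bit immediate that PSHUFD/VPERMILPS take (two bits per result lane, lane 0 in the low bits). The helper names encodeShuffleImm4 and applyShuffleImm4 are made up for illustration and are not the functions used in the patch; this only models the immediate encoding, not the combine itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM's API): pack a 4-lane unary shuffle mask
// into the 8-bit immediate used by PSHUFD / VPERMILPS.
// Result lane i reads source lane Mask[i]; each index occupies 2 bits.
inline std::uint8_t encodeShuffleImm4(const std::array<int, 4> &Mask) {
  unsigned Imm = 0;
  for (unsigned i = 0; i != 4; ++i) {
    assert(Mask[i] >= 0 && Mask[i] < 4 && "lane index out of range");
    Imm |= static_cast<unsigned>(Mask[i]) << (2 * i);
  }
  return static_cast<std::uint8_t>(Imm);
}

// Hypothetical reference model of the permute on four 32-bit lanes:
// Dst[i] = Src[bits (2i+1):(2i) of Imm].
inline std::array<std::uint32_t, 4>
applyShuffleImm4(const std::array<std::uint32_t, 4> &Src, std::uint8_t Imm) {
  std::array<std::uint32_t, 4> Dst{};
  for (unsigned i = 0; i != 4; ++i)
    Dst[i] = Src[(Imm >> (2 * i)) & 3];
  return Dst;
}
```

With this model, the mask <0,0,1,1> (what an i32 unpckl of a register with itself produces) encodes to the immediate 0x50, which is why a single pshufd can stand in for the unary unpck case.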

================
Comment at: test/CodeGen/X86/vector-shuffle-128-v2.ll:2
@@ +1,3 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSE2
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse3 | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSE3
----------------
Extra NOTE

Repository:
  rL LLVM

http://reviews.llvm.org/D21148