[PATCH] D12561: [X86][SSE] Match zero/any extension shuffles that don't start from the first element
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 18 09:01:16 PDT 2015
RKSimon marked 5 inline comments as done.
RKSimon added a comment.
Regarding the tests: overuse of PSHUFB can be a performance issue. You often have to load the shuffle mask (>5cy), and many targets (both AMD and Intel) execute it with poor latency/throughput (>3cy). In comparison, PSHUFD/PSRLDQ and PMOVZX are pretty light instructions (1-2cy each).
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7359
@@ +7358,3 @@
+ auto SafeOffset = [&](int Idx) {
+ return OffsetLane == (Idx / NumEltsPerLane);
+ };
----------------
I've pulled the 'OffsetLane' constant out. I'd prefer to keep the lambda though, as it's used in quite a few places and avoids cluttering the code.
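For readers without the patch open, here is a rough standalone sketch of what the lane check does. It is not the code in X86ISelLowering.cpp: the helper name `allInOffsetLane`, its parameters, and the way `OffsetLane` is derived from an `Offset` value are assumptions made for illustration. Only the body of the `SafeOffset` lambda follows the quoted diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the patch): returns true when every source
// index lies in the same 128-bit lane as the starting offset.
bool allInOffsetLane(const std::vector<int> &Indices, int Offset,
                     int NumEltsPerLane) {
  assert(NumEltsPerLane > 0 && "lane must contain at least one element");

  // Assumption: the lane is derived from the offset by integer division.
  const int OffsetLane = Offset / NumEltsPerLane;

  // Same shape as the lambda in the diff: an index is "safe" when it falls
  // in the lane that the offset selects.
  auto SafeOffset = [&](int Idx) {
    return OffsetLane == (Idx / NumEltsPerLane);
  };

  for (int Idx : Indices)
    if (!SafeOffset(Idx))
      return false;
  return true;
}
```

With 8 elements per lane (e.g. 16-bit elements in a 128-bit lane), indices 8..11 with offset 8 all sit in lane 1 and pass, while indices 6..9 with offset 6 straddle lanes 0 and 1 and fail.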
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:7378
@@ -7350,1 +7377,3 @@
if (Subtarget->hasSSE41()) {
+ // Not worth offseting 128-bit vectors if scale == 2, a pattern using
+ // PUNPCK will catch this in a later shuffle match.
----------------
This is definitely a borderline case. Without the early-out, most of the time we are just replacing an XOR (zero) + PUNPCKH with a PSHUFD + PMOVZX. There's next to nothing in it, so I went with avoiding a change.
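To illustrate why the two sequences are interchangeable in the scale == 2 case, here is a scalar model using four 32-bit elements in a 128-bit vector. The element width, the helper names, and the PSHUFD mask are assumptions chosen for the example. The functions only model the lane semantics of PUNPCKHDQ, PSHUFD and PMOVZXDQ and say nothing about their cost.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Scalar model of PUNPCKHDQ: interleave the high two dwords of A and B.
V4 unpackHigh(const V4 &A, const V4 &B) {
  V4 R = {A[2], B[2], A[3], B[3]};
  return R;
}

// Scalar model of PSHUFD: Dst[i] = Src[Mask[i]].
V4 shuffle(const V4 &Src, const std::array<int, 4> &Mask) {
  V4 R = {0, 0, 0, 0};
  for (int I = 0; I < 4; ++I) {
    assert(Mask[I] >= 0 && Mask[I] < 4 && "mask index out of range");
    R[I] = Src[Mask[I]];
  }
  return R;
}

// Scalar model of PMOVZXDQ: zero-extend the low two dwords to qwords,
// shown here as four dwords in little-endian order.
V4 zeroExtendLow(const V4 &Src) {
  V4 R = {Src[0], 0, Src[1], 0};
  return R;
}

// XOR (zero) + PUNPCKH: unpack the high half against a zero vector.
V4 viaUnpack(const V4 &A) {
  const V4 Zero = {0, 0, 0, 0};
  return unpackHigh(A, Zero);
}

// PSHUFD + PMOVZX: move the high half down, then zero-extend it.
V4 viaShuffleExtend(const V4 &A) {
  const std::array<int, 4> Mask = {2, 3, 2, 3};
  return zeroExtendLow(shuffle(A, Mask));
}
```

Both paths zero-extend elements 2 and 3 into the two 64-bit result lanes and use the same number of instructions.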
Repository:
rL LLVM
http://reviews.llvm.org/D12561