[llvm] r211892 - [x86] Teach the target combine step to aggressively fold pshufd instructions.
Chandler Carruth
chandlerc at google.com
Wed Jul 2 18:02:12 PDT 2014
Thanks for fixing!
On Wed, Jul 2, 2014 at 8:18 AM, Benjamin Kramer <benny.kra at gmail.com> wrote:
> On Wed, Jul 2, 2014 at 2:56 PM, Patrik Hägglund H <patrik.h.hagglund at ericsson.com> wrote:
> > Hi Chandler,
> >
> > This commit is causing a regression found by llvm-stress:
> >
> > bin/llvm-stress -size 300 -seed 17093 | bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
> > llc: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5730: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDValue, llvm::SDValue): Assertion `From != To.getNode() && "Cannot replace uses of with self"' failed.
> > 0 llc 0x0000000001129a75 llvm::sys::PrintStackTrace(_IO_FILE*) + 37
> > 1 llc 0x0000000001129eb3
> > 2 libpthread.so.0 0x00007f211fcae7c0
> > 3 libc.so.6 0x00007f211efb2b35 gsignal + 53
> > 4 libc.so.6 0x00007f211efb4111 abort + 385
> > 5 libc.so.6 0x00007f211efab9f0 __assert_fail + 240
> > 6 llc 0x0000000000fc73af
> > 7 llc 0x0000000000fc7444 llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDValue const*) + 52
> > 8 llc 0x0000000000f27ff9
> > 9 llc 0x0000000000f28203 llvm::TargetLowering::DAGCombinerInfo::CombineTo(llvm::SDNode*, llvm::SDValue, bool) + 35
> > 10 llc 0x0000000000af4308 llvm::X86TargetLowering::PerformDAGCombine(llvm::SDNode*, llvm::TargetLowering::DAGCombinerInfo&) const + 65080
> > 11 llc 0x0000000000f28f7e
> > 12 llc 0x0000000000f288bb llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AliasAnalysis&, llvm::CodeGenOpt::Level) + 939
> > 13 llc 0x000000000101819b llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 3259
> > 14 llc 0x0000000001016aa8 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 7096
> > 15 llc 0x0000000001014154 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1332
> > 16 llc 0x0000000000a6ce26
> > 17 llc 0x0000000000c6348c llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 124
> > 18 llc 0x0000000000e559ca llvm::FPPassManager::runOnFunction(llvm::Function&) + 362
> > 19 llc 0x0000000000e55c5b llvm::FPPassManager::runOnModule(llvm::Module&) + 43
> > 20 llc 0x0000000000e561f7 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 999
> > 21 llc 0x000000000056d071 main + 6817
> > 22 libc.so.6 0x00007f211ef9ec16 __libc_start_main + 230
> > 23 llc 0x000000000056b4e9
> > Stack dump:
> > 0. Program arguments: bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
> > 1. Running pass 'Function Pass Manager' on module '<stdin>'.
> > 2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@autogen_SD17093'
> > Abort
>
> Fixed in r212181.
>
> - Ben
> >
> > /Patrik Hägglund
> > -----Original Message-----
> > From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Chandler Carruth
> > Sent: 27 June 2014 13:40
> > To: llvm-commits at cs.uiuc.edu
> > Subject: [llvm] r211892 - [x86] Teach the target combine step to aggressively fold pshufd instructions.
> >
> > Author: chandlerc
> > Date: Fri Jun 27 06:40:13 2014
> > New Revision: 211892
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=211892&view=rev
> > Log:
> > [x86] Teach the target combine step to aggressively fold pshufd instructions.
> >
> > Summary:
> > This allows it to fold pshufd instructions across intervening
> > half-shuffles and other noise. This pattern actually shows up in the
> > generic lowering tests, but I've also added direct tests using
> > intrinsics to make sure that the specific desired functionality is
> > working even if the lowering stuff changes in the future.
> >
> > Differential Revision: http://reviews.llvm.org/D4292
> >
> > Modified:
> > llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> > llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll
> > llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
> >
> > Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=211892&r1=211891&r2=211892&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> > +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Jun 27 06:40:13 2014
> > @@ -19061,6 +19061,79 @@ static SmallVector<int, 4> getPSHUFShuff
> > }
> > }
> >
> > +/// \brief Search for a combinable shuffle across a chain ending in pshufd.
> > +///
> > +/// We walk up the chain and look for a combinable shuffle, skipping over
> > +/// shuffles that we could hoist this shuffle's transformation past without
> > +/// altering anything.
> > +static bool combineRedundantDWordShuffle(SDValue N, MutableArrayRef<int> Mask,
> > +                                         SelectionDAG &DAG,
> > +                                         TargetLowering::DAGCombinerInfo &DCI) {
> > +  assert(N.getOpcode() == X86ISD::PSHUFD &&
> > +         "Called with something other than an x86 128-bit half shuffle!");
> > +  SDLoc DL(N);
> > +
> > +  // Walk up a single-use chain looking for a combinable shuffle.
> > +  SDValue V = N.getOperand(0);
> > +  for (; V.hasOneUse(); V = V.getOperand(0)) {
> > +    switch (V.getOpcode()) {
> > +    default:
> > +      return false; // Nothing combined!
> > +
> > +    case ISD::BITCAST:
> > +      // Skip bitcasts as we always know the type for the target specific
> > +      // instructions.
> > +      continue;
> > +
> > +    case X86ISD::PSHUFD:
> > +      // Found another dword shuffle.
> > +      break;
> > +
> > +    case X86ISD::PSHUFLW:
> > +      // Check that the low words (being shuffled) are the identity in the
> > +      // dword shuffle, and the high words are self-contained.
> > +      if (Mask[0] != 0 || Mask[1] != 1 ||
> > +          !(Mask[2] >= 2 && Mask[2] < 4 && Mask[3] >= 2 && Mask[3] < 4))
> > +        return false;
> > +
> > +      continue;
> > +
> > +    case X86ISD::PSHUFHW:
> > +      // Check that the high words (being shuffled) are the identity in the
> > +      // dword shuffle, and the low words are self-contained.
> > +      if (Mask[2] != 2 || Mask[3] != 3 ||
> > +          !(Mask[0] >= 0 && Mask[0] < 2 && Mask[1] >= 0 && Mask[1] < 2))
> > +        return false;
> > +
> > +      continue;
> > +    }
> > +    // Break out of the loop if we break out of the switch.
> > +    break;
> > +  }
> > +
> > +  if (!V.hasOneUse())
> > +    // We fell out of the loop without finding a viable combining instruction.
> > +    return false;
> > +
> > +  // Record the old value to use in RAUW-ing.
> > +  SDValue Old = V;
> > +
> > +  // Merge this node's mask and our incoming mask.
> > +  SmallVector<int, 4> VMask = getPSHUFShuffleMask(V);
> > +  for (int &M : Mask)
> > +    M = VMask[M];
> > +  V = DAG.getNode(X86ISD::PSHUFD, DL, MVT::v4i32, V.getOperand(0),
> > +                  getV4X86ShuffleImm8ForMask(Mask, DAG));
> > +
> > +  // Replace N with its operand as we're going to combine that shuffle away.
> > +  DAG.ReplaceAllUsesWith(N, N.getOperand(0));
> > +
> > +  // Replace the combinable shuffle with the combined one, updating all users
> > +  // so that we re-evaluate the chain here.
> > +  DCI.CombineTo(Old.getNode(), V, /*AddTo*/ true);
> > +  return true;
> > +}
> > +
> > /// \brief Search for a combinable shuffle across a chain ending in pshuflw or pshufhw.
> > ///
> > /// We walk up the chain, skipping shuffles of the other half and looking
> > @@ -19194,18 +19267,11 @@ static SDValue PerformTargetShuffleCombi
> >        return DAG.getNode(ISD::BITCAST, DL, MVT::v8i16, V);
> >      }
> >
> > -    // Fallthrough
> > +    break;
> > +
> >    case X86ISD::PSHUFD:
> > -    if (V.getOpcode() == N.getOpcode()) {
> > -      // If we have two sequential shuffles of the same kind we can always fold
> > -      // them. Even if there are multiple uses, this is beneficial because it
> > -      // breaks a dependency.
> > -      SmallVector<int, 4> VMask = getPSHUFShuffleMask(V);
> > -      for (int &M : Mask)
> > -        M = VMask[M];
> > -      return DAG.getNode(N.getOpcode(), DL, VT, V.getOperand(0),
> > -                         getV4X86ShuffleImm8ForMask(Mask, DAG));
> > -    }
> > +    if (combineRedundantDWordShuffle(N, Mask, DAG, DCI))
> > +      return SDValue(); // We combined away this shuffle.
> >
> >      break;
> >    }
> >
> > Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll?rev=211892&r1=211891&r2=211892&view=diff
> > ==============================================================================
> > --- llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll (original)
> > +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll Fri Jun 27 06:40:13 2014
> > @@ -157,9 +157,8 @@ define <8 x i16> @shuffle_v8i16_26401375
> > ; CHECK-SSE2: # BB#0:
> > ; CHECK-SSE2-NEXT: pshuflw {{.*}} # xmm0 = xmm0[0,2,1,3,4,5,6,7]
> > ; CHECK-SSE2-NEXT: pshufhw {{.*}} # xmm0 = xmm0[0,1,2,3,7,5,4,6]
> > -; CHECK-SSE2-NEXT: pshufd {{.*}} # xmm0 = xmm0[0,3,2,1]
> > +; CHECK-SSE2-NEXT: pshufd {{.*}} # xmm0 = xmm0[0,3,1,2]
> > ; CHECK-SSE2-NEXT: pshuflw {{.*}} # xmm0 = xmm0[1,3,2,0,4,5,6,7]
> > -; CHECK-SSE2-NEXT: pshufd {{.*}} # xmm0 = xmm0[0,1,3,2]
> > ; CHECK-SSE2-NEXT: retq
> > %shuffle = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 2, i32 6, i32 4, i32 0, i32 1, i32 3, i32 7, i32 5>
> > ret <8 x i16> %shuffle
> >
> > Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll?rev=211892&r1=211891&r2=211892&view=diff
> > ==============================================================================
> > --- llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll (original)
> > +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll Fri Jun 27 06:40:13 2014
> > @@ -3,9 +3,69 @@
> > target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > target triple = "x86_64-unknown-unknown"
> >
> > +declare <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
> > declare <8 x i16> @llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
> > declare <8 x i16> @llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
> >
> > +define <4 x i32> @combine_pshufd1(<4 x i32> %a) {
> > +; CHECK-SSE2-LABEL: @combine_pshufd1
> > +; CHECK-SSE2: # BB#0:
> > +; CHECK-SSE2-NEXT: retq
> > + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 27)
> > + %c = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %b, i8 27)
> > + ret <4 x i32> %c
> > +}
> > +
> > +define <4 x i32> @combine_pshufd2(<4 x i32> %a) {
> > +; CHECK-SSE2-LABEL: @combine_pshufd2
> > +; CHECK-SSE2: # BB#0:
> > +; CHECK-SSE2-NEXT: retq
> > + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 27)
> > + %b.cast = bitcast <4 x i32> %b to <8 x i16>
> > + %c = call <8 x i16> @llvm.x86.sse2.pshufl.w(<8 x i16> %b.cast, i8 -28)
> > + %c.cast = bitcast <8 x i16> %c to <4 x i32>
> > + %d = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %c.cast, i8 27)
> > + ret <4 x i32> %d
> > +}
> > +
> > +define <4 x i32> @combine_pshufd3(<4 x i32> %a) {
> > +; CHECK-SSE2-LABEL: @combine_pshufd3
> > +; CHECK-SSE2: # BB#0:
> > +; CHECK-SSE2-NEXT: retq
> > + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 27)
> > + %b.cast = bitcast <4 x i32> %b to <8 x i16>
> > + %c = call <8 x i16> @llvm.x86.sse2.pshufh.w(<8 x i16> %b.cast, i8 -28)
> > + %c.cast = bitcast <8 x i16> %c to <4 x i32>
> > + %d = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %c.cast, i8 27)
> > + ret <4 x i32> %d
> > +}
> > +
> > +define <4 x i32> @combine_pshufd4(<4 x i32> %a) {
> > +; CHECK-SSE2-LABEL: @combine_pshufd4
> > +; CHECK-SSE2: # BB#0:
> > +; CHECK-SSE2-NEXT: pshufhw {{.*}} # xmm0 = xmm0[0,1,2,3,7,6,5,4]
> > +; CHECK-SSE2-NEXT: retq
> > + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 -31)
> > + %b.cast = bitcast <4 x i32> %b to <8 x i16>
> > + %c = call <8 x i16> @llvm.x86.sse2.pshufh.w(<8 x i16> %b.cast, i8 27)
> > + %c.cast = bitcast <8 x i16> %c to <4 x i32>
> > + %d = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %c.cast, i8 -31)
> > + ret <4 x i32> %d
> > +}
> > +
> > +define <4 x i32> @combine_pshufd5(<4 x i32> %a) {
> > +; CHECK-SSE2-LABEL: @combine_pshufd5
> > +; CHECK-SSE2: # BB#0:
> > +; CHECK-SSE2-NEXT: pshuflw {{.*}} # xmm0 = xmm0[3,2,1,0,4,5,6,7]
> > +; CHECK-SSE2-NEXT: retq
> > + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 -76)
> > + %b.cast = bitcast <4 x i32> %b to <8 x i16>
> > + %c = call <8 x i16> @llvm.x86.sse2.pshufl.w(<8 x i16> %b.cast, i8 27)
> > + %c.cast = bitcast <8 x i16> %c to <4 x i32>
> > + %d = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %c.cast, i8 -76)
> > + ret <4 x i32> %d
> > +}
> > +
> > define <8 x i16> @combine_pshuflw1(<8 x i16> %a) {
> > ; CHECK-SSE2-LABEL: @combine_pshuflw1
> > ; CHECK-SSE2: # BB#0:
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits