[llvm] r211892 - [x86] Teach the target combine step to aggressively fold pshufd instructions.
Patrik Hägglund H
patrik.h.hagglund at ericsson.com
Wed Jul 2 23:03:22 PDT 2014
Hi Benjamin,
Thanks for the fix!
-----Original Message-----
From: Benjamin Kramer [mailto:benny.kra at gmail.com]
Sent: 2 July 2014 17:19
To: Patrik Hägglund H
Cc: Chandler Carruth; llvm-commits at cs.uiuc.edu
Subject: Re: [llvm] r211892 - [x86] Teach the target combine step to aggressively fold pshufd instructions.
On Wed, Jul 2, 2014 at 2:56 PM, Patrik Hägglund H
<patrik.h.hagglund at ericsson.com> wrote:
> Hi Chandler,
>
> This commit is causing a regression found by llvm-stress:
>
> bin/llvm-stress -size 300 -seed 17093 | bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
> llc: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5730: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDValue, llvm::SDValue): Assertion `From != To.getNode() && "Cannot replace uses of with self"' failed.
> 0 llc 0x0000000001129a75 llvm::sys::PrintStackTrace(_IO_FILE*) + 37
> 1 llc 0x0000000001129eb3
> 2 libpthread.so.0 0x00007f211fcae7c0
> 3 libc.so.6 0x00007f211efb2b35 gsignal + 53
> 4 libc.so.6 0x00007f211efb4111 abort + 385
> 5 libc.so.6 0x00007f211efab9f0 __assert_fail + 240
> 6 llc 0x0000000000fc73af
> 7 llc 0x0000000000fc7444 llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDValue const*) + 52
> 8 llc 0x0000000000f27ff9
> 9 llc 0x0000000000f28203 llvm::TargetLowering::DAGCombinerInfo::CombineTo(llvm::SDNode*, llvm::SDValue, bool) + 35
> 10 llc 0x0000000000af4308 llvm::X86TargetLowering::PerformDAGCombine(llvm::SDNode*, llvm::TargetLowering::DAGCombinerInfo&) const + 65080
> 11 llc 0x0000000000f28f7e
> 12 llc 0x0000000000f288bb llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AliasAnalysis&, llvm::CodeGenOpt::Level) + 939
> 13 llc 0x000000000101819b llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 3259
> 14 llc 0x0000000001016aa8 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 7096
> 15 llc 0x0000000001014154 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1332
> 16 llc 0x0000000000a6ce26
> 17 llc 0x0000000000c6348c llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 124
> 18 llc 0x0000000000e559ca llvm::FPPassManager::runOnFunction(llvm::Function&) + 362
> 19 llc 0x0000000000e55c5b llvm::FPPassManager::runOnModule(llvm::Module&) + 43
> 20 llc 0x0000000000e561f7 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 999
> 21 llc 0x000000000056d071 main + 6817
> 22 libc.so.6 0x00007f211ef9ec16 __libc_start_main + 230
> 23 llc 0x000000000056b4e9
> Stack dump:
> 0. Program arguments: bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
> 1. Running pass 'Function Pass Manager' on module '<stdin>'.
> 2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@autogen_SD17093'
> Abort
Fixed in r212181.
- Ben
>
> /Patrik Hägglund
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Chandler Carruth
> Sent: 27 June 2014 13:40
> To: llvm-commits at cs.uiuc.edu
> Subject: [llvm] r211892 - [x86] Teach the target combine step to aggressively fold pshufd instructions.
>
> Author: chandlerc
> Date: Fri Jun 27 06:40:13 2014
> New Revision: 211892
>
> URL: http://llvm.org/viewvc/llvm-project?rev=211892&view=rev
> Log:
> [x86] Teach the target combine step to aggressively fold pshufd instructions.
>
> Summary:
> This allows it to fold pshufd instructions across intervening
> half-shuffles and other noise. This pattern actually shows up in the
> generic lowering tests, but I've also added direct tests using
> intrinsics to make sure that the specific desired functionality is
> working even if the lowering stuff changes in the future.
>
> Differential Revision: http://reviews.llvm.org/D4292
>
> Modified:
> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll
> llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=211892&r1=211891&r2=211892&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Jun 27 06:40:13 2014
> @@ -19061,6 +19061,79 @@ static SmallVector<int, 4> getPSHUFShuff
> }
> }
>
> +/// \brief Search for a combinable shuffle across a chain ending in pshufd.
> +///
> +/// We walk up the chain and look for a combinable shuffle, skipping over
> +/// shuffles that we could hoist this shuffle's transformation past without
> +/// altering anything.
> +static bool combineRedundantDWordShuffle(SDValue N, MutableArrayRef<int> Mask,
> + SelectionDAG &DAG,
> + TargetLowering::DAGCombinerInfo &DCI) {
> + assert(N.getOpcode() == X86ISD::PSHUFD &&
> + "Called with something other than an x86 128-bit half shuffle!");
> + SDLoc DL(N);
> +
> + // Walk up a single-use chain looking for a combinable shuffle.
> + SDValue V = N.getOperand(0);
> + for (; V.hasOneUse(); V = V.getOperand(0)) {
> + switch (V.getOpcode()) {
> + default:
> + return false; // Nothing combined!
> +
> + case ISD::BITCAST:
> + // Skip bitcasts as we always know the type for the target specific
> + // instructions.
> + continue;
> +
> + case X86ISD::PSHUFD:
> + // Found another dword shuffle.
> + break;
> +
> + case X86ISD::PSHUFLW:
> + // Check that the low words (being shuffled) are the identity in the
> + // dword shuffle, and the high words are self-contained.
> + if (Mask[0] != 0 || Mask[1] != 1 ||
> + !(Mask[2] >= 2 && Mask[2] < 4 && Mask[3] >= 2 && Mask[3] < 4))
> + return false;
> +
> + continue;
> +
> + case X86ISD::PSHUFHW:
> + // Check that the high words (being shuffled) are the identity in the
> + // dword shuffle, and the low words are self-contained.
> + if (Mask[2] != 2 || Mask[3] != 3 ||
> + !(Mask[0] >= 0 && Mask[0] < 2 && Mask[1] >= 0 && Mask[1] < 2))
> + return false;
> +
> + continue;
> + }
> + // Break out of the loop if we break out of the switch.
> + break;
> + }
> +
> + if (!V.hasOneUse())
> + // We fell out of the loop without finding a viable combining instruction.
> + return false;
> +
> + // Record the old value to use in RAUW-ing.
> + SDValue Old = V;
> +
> + // Merge this node's mask and our incoming mask.
> + SmallVector<int, 4> VMask = getPSHUFShuffleMask(V);
> + for (int &M : Mask)
> + M = VMask[M];
> + V = DAG.getNode(X86ISD::PSHUFD, DL, MVT::v4i32, V.getOperand(0),
> + getV4X86ShuffleImm8ForMask(Mask, DAG));
> +
> + // Replace N with its operand as we're going to combine that shuffle away.
> + DAG.ReplaceAllUsesWith(N, N.getOperand(0));
> +
> + // Replace the combinable shuffle with the combined one, updating all users
> + // so that we re-evaluate the chain here.
> + DCI.CombineTo(Old.getNode(), V, /*AddTo*/ true);
> + return true;
> +}
> +
> /// \brief Search for a combinable shuffle across a chain ending in pshuflw or pshufhw.
> ///
> /// We walk up the chain, skipping shuffles of the other half and looking
> @@ -19194,18 +19267,11 @@ static SDValue PerformTargetShuffleCombi
> return DAG.getNode(ISD::BITCAST, DL, MVT::v8i16, V);
> }
>
> - // Fallthrough
> + break;
> +
> case X86ISD::PSHUFD:
> - if (V.getOpcode() == N.getOpcode()) {
> - // If we have two sequential shuffles of the same kind we can always fold
> - // them. Even if there are multiple uses, this is beneficial because it
> - // breaks a dependency.
> - SmallVector<int, 4> VMask = getPSHUFShuffleMask(V);
> - for (int &M : Mask)
> - M = VMask[M];
> - return DAG.getNode(N.getOpcode(), DL, VT, V.getOperand(0),
> - getV4X86ShuffleImm8ForMask(Mask, DAG));
> - }
> + if (combineRedundantDWordShuffle(N, Mask, DAG, DCI))
> + return SDValue(); // We combined away this shuffle.
>
> break;
> }
>
> Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll?rev=211892&r1=211891&r2=211892&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll Fri Jun 27 06:40:13 2014
> @@ -157,9 +157,8 @@ define <8 x i16> @shuffle_v8i16_26401375
> ; CHECK-SSE2: # BB#0:
> ; CHECK-SSE2-NEXT: pshuflw {{.*}} # xmm0 = xmm0[0,2,1,3,4,5,6,7]
> ; CHECK-SSE2-NEXT: pshufhw {{.*}} # xmm0 = xmm0[0,1,2,3,7,5,4,6]
> -; CHECK-SSE2-NEXT: pshufd {{.*}} # xmm0 = xmm0[0,3,2,1]
> +; CHECK-SSE2-NEXT: pshufd {{.*}} # xmm0 = xmm0[0,3,1,2]
> ; CHECK-SSE2-NEXT: pshuflw {{.*}} # xmm0 = xmm0[1,3,2,0,4,5,6,7]
> -; CHECK-SSE2-NEXT: pshufd {{.*}} # xmm0 = xmm0[0,1,3,2]
> ; CHECK-SSE2-NEXT: retq
> %shuffle = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 2, i32 6, i32 4, i32 0, i32 1, i32 3, i32 7, i32 5>
> ret <8 x i16> %shuffle
>
> Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll?rev=211892&r1=211891&r2=211892&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll Fri Jun 27 06:40:13 2014
> @@ -3,9 +3,69 @@
> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> target triple = "x86_64-unknown-unknown"
>
> +declare <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
> declare <8 x i16> @llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
> declare <8 x i16> @llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
>
> +define <4 x i32> @combine_pshufd1(<4 x i32> %a) {
> +; CHECK-SSE2-LABEL: @combine_pshufd1
> +; CHECK-SSE2: # BB#0:
> +; CHECK-SSE2-NEXT: retq
> + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 27)
> + %c = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %b, i8 27)
> + ret <4 x i32> %c
> +}
> +
> +define <4 x i32> @combine_pshufd2(<4 x i32> %a) {
> +; CHECK-SSE2-LABEL: @combine_pshufd2
> +; CHECK-SSE2: # BB#0:
> +; CHECK-SSE2-NEXT: retq
> + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 27)
> + %b.cast = bitcast <4 x i32> %b to <8 x i16>
> + %c = call <8 x i16> @llvm.x86.sse2.pshufl.w(<8 x i16> %b.cast, i8 -28)
> + %c.cast = bitcast <8 x i16> %c to <4 x i32>
> + %d = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %c.cast, i8 27)
> + ret <4 x i32> %d
> +}
> +
> +define <4 x i32> @combine_pshufd3(<4 x i32> %a) {
> +; CHECK-SSE2-LABEL: @combine_pshufd3
> +; CHECK-SSE2: # BB#0:
> +; CHECK-SSE2-NEXT: retq
> + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 27)
> + %b.cast = bitcast <4 x i32> %b to <8 x i16>
> + %c = call <8 x i16> @llvm.x86.sse2.pshufh.w(<8 x i16> %b.cast, i8 -28)
> + %c.cast = bitcast <8 x i16> %c to <4 x i32>
> + %d = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %c.cast, i8 27)
> + ret <4 x i32> %d
> +}
> +
> +define <4 x i32> @combine_pshufd4(<4 x i32> %a) {
> +; CHECK-SSE2-LABEL: @combine_pshufd4
> +; CHECK-SSE2: # BB#0:
> +; CHECK-SSE2-NEXT: pshufhw {{.*}} # xmm0 = xmm0[0,1,2,3,7,6,5,4]
> +; CHECK-SSE2-NEXT: retq
> + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 -31)
> + %b.cast = bitcast <4 x i32> %b to <8 x i16>
> + %c = call <8 x i16> @llvm.x86.sse2.pshufh.w(<8 x i16> %b.cast, i8 27)
> + %c.cast = bitcast <8 x i16> %c to <4 x i32>
> + %d = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %c.cast, i8 -31)
> + ret <4 x i32> %d
> +}
> +
> +define <4 x i32> @combine_pshufd5(<4 x i32> %a) {
> +; CHECK-SSE2-LABEL: @combine_pshufd5
> +; CHECK-SSE2: # BB#0:
> +; CHECK-SSE2-NEXT: pshuflw {{.*}} # xmm0 = xmm0[3,2,1,0,4,5,6,7]
> +; CHECK-SSE2-NEXT: retq
> + %b = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %a, i8 -76)
> + %b.cast = bitcast <4 x i32> %b to <8 x i16>
> + %c = call <8 x i16> @llvm.x86.sse2.pshufl.w(<8 x i16> %b.cast, i8 27)
> + %c.cast = bitcast <8 x i16> %c to <4 x i32>
> + %d = call <4 x i32> @llvm.x86.sse2.pshuf.d(<4 x i32> %c.cast, i8 -76)
> + ret <4 x i32> %d
> +}
> +
> define <8 x i16> @combine_pshuflw1(<8 x i16> %a) {
> ; CHECK-SSE2-LABEL: @combine_pshuflw1
> ; CHECK-SSE2: # BB#0:
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
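
For readers following the patch quoted above, here is a minimal standalone sketch of the mask arithmetic it relies on: decoding a pshufd immediate into a four-lane mask, folding two sequential dword shuffles with the same `M = VMask[M]` step the patch uses, and the two checks that decide whether a dword shuffle can be hoisted past an intervening pshuflw or pshufhw. It is plain C++ with no LLVM dependency. The names `DWordMask`, `decodePshufdImm`, `composeMasks`, `canHoistPastPSHUFLW` and `canHoistPastPSHUFHW` are hypothetical and exist only for this illustration; they are not part of X86ISelLowering.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch: one source-lane index per dword.
using DWordMask = std::array<int, 4>;

// Decode a pshufd immediate. Result lane I reads the source lane stored in
// bits [2*I+1 : 2*I] of the immediate.
DWordMask decodePshufdImm(std::uint8_t Imm) {
  DWordMask Mask{};
  for (int I = 0; I < 4; ++I)
    Mask[I] = (Imm >> (2 * I)) & 0x3;
  return Mask;
}

// Fold two sequential dword shuffles into one. Outer is the later shuffle,
// Inner is the earlier one it reads from; this mirrors `M = VMask[M]`.
DWordMask composeMasks(const DWordMask &Outer, const DWordMask &Inner) {
  DWordMask Result{};
  for (int I = 0; I < 4; ++I) {
    assert(Outer[I] >= 0 && Outer[I] < 4 && "mask index out of range");
    Result[I] = Inner[Outer[I]];
  }
  return Result;
}

// Same condition as the PSHUFLW case in the patch: the low two dwords are the
// identity and the high two dwords only read from the high half.
bool canHoistPastPSHUFLW(const DWordMask &Mask) {
  return Mask[0] == 0 && Mask[1] == 1 &&
         Mask[2] >= 2 && Mask[2] < 4 && Mask[3] >= 2 && Mask[3] < 4;
}

// Same condition as the PSHUFHW case in the patch: the high two dwords are the
// identity and the low two dwords only read from the low half.
bool canHoistPastPSHUFHW(const DWordMask &Mask) {
  return Mask[2] == 2 && Mask[3] == 3 &&
         Mask[0] >= 0 && Mask[0] < 2 && Mask[1] >= 0 && Mask[1] < 2;
}
```

Applied to the immediates in the new tests: 27 decodes to [3,2,1,0], and composing that mask with itself gives the identity, which is consistent with combine_pshufd1 expecting only a retq. The immediate -31 (0xE1 as an unsigned byte) decodes to [1,0,2,3], which passes the pshufhw check, and -76 (0xB4) decodes to [0,1,3,2], which passes the pshuflw check; both masks are their own inverse, matching combine_pshufd4 and combine_pshufd5 keeping only the half shuffle.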