[llvm] r213342 - X86: Constant fold converting vector setcc results to float.
Jim Grosbach
grosbach at apple.com
Wed Jul 23 11:14:14 PDT 2014
Interesting. Thanks, I’ll have a look.
-Jim
> On Jul 23, 2014, at 7:45 AM, Patrik Hägglund H <patrik.h.hagglund at ericsson.com> wrote:
>
> Hi Jim,
>
> This commit is causing a regression as shown by llvm-stress:
>
>> bin/llvm-stress -size 300 -seed 30215 | bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
> llc: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2905: llvm::SDValue llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue): Assertion `VT.getSizeInBits() == Operand.getValueType().getSizeInBits() && "Cannot BITCAST between types of different sizes!"' failed.
> 0 llc 0x00000000011e2fa5 llvm::sys::PrintStackTrace(_IO_FILE*) + 37
> 1 llc 0x00000000011e33e3
> 2 libpthread.so.0 0x00007f52d8be17c0
> 3 libc.so.6 0x00007f52d7ee5b35 gsignal + 53
> 4 libc.so.6 0x00007f52d7ee7111 abort + 385
> 5 libc.so.6 0x00007f52d7ede9f0 __assert_fail + 240
> 6 llc 0x000000000105a902
> 7 llc 0x0000000000b55feb llvm::X86TargetLowering::PerformDAGCombine(llvm::SDNode*, llvm::TargetLowering::DAGCombinerInfo&) const + 11771
> 8 llc 0x0000000000fd8fae
> 9 llc 0x0000000000fd88eb llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AliasAnalysis&, llvm::CodeGenOpt::Level) + 939
> 10 llc 0x00000000010ccb5e llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 910
> 11 llc 0x00000000010cbd98 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 7096
> 12 llc 0x00000000010c9444 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1332
> 13 llc 0x0000000000ae0ab6
> 14 llc 0x0000000000ce7d2c llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 124
> 15 llc 0x0000000000ee7bca llvm::FPPassManager::runOnFunction(llvm::Function&) + 362
> 16 llc 0x0000000000ee7e5b llvm::FPPassManager::runOnModule(llvm::Module&) + 43
> 17 llc 0x0000000000ee83f7 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 999
> 18 llc 0x0000000000570690 main + 6832
> 19 libc.so.6 0x00007f52d7ed1c16 __libc_start_main + 230
> 20 llc 0x000000000056eaf9
> Stack dump:
> 0. Program arguments: bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
> 1. Running pass 'Function Pass Manager' on module '<stdin>'.
> 2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@autogen_SD30215'
> Abort
>
> /Patrik Hägglund
>
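
The assertion text says a BITCAST was requested between two types whose total bit widths differ. The types llvm-stress produced here are not shown in the report, so the following is only a hedged sketch of the size relationship involved, not LLVM code. `VecTy` and `sameSizeForBitcast` are hypothetical names invented for illustration; they model a vector type as a lane count plus bits per lane.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector value type (not an LLVM class):
// a lane count and the width of each lane in bits.
struct VecTy {
  unsigned NumLanes;
  unsigned LaneBits;
  unsigned sizeInBits() const { return NumLanes * LaneBits; }
};

// A bitcast reinterprets bits without changing them, so it is only
// meaningful when source and destination occupy the same number of bits.
// The combine bitcasts the converted constant back to the integer mask
// type, so the integer and floating-point vector types must match in size.
static bool sameSizeForBitcast(VecTy IntVT, VecTy FPVT) {
  return IntVT.NumLanes == FPVT.NumLanes &&
         IntVT.sizeInBits() == FPVT.sizeInBits();
}
```

Under this model, a <4 x i32> to <4 x float> conversion passes (128 bits on both sides), while a <4 x i32> to <4 x double> conversion does not (128 versus 256 bits). Whether that is the case hit by seed 30215 is an assumption until the test case is reduced.
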
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Jim Grosbach
> Sent: den 18 juli 2014 02:41
> To: llvm-commits at cs.uiuc.edu
> Subject: [llvm] r213342 - X86: Constant fold converting vector setcc results to float.
>
> Author: grosbach
> Date: Thu Jul 17 19:40:56 2014
> New Revision: 213342
>
> URL: http://llvm.org/viewvc/llvm-project?rev=213342&view=rev
> Log:
> X86: Constant fold converting vector setcc results to float.
>
> Since the result of a SETCC for X86 is 0 or -1 in each lane, we can
> move unary operations, in this case [su]int_to_fp through the mask
> operation and constant fold the operation away. Generally speaking:
> UNARYOP(AND(VECTOR_CMP(x,y), constant))
> --> AND(VECTOR_CMP(x,y), constant2)
> where constant2 is UNARYOP(constant).
>
> This implements the transform where UNARYOP is [su]int_to_fp.
>
> For example, consider the simple function:
>   define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
>     %cmp = fcmp oeq <4 x float> %val, %test
>     %ext = zext <4 x i1> %cmp to <4 x i32>
>     %result = sitofp <4 x i32> %ext to <4 x float>
>     ret <4 x float> %result
>   }
>
> Before this change, the SSE code is generated as:
>   LCPI0_0:
>     .long 1 ## 0x1
>     .long 1 ## 0x1
>     .long 1 ## 0x1
>     .long 1 ## 0x1
>     .section __TEXT,__text,regular,pure_instructions
>     .globl _foo
>     .align 4, 0x90
>   _foo: ## @foo
>     cmpeqps %xmm1, %xmm0
>     andps LCPI0_0(%rip), %xmm0
>     cvtdq2ps %xmm0, %xmm0
>     retq
>
> After, the code is improved to:
>   LCPI0_0:
>     .long 1065353216 ## float 1.000000e+00
>     .long 1065353216 ## float 1.000000e+00
>     .long 1065353216 ## float 1.000000e+00
>     .long 1065353216 ## float 1.000000e+00
>     .section __TEXT,__text,regular,pure_instructions
>     .globl _foo
>     .align 4, 0x90
>   _foo: ## @foo
>     cmpeqps %xmm1, %xmm0
>     andps LCPI0_0(%rip), %xmm0
>     retq
>
> The cvtdq2ps has been constant folded away and the floating point 1.0f
> vector lanes are materialized directly via the ModRM operand of andps.
>
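
The per-lane identity behind the transform in the log message can be checked in scalar code. This is a minimal sketch, assuming 32-bit lanes and a compare mask that is exactly 0 or all ones; `floatBits`, `bitsToFloat`, `unfolded` and `folded` are hypothetical helper names, and `memcpy` stands in for what the DAG expresses as a BITCAST.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as a 32-bit integer (models a bitcast).
static uint32_t floatBits(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Reinterpret a 32-bit integer's bits as a float (models a bitcast).
static float bitsToFloat(uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// One lane of the original pattern: sint_to_fp(and(mask, constant)).
static float unfolded(uint32_t Mask, int32_t C) {
  int32_t Masked = static_cast<int32_t>(Mask & static_cast<uint32_t>(C));
  return static_cast<float>(Masked);
}

// One lane of the folded pattern: and(mask, constant2), where
// constant2 = sint_to_fp(constant) is computed ahead of time.
static float folded(uint32_t Mask, int32_t C) {
  uint32_t Constant2 = floatBits(static_cast<float>(C));
  return bitsToFloat(Mask & Constant2);
}
```

With a mask of 0 both forms give +0.0, since integer zero converts to a float whose bits are all zero. With a mask of all ones both forms give the converted constant, which for 1 is the bit pattern 1065353216 seen in the constant pool above. The equivalence relies on the mask never taking any other value.
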
> Added:
> llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll
> Modified:
> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=213342&r1=213341&r2=213342&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Jul 17 19:40:56 2014
> @@ -21847,8 +21847,59 @@ static SDValue PerformBrCondCombine(SDNo
>    return SDValue();
>  }
>
> +static SDValue performVectorCompareAndMaskUnaryOpCombine(SDNode *N,
> +                                                         SelectionDAG &DAG) {
> +  // Take advantage of vector comparisons producing 0 or -1 in each lane to
> +  // optimize away operation when it's from a constant.
> +  //
> +  // The general transformation is:
> +  //    UNARYOP(AND(VECTOR_CMP(x,y), constant)) -->
> +  //       AND(VECTOR_CMP(x,y), constant2)
> +  //    constant2 = UNARYOP(constant)
> +
> +  // Early exit if this isn't a vector operation or if the operand of the
> +  // unary operation isn't a bitwise AND.
> +  EVT VT = N->getValueType(0);
> +  if (!VT.isVector() || N->getOperand(0)->getOpcode() != ISD::AND ||
> +      N->getOperand(0)->getOperand(0)->getOpcode() != ISD::SETCC)
> +    return SDValue();
> +
> +  // Now check that the other operand of the AND is a constant splat. We could
> +  // make the transformation for non-constant splats as well, but it's unclear
> +  // that would be a benefit as it would not eliminate any operations, just
> +  // perform one more step in scalar code before moving to the vector unit.
> +  if (BuildVectorSDNode *BV =
> +          dyn_cast<BuildVectorSDNode>(N->getOperand(0)->getOperand(1))) {
> +    // Bail out if the vector isn't a constant splat.
> +    if (!BV->getConstantSplatNode())
> +      return SDValue();
> +
> +    // Everything checks out. Build up the new and improved node.
> +    SDLoc DL(N);
> +    EVT IntVT = BV->getValueType(0);
> +    // Create a new constant of the appropriate type for the transformed
> +    // DAG.
> +    SDValue SourceConst = DAG.getNode(N->getOpcode(), DL, VT, SDValue(BV, 0));
> +    // The AND node needs bitcasts to/from an integer vector type around it.
> +    SDValue MaskConst = DAG.getNode(ISD::BITCAST, DL, IntVT, SourceConst);
> +    SDValue NewAnd = DAG.getNode(ISD::AND, DL, IntVT,
> +                                 N->getOperand(0)->getOperand(0), MaskConst);
> +    SDValue Res = DAG.getNode(ISD::BITCAST, DL, VT, NewAnd);
> +    return Res;
> +  }
> +
> +  return SDValue();
> +}
> +
>  static SDValue PerformSINT_TO_FPCombine(SDNode *N, SelectionDAG &DAG,
>                                          const X86TargetLowering *XTLI) {
> +  // First try to optimize away the conversion entirely when it's
> +  // conditionally from a constant. Vectors only.
> +  SDValue Res = performVectorCompareAndMaskUnaryOpCombine(N, DAG);
> +  if (Res != SDValue())
> +    return Res;
> +
> +  // Now move on to more general possibilities.
>    SDValue Op0 = N->getOperand(0);
>    EVT InVT = Op0->getValueType(0);
>
>
> Added: llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll?rev=213342&view=auto
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll (added)
> +++ llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll Thu Jul 17 19:40:56 2014
> @@ -0,0 +1,18 @@
> +; RUN: llc < %s -mtriple=x86_64-apple-darwin | FileCheck %s
> +
> +define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
> +; CHECK-LABEL: LCPI0_0
> +; CHECK-NEXT: .long 1065353216 ## float 1.000000e+00
> +; CHECK-NEXT: .long 1065353216 ## float 1.000000e+00
> +; CHECK-NEXT: .long 1065353216 ## float 1.000000e+00
> +; CHECK-NEXT: .long 1065353216 ## float 1.000000e+00
> +; CHECK-LABEL: foo:
> +; CHECK: cmpeqps %xmm1, %xmm0
> +; CHECK-NEXT: andps LCPI0_0(%rip), %xmm0
> +; CHECK-NEXT: retq
> +
> +  %cmp = fcmp oeq <4 x float> %val, %test
> +  %ext = zext <4 x i1> %cmp to <4 x i32>
> +  %result = sitofp <4 x i32> %ext to <4 x float>
> +  ret <4 x float> %result
> +}
>
>