[llvm] r213342 - X86: Constant fold converting vector setcc results to float.
Jim Grosbach
grosbach at apple.com
Wed Jul 23 13:57:17 PDT 2014
I believe this is fixed in r213799.
-Jim
> On Jul 23, 2014, at 11:14 AM, Jim Grosbach <grosbach at apple.com> wrote:
>
> Interesting. Thanks, I’ll have a look.
>
> -Jim
>
>> On Jul 23, 2014, at 7:45 AM, Patrik Hägglund H <patrik.h.hagglund at ericsson.com> wrote:
>>
>> Hi Jim,
>>
>> This commit is causing a regression as shown by llvm-stress:
>>
>>> bin/llvm-stress -size 300 -seed 30215 | bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
>> llc: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2905: llvm::SDValue llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue): Assertion `VT.getSizeInBits() == Operand.getValueType().getSizeInBits() && "Cannot BITCAST between types of different sizes!"' failed.
>> 0 llc 0x00000000011e2fa5 llvm::sys::PrintStackTrace(_IO_FILE*) + 37
>> 1 llc 0x00000000011e33e3
>> 2 libpthread.so.0 0x00007f52d8be17c0
>> 3 libc.so.6 0x00007f52d7ee5b35 gsignal + 53
>> 4 libc.so.6 0x00007f52d7ee7111 abort + 385
>> 5 libc.so.6 0x00007f52d7ede9f0 __assert_fail + 240
>> 6 llc 0x000000000105a902
>> 7 llc 0x0000000000b55feb llvm::X86TargetLowering::PerformDAGCombine(llvm::SDNode*, llvm::TargetLowering::DAGCombinerInfo&) const + 11771
>> 8 llc 0x0000000000fd8fae
>> 9 llc 0x0000000000fd88eb llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AliasAnalysis&, llvm::CodeGenOpt::Level) + 939
>> 10 llc 0x00000000010ccb5e llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 910
>> 11 llc 0x00000000010cbd98 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 7096
>> 12 llc 0x00000000010c9444 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1332
>> 13 llc 0x0000000000ae0ab6
>> 14 llc 0x0000000000ce7d2c llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 124
>> 15 llc 0x0000000000ee7bca llvm::FPPassManager::runOnFunction(llvm::Function&) + 362
>> 16 llc 0x0000000000ee7e5b llvm::FPPassManager::runOnModule(llvm::Module&) + 43
>> 17 llc 0x0000000000ee83f7 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 999
>> 18 llc 0x0000000000570690 main + 6832
>> 19 libc.so.6 0x00007f52d7ed1c16 __libc_start_main + 230
>> 20 llc 0x000000000056eaf9
>> Stack dump:
>> 0. Program arguments: bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
>> 1. Running pass 'Function Pass Manager' on module '<stdin>'.
>> 2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@autogen_SD30215'
>> Abort
>>
>> /Patrik Hägglund
>>
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Jim Grosbach
>> Sent: 18 July 2014 02:41
>> To: llvm-commits at cs.uiuc.edu
>> Subject: [llvm] r213342 - X86: Constant fold converting vector setcc results to float.
>>
>> Author: grosbach
>> Date: Thu Jul 17 19:40:56 2014
>> New Revision: 213342
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=213342&view=rev
>> Log:
>> X86: Constant fold converting vector setcc results to float.
>>
>> Since the result of a SETCC for X86 is 0 or -1 in each lane, we can
>> move unary operations, in this case [su]int_to_fp through the mask
>> operation and constant fold the operation away. Generally speaking:
>>   UNARYOP(AND(VECTOR_CMP(x,y), constant))
>>       --> AND(VECTOR_CMP(x,y), constant2)
>> where constant2 is UNARYOP(constant).
>>
>> This implements the transform where UNARYOP is [su]int_to_fp.
>>
>> For example, consider the simple function:
>> define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
>>   %cmp = fcmp oeq <4 x float> %val, %test
>>   %ext = zext <4 x i1> %cmp to <4 x i32>
>>   %result = sitofp <4 x i32> %ext to <4 x float>
>>   ret <4 x float> %result
>> }
>>
>> Before this change, the SSE code is generated as:
>> LCPI0_0:
>>         .long   1                       ## 0x1
>>         .long   1                       ## 0x1
>>         .long   1                       ## 0x1
>>         .long   1                       ## 0x1
>>         .section        __TEXT,__text,regular,pure_instructions
>>         .globl  _foo
>>         .align  4, 0x90
>> _foo:                                   ## @foo
>>         cmpeqps %xmm1, %xmm0
>>         andps   LCPI0_0(%rip), %xmm0
>>         cvtdq2ps        %xmm0, %xmm0
>>         retq
>>
>> After, the code is improved to:
>> LCPI0_0:
>>         .long   1065353216              ## float 1.000000e+00
>>         .long   1065353216              ## float 1.000000e+00
>>         .long   1065353216              ## float 1.000000e+00
>>         .long   1065353216              ## float 1.000000e+00
>>         .section        __TEXT,__text,regular,pure_instructions
>>         .globl  _foo
>>         .align  4, 0x90
>> _foo:                                   ## @foo
>>         cmpeqps %xmm1, %xmm0
>>         andps   LCPI0_0(%rip), %xmm0
>>         retq
>>
>> The cvtdq2ps has been constant folded away and the floating point 1.0f
>> vector lanes are materialized directly via the ModRM operand of andps.
>>
>> Added:
>> llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll
>> Modified:
>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=213342&r1=213341&r2=213342&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Jul 17 19:40:56 2014
>> @@ -21847,8 +21847,59 @@ static SDValue PerformBrCondCombine(SDNo
>>    return SDValue();
>>  }
>>
>> +static SDValue performVectorCompareAndMaskUnaryOpCombine(SDNode *N,
>> +                                                         SelectionDAG &DAG) {
>> +  // Take advantage of vector comparisons producing 0 or -1 in each lane to
>> +  // optimize away operation when it's from a constant.
>> +  //
>> +  // The general transformation is:
>> +  //    UNARYOP(AND(VECTOR_CMP(x,y), constant)) -->
>> +  //       AND(VECTOR_CMP(x,y), constant2)
>> +  //    constant2 = UNARYOP(constant)
>> +
>> +  // Early exit if this isn't a vector operation or if the operand of the
>> +  // unary operation isn't a bitwise AND.
>> +  EVT VT = N->getValueType(0);
>> +  if (!VT.isVector() || N->getOperand(0)->getOpcode() != ISD::AND ||
>> +      N->getOperand(0)->getOperand(0)->getOpcode() != ISD::SETCC)
>> +    return SDValue();
>> +
>> +  // Now check that the other operand of the AND is a constant splat. We could
>> +  // make the transformation for non-constant splats as well, but it's unclear
>> +  // that would be a benefit as it would not eliminate any operations, just
>> +  // perform one more step in scalar code before moving to the vector unit.
>> +  if (BuildVectorSDNode *BV =
>> +          dyn_cast<BuildVectorSDNode>(N->getOperand(0)->getOperand(1))) {
>> +    // Bail out if the vector isn't a constant splat.
>> +    if (!BV->getConstantSplatNode())
>> +      return SDValue();
>> +
>> +    // Everything checks out. Build up the new and improved node.
>> +    SDLoc DL(N);
>> +    EVT IntVT = BV->getValueType(0);
>> +    // Create a new constant of the appropriate type for the transformed
>> +    // DAG.
>> +    SDValue SourceConst = DAG.getNode(N->getOpcode(), DL, VT, SDValue(BV, 0));
>> +    // The AND node needs bitcasts to/from an integer vector type around it.
>> +    SDValue MaskConst = DAG.getNode(ISD::BITCAST, DL, IntVT, SourceConst);
>> +    SDValue NewAnd = DAG.getNode(ISD::AND, DL, IntVT,
>> +                                 N->getOperand(0)->getOperand(0), MaskConst);
>> +    SDValue Res = DAG.getNode(ISD::BITCAST, DL, VT, NewAnd);
>> +    return Res;
>> +  }
>> +
>> +  return SDValue();
>> +}
>> +
>>  static SDValue PerformSINT_TO_FPCombine(SDNode *N, SelectionDAG &DAG,
>>                                          const X86TargetLowering *XTLI) {
>> +  // First try to optimize away the conversion entirely when it's
>> +  // conditionally from a constant. Vectors only.
>> +  SDValue Res = performVectorCompareAndMaskUnaryOpCombine(N, DAG);
>> +  if (Res != SDValue())
>> +    return Res;
>> +
>> +  // Now move on to more general possibilities.
>>    SDValue Op0 = N->getOperand(0);
>>    EVT InVT = Op0->getValueType(0);
>>
>>
>> Added: llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll?rev=213342&view=auto
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll (added)
>> +++ llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll Thu Jul 17 19:40:56 2014
>> @@ -0,0 +1,18 @@
>> +; RUN: llc < %s -mtriple=x86_64-apple-darwin | FileCheck %s
>> +
>> +define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
>> +; CHECK-LABEL: LCPI0_0
>> +; CHECK-NEXT: .long 1065353216 ## float 1.000000e+00
>> +; CHECK-NEXT: .long 1065353216 ## float 1.000000e+00
>> +; CHECK-NEXT: .long 1065353216 ## float 1.000000e+00
>> +; CHECK-NEXT: .long 1065353216 ## float 1.000000e+00
>> +; CHECK-LABEL: foo:
>> +; CHECK: cmpeqps %xmm1, %xmm0
>> +; CHECK-NEXT: andps LCPI0_0(%rip), %xmm0
>> +; CHECK-NEXT: retq
>> +
>> +  %cmp = fcmp oeq <4 x float> %val, %test
>> +  %ext = zext <4 x i1> %cmp to <4 x i32>
>> +  %result = sitofp <4 x i32> %ext to <4 x float>
>> +  ret <4 x float> %result
>> +}
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
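For readers following the quoted commit log, the fold can be checked on a single lane in plain C++. This is only a scalar sketch, not the DAG combine itself: the helper names (floatToBits, bitsToFloat, convertAfterMask, maskAfterConvert) are made up for illustration, and it assumes a 32-bit integer lane converted to a 32-bit float, as in the <4 x i32> to <4 x float> example. It says nothing about cases where the integer and float lane widths differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar stand-ins for ISD::BITCAST between i32 and f32 (hypothetical helpers).
static uint32_t floatToBits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

static float bitsToFloat(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Original sequence for one lane: AND the compare result (0 or -1) with the
// integer constant, then convert to float (the cvtdq2ps step).
static float convertAfterMask(int32_t laneMask, int32_t constant) {
  return static_cast<float>(laneMask & constant);
}

// Folded sequence: convert the constant once (constant2 = UNARYOP(constant)),
// then AND its bit pattern with the compare result.
static float maskAfterConvert(int32_t laneMask, int32_t constant) {
  uint32_t constant2 = floatToBits(static_cast<float>(constant));
  return bitsToFloat(static_cast<uint32_t>(laneMask) & constant2);
}
```

With laneMask equal to -1 both functions return the converted constant, and with laneMask equal to 0 both return 0.0f, because the all-zero bit pattern is also +0.0 as a float. For constant 1, constant2 is 1065353216 (0x3F800000), the value that appears in the constant pool after the change.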