[llvm] r213342 - X86: Constant fold converting vector setcc results to float.

Wed Jul 23 13:57:17 PDT 2014

I believe this is fixed in r213799.

-Jim

> On Jul 23, 2014, at 11:14 AM, Jim Grosbach <grosbach at apple.com> wrote:
> 
> Interesting. Thanks, I’ll have a look.
> 
> -Jim
> 
>> On Jul 23, 2014, at 7:45 AM, Patrik Hägglund H <patrik.h.hagglund at ericsson.com> wrote:
>> 
>> Hi Jim,
>> 
>> This commit is causing a regression as shown by llvm-stress:
>> 
>>> bin/llvm-stress -size 300 -seed 30215 | bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
>> llc: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2905: llvm::SDValue llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue): Assertion `VT.getSizeInBits() == Operand.getValueType().getSizeInBits() && "Cannot BITCAST between types of different sizes!"' failed.
>> 0  llc             0x00000000011e2fa5 llvm::sys::PrintStackTrace(_IO_FILE*) + 37
>> 1  llc             0x00000000011e33e3
>> 2  libpthread.so.0 0x00007f52d8be17c0
>> 3  libc.so.6       0x00007f52d7ee5b35 gsignal + 53
>> 4  libc.so.6       0x00007f52d7ee7111 abort + 385
>> 5  libc.so.6       0x00007f52d7ede9f0 __assert_fail + 240
>> 6  llc             0x000000000105a902
>> 7  llc             0x0000000000b55feb llvm::X86TargetLowering::PerformDAGCombine(llvm::SDNode*, llvm::TargetLowering::DAGCombinerInfo&) const + 11771
>> 8  llc             0x0000000000fd8fae
>> 9  llc             0x0000000000fd88eb llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AliasAnalysis&, llvm::CodeGenOpt::Level) + 939
>> 10 llc             0x00000000010ccb5e llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 910
>> 11 llc             0x00000000010cbd98 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 7096
>> 12 llc             0x00000000010c9444 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1332
>> 13 llc             0x0000000000ae0ab6
>> 14 llc             0x0000000000ce7d2c llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 124
>> 15 llc             0x0000000000ee7bca llvm::FPPassManager::runOnFunction(llvm::Function&) + 362
>> 16 llc             0x0000000000ee7e5b llvm::FPPassManager::runOnModule(llvm::Module&) + 43
>> 17 llc             0x0000000000ee83f7 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 999
>> 18 llc             0x0000000000570690 main + 6832
>> 19 libc.so.6       0x00007f52d7ed1c16 __libc_start_main + 230
>> 20 llc             0x000000000056eaf9
>> Stack dump:
>> 0.      Program arguments: bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
>> 1.      Running pass 'Function Pass Manager' on module '<stdin>'.
>> 2.      Running pass 'X86 DAG->DAG Instruction Selection' on function '@autogen_SD30215'
>> Abort
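The assertion in this trace states the rule that a BITCAST only reinterprets bits, so the source and destination types must have the same total width. A minimal sketch of that rule follows. The `VecTy` struct and `canBitcast` helper are hypothetical stand-ins (not LLVM APIs), and the types mentioned below are illustrative; they are not taken from the llvm-stress test case.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector value type: lane count and bits per lane.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
  unsigned sizeInBits() const { return NumElts * EltBits; }
};

// Mirrors the asserted precondition: a bitcast changes only the
// interpretation of the bits, so both types must have equal total width.
static bool canBitcast(VecTy From, VecTy To) {
  return From.sizeInBits() == To.sizeInBits();
}
```

With these stand-ins, a 4 x 32-bit float vector (128 bits) can be bitcast to a 4 x 32-bit integer vector (128 bits), while a bitcast to a 4 x 16-bit integer vector (64 bits) would violate the precondition.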
>> 
>> /Patrik Hägglund
>> 
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Jim Grosbach
>> Sent: 18 July 2014 02:41
>> To: llvm-commits at cs.uiuc.edu
>> Subject: [llvm] r213342 - X86: Constant fold converting vector setcc results to float.
>> 
>> Author: grosbach
>> Date: Thu Jul 17 19:40:56 2014
>> New Revision: 213342
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=213342&view=rev
>> Log:
>> X86: Constant fold converting vector setcc results to float.
>> 
>> Since the result of a SETCC for X86 is 0 or -1 in each lane, we can
>> move unary operations, in this case [su]int_to_fp through the mask
>> operation and constant fold the operation away. Generally speaking:
>> UNARYOP(AND(VECTOR_CMP(x,y), constant))
>>     --> AND(VECTOR_CMP(x,y), constant2)
>> where constant2 is UNARYOP(constant).
>> 
>> This implements the transform where UNARYOP is [su]int_to_fp.
>> 
>> For example, consider the simple function:
>> define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
>> %cmp = fcmp oeq <4 x float> %val, %test
>> %ext = zext <4 x i1> %cmp to <4 x i32>
>> %result = sitofp <4 x i32> %ext to <4 x float>
>> ret <4 x float> %result
>> }
>> 
>> Before this change, the SSE code is generated as:
>> LCPI0_0:
>> .long 1                       ## 0x1
>> .long 1                       ## 0x1
>> .long 1                       ## 0x1
>> .long 1                       ## 0x1
>> .section  __TEXT,__text,regular,pure_instructions
>> .globl  _foo
>> .align  4, 0x90
>> _foo:                                   ## @foo
>> cmpeqps %xmm1, %xmm0
>> andps LCPI0_0(%rip), %xmm0
>> cvtdq2ps  %xmm0, %xmm0
>> retq
>> 
>> After, the code is improved to:
>> LCPI0_0:
>> .long 1065353216              ## float 1.000000e+00
>> .long 1065353216              ## float 1.000000e+00
>> .long 1065353216              ## float 1.000000e+00
>> .long 1065353216              ## float 1.000000e+00
>> .section  __TEXT,__text,regular,pure_instructions
>> .globl  _foo
>> .align  4, 0x90
>> _foo:                                   ## @foo
>> cmpeqps %xmm1, %xmm0
>> andps LCPI0_0(%rip), %xmm0
>> retq
>> 
>> The cvtdq2ps has been constant folded away and the floating point 1.0f
>> vector lanes are materialized directly via the ModRM operand of andps.
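The fold described in this log can be checked one lane at a time on scalars. The sketch below assumes 32-bit lanes and a compare mask that is either all zeros or all ones; the helper names (`bitsOf`, `convertAfterMask`, `maskAfterConvert`) are made up for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit integer
// (a scalar analogue of a BITCAST).
static uint32_t bitsOf(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Original form for one lane: sitofp(and(mask, constant)).
static uint32_t convertAfterMask(uint32_t Mask, int32_t Constant) {
  int32_t Masked =
      static_cast<int32_t>(Mask & static_cast<uint32_t>(Constant));
  return bitsOf(static_cast<float>(Masked));
}

// Folded form for one lane: and(mask, bitcast(sitofp(constant))).
static uint32_t maskAfterConvert(uint32_t Mask, int32_t Constant) {
  return Mask & bitsOf(static_cast<float>(Constant));
}
```

For a mask of 0 both forms give 0, because integer 0 converts to +0.0f, whose bit pattern is all zeros. For a mask of 0xFFFFFFFF both forms give the bits of the converted constant, which for the constant 1 is 1065353216 (1.0f), matching the constant pool entries shown in the improved code.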
>> 
>> Added:
>>   llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll
>> Modified:
>>   llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> 
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=213342&r1=213341&r2=213342&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Jul 17 19:40:56 2014
>> @@ -21847,8 +21847,59 @@ static SDValue PerformBrCondCombine(SDNo
>>  return SDValue();
>> }
>> 
>> +static SDValue performVectorCompareAndMaskUnaryOpCombine(SDNode *N,
>> +                                                         SelectionDAG &DAG) {
>> +  // Take advantage of vector comparisons producing 0 or -1 in each lane to
>> +  // optimize away operation when it's from a constant.
>> +  //
>> +  // The general transformation is:
>> +  //    UNARYOP(AND(VECTOR_CMP(x,y), constant)) -->
>> +  //       AND(VECTOR_CMP(x,y), constant2)
>> +  //    constant2 = UNARYOP(constant)
>> +
>> +  // Early exit if this isn't a vector operation or if the operand of the
>> +  // unary operation isn't a bitwise AND.
>> +  EVT VT = N->getValueType(0);
>> +  if (!VT.isVector() || N->getOperand(0)->getOpcode() != ISD::AND ||
>> +      N->getOperand(0)->getOperand(0)->getOpcode() != ISD::SETCC)
>> +    return SDValue();
>> +
>> +  // Now check that the other operand of the AND is a constant splat. We could
>> +  // make the transformation for non-constant splats as well, but it's unclear
>> +  // that would be a benefit as it would not eliminate any operations, just
>> +  // perform one more step in scalar code before moving to the vector unit.
>> +  if (BuildVectorSDNode *BV =
>> +          dyn_cast<BuildVectorSDNode>(N->getOperand(0)->getOperand(1))) {
>> +    // Bail out if the vector isn't a constant splat.
>> +    if (!BV->getConstantSplatNode())
>> +      return SDValue();
>> +
>> +    // Everything checks out. Build up the new and improved node.
>> +    SDLoc DL(N);
>> +    EVT IntVT = BV->getValueType(0);
>> +    // Create a new constant of the appropriate type for the transformed
>> +    // DAG.
>> +    SDValue SourceConst = DAG.getNode(N->getOpcode(), DL, VT, SDValue(BV, 0));
>> +    // The AND node needs bitcasts to/from an integer vector type around it.
>> +    SDValue MaskConst = DAG.getNode(ISD::BITCAST, DL, IntVT, SourceConst);
>> +    SDValue NewAnd = DAG.getNode(ISD::AND, DL, IntVT,
>> +                                 N->getOperand(0)->getOperand(0), MaskConst);
>> +    SDValue Res = DAG.getNode(ISD::BITCAST, DL, VT, NewAnd);
>> +    return Res;
>> +  }
>> +
>> +  return SDValue();
>> +}
>> +
>> static SDValue PerformSINT_TO_FPCombine(SDNode *N, SelectionDAG &DAG,
>>                                        const X86TargetLowering *XTLI) {
>> +  // First try to optimize away the conversion entirely when it's
>> +  // conditionally from a constant. Vectors only.
>> +  SDValue Res = performVectorCompareAndMaskUnaryOpCombine(N, DAG);
>> +  if (Res != SDValue())
>> +    return Res;
>> +
>> +  // Now move on to more general possibilities.
>>  SDValue Op0 = N->getOperand(0);
>>  EVT InVT = Op0->getValueType(0);
>> 
>> 
>> Added: llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll?rev=213342&view=auto
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll (added)
>> +++ llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll Thu Jul 17 19:40:56 2014
>> @@ -0,0 +1,18 @@
>> +; RUN: llc < %s -mtriple=x86_64-apple-darwin | FileCheck %s
>> +
>> +define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
>> +; CHECK-LABEL: LCPI0_0
>> +; CHECK-NEXT: .long 1065353216              ## float 1.000000e+00
>> +; CHECK-NEXT: .long 1065353216              ## float 1.000000e+00
>> +; CHECK-NEXT: .long 1065353216              ## float 1.000000e+00
>> +; CHECK-NEXT: .long 1065353216              ## float 1.000000e+00
>> +; CHECK-LABEL: foo:
>> +; CHECK: cmpeqps %xmm1, %xmm0
>> +; CHECK-NEXT: andps LCPI0_0(%rip), %xmm0
>> +; CHECK-NEXT: retq
>> +
>> +  %cmp = fcmp oeq <4 x float> %val, %test
>> +  %ext = zext <4 x i1> %cmp to <4 x i32>
>> +  %result = sitofp <4 x i32> %ext to <4 x float>
>> +  ret <4 x float> %result
>> +}
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits