[llvm] r207338 - DAGCombiner: Simplify code a bit, make more transforms work with vectors.

Fri Jul 11 05:19:01 PDT 2014

Thanks.

This is r212808.

-Quentin
> On Jul 11, 2014, at 3:38 AM, Benjamin Kramer <benny.kra at gmail.com> wrote:
> 
> On Fri, Jul 11, 2014 at 12:05 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>> Hi Benjamin,
>> 
>> Attached is the rebased patch.
>> 
>> I’ve kept your if/else construction for the setting of the mask, but I can
>> apply the array of array approach of my initial proposal if you prefer.
> 
> I like the if/else approach more. The code has enough obscurity
> already, a small bit of duplication doesn't hurt.
> 
>> 
>> Note that like the original patch, this patch also corrects the low <-> high
>> inversion.
>> This result in a diff on idiv test case for SSE variant that I believe is
>> correct.
>> Indeed, there is no reason that we see the fix up of pmuludq in that test,
>> since we are just reading the low bits!
>> 
>> Is it OK to commit?
> 
> LGTM. Thanks a lot for fixing this.
> 
> - Ben
>> 
>> Thanks,
>> -Quentin
>> 
>> 
>> On Jul 10, 2014, at 10:59 AM, Benjamin Kramer <benny.kra at gmail.com> wrote:
>> 
>> 
>> On 10.07.2014, at 19:46, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>> Hi Benjamin,
>> 
>> I have narrowed down a failure of the factor test case (from the llvm
>> test-suite: SingleSource/UnitTests/SignlessTypes) to this commit.
>> 
>> The failure happens at Os on a haswell machine running macosx.
>> 
>> I have reduced the problem to the attached test.ll and I proposed to fix
>> that with the attached patch.
>> Basically the masks for the shuffle are wrong for vectors with 8 elements.
>> 
>> ** To Reproduce **
>> 
>> Basically this test takes 8 numbers: 8, 9, …, 15 and divide each of them by
>> 7 and print the result.
>> 
>> On a haswell machine
>> llc -mattr=-avx2 test.ll -o ok.s
>> llc -mattr=+avx2 test.ll -o ko.s
>> 
>> clang ok.s -arch x86_64 -o ok.out
>> clang ko.s -arch x86_64 -march core-avx2 -o ko.out
>> 
>> ./ok.out
>> 1
>> 1
>> 1
>> 1
>> 1
>> 1
>> 2
>> 2
>> 
>> 
>> ./ko.out
>> 1
>> 1
>> 1
>> 1
>> 1
>> 1
>> 1  <-- 14/7 == 1!!!
>> 2
>> 
>> ** Feedback Needed **
>> 
>> Could you review the attached patch?
>> 
>> Moreover, because of this patch, vector-idiv.ll is failing. To me the output
>> seems reasonable but could you take a look?
>> I’ll update the patch to match the new pattern if they are fine.
>> 
>> 
>> Your patch is correct, and vector-idiv.ll was buggy. I think this bug is
>> already fixed with r212611 (the shufflemasks I had in place for AVX2 were
>> completely bogus, I should've noticed this earlier). However, I like the
>> effort that you put into documenting the lowering with verbose comments.
>> Would you mind rebasing them on top of r212611?
>> 
>> Really sorry that you had to spent so much time on this silly oversight of
>> mine.
>> - Ben
>> 
>> 
>> Thanks,
>> -Quentin
>> <lower_mullohi.patch><test.ll>
>> 
>> This commit (and more generally the custom lowering of MUL_LOHI)
>> 
>> On May 2, 2014, at 5:42 AM, Benjamin Kramer <benny.kra at gmail.com> wrote:
>> 
>> 
>> On 02.05.2014, at 13:48, Patrik Hägglund H <patrik.h.hagglund at ericsson.com>
>> wrote:
>> 
>> Hi Benjamin,
>> 
>> This commit is causing a regression for an llvm-stress test, as shown below.
>> 
>> 
>> r207835 fixes the crash. Thanks for the test case!
>> 
>> - Ben
>> 
>> 
>> /Patrik Hägglund
>> 
>> bin/llvm-stress -size 185 -seed 7564 | bin/llc -march=x86-64 -mcpu=corei7 -o
>> /dev/null
>> 
>> llc: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5728: void
>> llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode *, llvm::SDNode *):
>> Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) ==
>> To->getValueType(i)) && "Cannot use this version of ReplaceAllUsesWith!"'
>> failed.
>> 0  llc             0x00000000010b5a55 llvm::sys::PrintStackTrace(_IO_FILE*)
>> + 37
>> 1  llc             0x00000000010b5e93
>> 2  libpthread.so.0 0x00007f7d65bcb7c0
>> 3  libc.so.6       0x00007f7d64ecfb35 gsignal + 53
>> 4  libc.so.6       0x00007f7d64ed1111 abort + 385
>> 5  libc.so.6       0x00007f7d64ec89f0 __assert_fail + 240
>> 6  llc             0x0000000000f3a524
>> 7  llc             0x0000000000ec4e45
>> llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AliasAnalysis&,
>> llvm::CodeGenOpt::Level) + 1573
>> 8  llc             0x0000000000fad035
>> llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 901
>> 9  llc             0x0000000000fac368
>> llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 6504
>> 10 llc             0x0000000000fa9c64
>> llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1332
>> 11 llc             0x0000000000c3261c
>> llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 124
>> 12 llc             0x0000000000e1f20a
>> llvm::FPPassManager::runOnFunction(llvm::Function&) + 362
>> 13 llc             0x0000000000e1f49b
>> llvm::FPPassManager::runOnModule(llvm::Module&) + 43
>> 14 llc             0x0000000000e1fa37
>> llvm::legacy::PassManagerImpl::run(llvm::Module&) + 999
>> 15 llc             0x000000000056d835 main + 6549
>> 16 libc.so.6       0x00007f7d64ebbc16 __libc_start_main + 230
>> 17 llc             0x000000000056bdb9
>> Stack dump:
>> 0.      Program arguments: bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
>> 1.      Running pass 'Function Pass Manager' on module '<stdin>'.
>> 2.      Running pass 'X86 DAG->DAG Instruction Selection' on function
>> '@autogen_SD7564'
>> Abort
>> 
>> 
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu
>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Benjamin Kramer
>> Sent: den 27 april 2014 01:10
>> To: llvm-commits at cs.uiuc.edu
>> Subject: [llvm] r207338 - DAGCombiner: Simplify code a bit, make more
>> transforms work with vectors.
>> 
>> Author: d0k
>> Date: Sat Apr 26 18:09:49 2014
>> New Revision: 207338
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=207338&view=rev
>> Log:
>> DAGCombiner: Simplify code a bit, make more transforms work with vectors.
>> 
>> Modified:
>> llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> llvm/trunk/test/CodeGen/X86/vector-idiv.ll
>> 
>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=207338&r1=207337&r2=207338&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Sat Apr 26 18:09:49
>> 2014
>> @@ -644,8 +644,13 @@ static ConstantSDNode *isConstOrConstSpl
>> if (ConstantSDNode *CN = dyn_cast<ConstantSDNode>(N))
>> return CN;
>> 
>> -  if (BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(N))
>> -    return BV->getConstantSplatValue();
>> +  if (BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(N)) {
>> +    ConstantSDNode *CN = BV->getConstantSplatValue();
>> +
>> +    // BuildVectors can truncate their operands. Ignore that case here.
>> +    if (CN && CN->getValueType(0) == N.getValueType().getScalarType())
>> +      return CN;
>> +  }
>> 
>> return nullptr;
>> }
>> @@ -1957,8 +1962,8 @@ SDValue DAGCombiner::visitMUL(SDNode *N)
>> SDValue DAGCombiner::visitSDIV(SDNode *N) {
>> SDValue N0 = N->getOperand(0);
>> SDValue N1 = N->getOperand(1);
>> -  ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0.getNode());
>> -  ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1.getNode());
>> +  ConstantSDNode *N0C = isConstOrConstSplat(N0);
>> +  ConstantSDNode *N1C = isConstOrConstSplat(N1);
>> EVT VT = N->getValueType(0);
>> 
>> // fold vector ops
>> @@ -1985,25 +1990,15 @@ SDValue DAGCombiner::visitSDIV(SDNode *N
>>                      N0, N1);
>> }
>> 
>> -  const APInt *Divisor = nullptr;
>> -  if (N1C) {
>> -    Divisor = &N1C->getAPIntValue();
>> -  } else if (N1.getValueType().isVector() &&
>> -             N1->getOpcode() == ISD::BUILD_VECTOR) {
>> -    BuildVectorSDNode *BV = cast<BuildVectorSDNode>(N->getOperand(1));
>> -    if (ConstantSDNode *C = BV->getConstantSplatValue())
>> -      Divisor = &C->getAPIntValue();
>> -  }
>> -
>> // fold (sdiv X, pow2) -> simple ops after legalize
>> -  if (Divisor && !!*Divisor &&
>> -      (Divisor->isPowerOf2() || (-*Divisor).isPowerOf2())) {
>> +  if (N1C && !N1C->isNullValue() && (N1C->getAPIntValue().isPowerOf2() ||
>> +                                     (-N1C->getAPIntValue()).isPowerOf2()))
>> {
>> // If dividing by powers of two is cheap, then don't perform the following
>> // fold.
>> if (TLI.isPow2DivCheap())
>>   return SDValue();
>> 
>> -    unsigned lg2 = Divisor->countTrailingZeros();
>> +    unsigned lg2 = N1C->getAPIntValue().countTrailingZeros();
>> 
>> // Splat the sign bit into the register
>> SDValue SGN =
>> @@ -2025,7 +2020,7 @@ SDValue DAGCombiner::visitSDIV(SDNode *N
>> 
>> // If we're dividing by a positive value, we're done.  Otherwise, we must
>> // negate the result.
>> -    if (Divisor->isNonNegative())
>> +    if (N1C->getAPIntValue().isNonNegative())
>>   return SRA;
>> 
>> AddToWorkList(SRA.getNode());
>> @@ -2034,7 +2029,7 @@ SDValue DAGCombiner::visitSDIV(SDNode *N
>> 
>> // if integer divide is expensive and we satisfy the requirements, emit an
>> // alternate sequence.
>> -  if ((N1C || N1->getOpcode() == ISD::BUILD_VECTOR) &&
>> !TLI.isIntDivCheap()) {
>> +  if (N1C && !TLI.isIntDivCheap()) {
>> SDValue Op = BuildSDIV(N);
>> if (Op.getNode()) return Op;
>> }
>> @@ -2052,8 +2047,8 @@ SDValue DAGCombiner::visitSDIV(SDNode *N
>> SDValue DAGCombiner::visitUDIV(SDNode *N) {
>> SDValue N0 = N->getOperand(0);
>> SDValue N1 = N->getOperand(1);
>> -  ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0.getNode());
>> -  ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1.getNode());
>> +  ConstantSDNode *N0C = isConstOrConstSplat(N0);
>> +  ConstantSDNode *N1C = isConstOrConstSplat(N1);
>> EVT VT = N->getValueType(0);
>> 
>> // fold vector ops
>> @@ -2086,7 +2081,7 @@ SDValue DAGCombiner::visitUDIV(SDNode *N
>> }
>> }
>> // fold (udiv x, c) -> alternate
>> -  if ((N1C || N1->getOpcode() == ISD::BUILD_VECTOR) &&
>> !TLI.isIntDivCheap()) {
>> +  if (N1C && !TLI.isIntDivCheap()) {
>> SDValue Op = BuildUDIV(N);
>> if (Op.getNode()) return Op;
>> }
>> @@ -2104,8 +2099,8 @@ SDValue DAGCombiner::visitUDIV(SDNode *N
>> SDValue DAGCombiner::visitSREM(SDNode *N) {
>> SDValue N0 = N->getOperand(0);
>> SDValue N1 = N->getOperand(1);
>> -  ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0);
>> -  ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1);
>> +  ConstantSDNode *N0C = isConstOrConstSplat(N0);
>> +  ConstantSDNode *N1C = isConstOrConstSplat(N1);
>> EVT VT = N->getValueType(0);
>> 
>> // fold (srem c1, c2) -> c1%c2
>> @@ -2146,8 +2141,8 @@ SDValue DAGCombiner::visitSREM(SDNode *N
>> SDValue DAGCombiner::visitUREM(SDNode *N) {
>> SDValue N0 = N->getOperand(0);
>> SDValue N1 = N->getOperand(1);
>> -  ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0);
>> -  ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1);
>> +  ConstantSDNode *N0C = isConstOrConstSplat(N0);
>> +  ConstantSDNode *N1C = isConstOrConstSplat(N1);
>> EVT VT = N->getValueType(0);
>> 
>> // fold (urem c1, c2) -> c1%c2
>> @@ -11187,28 +11182,20 @@ SDValue DAGCombiner::SimplifySetCC(EVT V
>> /// multiplying by a magic number.  See:
>> /// <http://the.wall.riscom.net/books/proc/ppc/cwg/code2.html>
>> SDValue DAGCombiner::BuildSDIV(SDNode *N) {
>> -  const APInt *Divisor;
>> -  if (N->getValueType(0).isVector()) {
>> -    // Handle splat vectors.
>> -    BuildVectorSDNode *BV = cast<BuildVectorSDNode>(N->getOperand(1));
>> -    if (ConstantSDNode *C = BV->getConstantSplatValue())
>> -      Divisor = &C->getAPIntValue();
>> -    else
>> -      return SDValue();
>> -  } else {
>> -    Divisor = &cast<ConstantSDNode>(N->getOperand(1))->getAPIntValue();
>> -  }
>> +  ConstantSDNode *C = isConstOrConstSplat(N->getOperand(1));
>> +  if (!C)
>> +    return SDValue();
>> 
>> // Avoid division by zero.
>> -  if (!*Divisor)
>> +  if (!C->getAPIntValue())
>> return SDValue();
>> 
>> std::vector<SDNode*> Built;
>> -  SDValue S = TLI.BuildSDIV(N, *Divisor, DAG, LegalOperations, &Built);
>> +  SDValue S =
>> +      TLI.BuildSDIV(N, C->getAPIntValue(), DAG, LegalOperations, &Built);
>> 
>> -  for (std::vector<SDNode*>::iterator ii = Built.begin(), ee = Built.end();
>> -       ii != ee; ++ii)
>> -    AddToWorkList(*ii);
>> +  for (SDNode *N : Built)
>> +    AddToWorkList(N);
>> return S;
>> }
>> 
>> @@ -11217,28 +11204,20 @@ SDValue DAGCombiner::BuildSDIV(SDNode *N
>> /// multiplying by a magic number.  See:
>> /// <http://the.wall.riscom.net/books/proc/ppc/cwg/code2.html>
>> SDValue DAGCombiner::BuildUDIV(SDNode *N) {
>> -  const APInt *Divisor;
>> -  if (N->getValueType(0).isVector()) {
>> -    // Handle splat vectors.
>> -    BuildVectorSDNode *BV = cast<BuildVectorSDNode>(N->getOperand(1));
>> -    if (ConstantSDNode *C = BV->getConstantSplatValue())
>> -      Divisor = &C->getAPIntValue();
>> -    else
>> -      return SDValue();
>> -  } else {
>> -    Divisor = &cast<ConstantSDNode>(N->getOperand(1))->getAPIntValue();
>> -  }
>> +  ConstantSDNode *C = isConstOrConstSplat(N->getOperand(1));
>> +  if (!C)
>> +    return SDValue();
>> 
>> // Avoid division by zero.
>> -  if (!*Divisor)
>> +  if (!C->getAPIntValue())
>> return SDValue();
>> 
>> std::vector<SDNode*> Built;
>> -  SDValue S = TLI.BuildUDIV(N, *Divisor, DAG, LegalOperations, &Built);
>> +  SDValue S =
>> +      TLI.BuildUDIV(N, C->getAPIntValue(), DAG, LegalOperations, &Built);
>> 
>> -  for (std::vector<SDNode*>::iterator ii = Built.begin(), ee = Built.end();
>> -       ii != ee; ++ii)
>> -    AddToWorkList(*ii);
>> +  for (SDNode *N : Built)
>> +    AddToWorkList(N);
>> return S;
>> }
>> 
>> 
>> Modified: llvm/trunk/test/CodeGen/X86/vector-idiv.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-idiv.ll?rev=207338&r1=207337&r2=207338&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/vector-idiv.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/vector-idiv.ll Sat Apr 26 18:09:49 2014
>> @@ -151,3 +151,38 @@ define <8 x i32> @test9(<8 x i32> %a) {
>> ; AVX: vpsrad $2
>> ; AVX: vpadd
>> }
>> +
>> +define <8 x i32> @test10(<8 x i32> %a) {
>> +  %rem = urem <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32
>> 7, i32 7>
>> +  ret <8 x i32> %rem
>> +
>> +; AVX-LABEL: test10:
>> +; AVX: vpermd
>> +; AVX: vpmuludq
>> +; AVX: vshufps $-35
>> +; AVX: vpmuludq
>> +; AVX: vshufps $-35
>> +; AVX: vpsubd
>> +; AVX: vpsrld $1
>> +; AVX: vpadd
>> +; AVX: vpsrld $2
>> +; AVX: vpmulld
>> +}
>> +
>> +define <8 x i32> @test11(<8 x i32> %a) {
>> +  %rem = srem <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32
>> 7, i32 7>
>> +  ret <8 x i32> %rem
>> +
>> +; AVX-LABEL: test11:
>> +; AVX: vpermd
>> +; AVX: vpmuldq
>> +; AVX: vshufps $-35
>> +; AVX: vpmuldq
>> +; AVX: vshufps $-35
>> +; AVX: vpshufd $-40
>> +; AVX: vpadd
>> +; AVX: vpsrld $31
>> +; AVX: vpsrad $2
>> +; AVX: vpadd
>> +; AVX: vpmulld
>> +}
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
>> 
>>