[llvm] r207338 - DAGCombiner: Simplify code a bit, make more transforms work with vectors.

Fri Jul 11 03:05:16 PDT 2014

Hi Benjamin,

Attached is the rebased patch.

I’ve kept your if/else construction for the setting of the mask, but I can apply the array of array approach of my initial proposal if you prefer.

Note that like the original patch, this patch also corrects the low <-> high inversion.
This result in a diff on idiv test case for SSE variant that I believe is correct.
Indeed, there is no reason that we see the fix up of pmuludq in that test, since we are just reading the low bits!

Is it OK to commit?

Thanks,
-Quentin

> On Jul 10, 2014, at 10:59 AM, Benjamin Kramer <benny.kra at gmail.com> wrote:
> 
> 
> On 10.07.2014, at 19:46, Quentin Colombet <qcolombet at apple.com> wrote:
> 
>> Hi Benjamin,
>> 
>> I have narrowed down a failure of the factor test case (from the llvm test-suite: SingleSource/UnitTests/SignlessTypes) to this commit.
>> 
>> The failure happens at Os on a haswell machine running macosx.
>> 
>> I have reduced the problem to the attached test.ll and I proposed to fix that with the attached patch.
>> Basically the masks for the shuffle are wrong for vectors with 8 elements.
>> 
>> ** To Reproduce **
>> 
>> Basically this test takes 8 numbers: 8, 9, …, 15 and divide each of them by 7 and print the result.
>> 
>> On a haswell machine
>> llc -mattr=-avx2 test.ll -o ok.s
>> llc -mattr=+avx2 test.ll -o ko.s
>> 
>> clang ok.s -arch x86_64 -o ok.out
>> clang ko.s -arch x86_64 -march core-avx2 -o ko.out
>> 
>> ./ok.out
>> 1
>> 1
>> 1
>> 1
>> 1
>> 1
>> 2
>> 2
>> 
>> 
>> ./ko.out
>> 1
>> 1
>> 1
>> 1
>> 1
>> 1
>> 1  <-- 14/7 == 1!!!
>> 2
>> 
>> ** Feedback Needed **
>> 
>> Could you review the attached patch?
>> 
>> Moreover, because of this patch, vector-idiv.ll is failing. To me the output seems reasonable but could you take a look?
>> I’ll update the patch to match the new pattern if they are fine.
> 
> Your patch is correct, and vector-idiv.ll was buggy. I think this bug is already fixed with r212611 (the shufflemasks I had in place for AVX2 were completely bogus, I should've noticed this earlier). However, I like the effort that you put into documenting the lowering with verbose comments. Would you mind rebasing them on top of r212611?
> 
> Really sorry that you had to spent so much time on this silly oversight of mine.
> - Ben
> 
>> 
>> Thanks,
>> -Quentin
>> <lower_mullohi.patch><test.ll>
>> 
>> This commit (and more generally the custom lowering of MUL_LOHI) 
>>> On May 2, 2014, at 5:42 AM, Benjamin Kramer <benny.kra at gmail.com> wrote:
>>> 
>>> 
>>> On 02.05.2014, at 13:48, Patrik Hägglund H <patrik.h.hagglund at ericsson.com> wrote:
>>> 
>>>> Hi Benjamin,
>>>> 
>>>> This commit is causing a regression for an llvm-stress test, as shown below.
>>> 
>>> r207835 fixes the crash. Thanks for the test case!
>>> 
>>> - Ben
>>> 
>>>> 
>>>> /Patrik Hägglund
>>>> 
>>>> bin/llvm-stress -size 185 -seed 7564 | bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
>>>> 
>>>> llc: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5728: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode *, llvm::SDNode *): Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i)) && "Cannot use this version of ReplaceAllUsesWith!"' failed.
>>>> 0  llc             0x00000000010b5a55 llvm::sys::PrintStackTrace(_IO_FILE*) + 37
>>>> 1  llc             0x00000000010b5e93
>>>> 2  libpthread.so.0 0x00007f7d65bcb7c0
>>>> 3  libc.so.6       0x00007f7d64ecfb35 gsignal + 53
>>>> 4  libc.so.6       0x00007f7d64ed1111 abort + 385
>>>> 5  libc.so.6       0x00007f7d64ec89f0 __assert_fail + 240
>>>> 6  llc             0x0000000000f3a524
>>>> 7  llc             0x0000000000ec4e45 llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AliasAnalysis&, llvm::CodeGenOpt::Level) + 1573
>>>> 8  llc             0x0000000000fad035 llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 901
>>>> 9  llc             0x0000000000fac368 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 6504
>>>> 10 llc             0x0000000000fa9c64 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1332
>>>> 11 llc             0x0000000000c3261c llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 124
>>>> 12 llc             0x0000000000e1f20a llvm::FPPassManager::runOnFunction(llvm::Function&) + 362
>>>> 13 llc             0x0000000000e1f49b llvm::FPPassManager::runOnModule(llvm::Module&) + 43
>>>> 14 llc             0x0000000000e1fa37 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 999
>>>> 15 llc             0x000000000056d835 main + 6549
>>>> 16 libc.so.6       0x00007f7d64ebbc16 __libc_start_main + 230
>>>> 17 llc             0x000000000056bdb9
>>>> Stack dump:
>>>> 0.      Program arguments: bin/llc -march=x86-64 -mcpu=corei7 -o /dev/null
>>>> 1.      Running pass 'Function Pass Manager' on module '<stdin>'.
>>>> 2.      Running pass 'X86 DAG->DAG Instruction Selection' on function '@autogen_SD7564'
>>>> Abort
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Benjamin Kramer
>>>> Sent: den 27 april 2014 01:10
>>>> To: llvm-commits at cs.uiuc.edu
>>>> Subject: [llvm] r207338 - DAGCombiner: Simplify code a bit, make more transforms work with vectors.
>>>> 
>>>> Author: d0k
>>>> Date: Sat Apr 26 18:09:49 2014
>>>> New Revision: 207338
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=207338&view=rev
>>>> Log:
>>>> DAGCombiner: Simplify code a bit, make more transforms work with vectors.
>>>> 
>>>> Modified:
>>>> llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>> llvm/trunk/test/CodeGen/X86/vector-idiv.ll
>>>> 
>>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=207338&r1=207337&r2=207338&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
>>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Sat Apr 26 18:09:49 2014
>>>> @@ -644,8 +644,13 @@ static ConstantSDNode *isConstOrConstSpl
>>>> if (ConstantSDNode *CN = dyn_cast<ConstantSDNode>(N))
>>>>  return CN;
>>>> 
>>>> -  if (BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(N))
>>>> -    return BV->getConstantSplatValue();
>>>> +  if (BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(N)) {
>>>> +    ConstantSDNode *CN = BV->getConstantSplatValue();
>>>> +
>>>> +    // BuildVectors can truncate their operands. Ignore that case here.
>>>> +    if (CN && CN->getValueType(0) == N.getValueType().getScalarType())
>>>> +      return CN;
>>>> +  }
>>>> 
>>>> return nullptr;
>>>> }
>>>> @@ -1957,8 +1962,8 @@ SDValue DAGCombiner::visitMUL(SDNode *N)
>>>> SDValue DAGCombiner::visitSDIV(SDNode *N) {
>>>> SDValue N0 = N->getOperand(0);
>>>> SDValue N1 = N->getOperand(1);
>>>> -  ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0.getNode());
>>>> -  ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1.getNode());
>>>> +  ConstantSDNode *N0C = isConstOrConstSplat(N0);
>>>> +  ConstantSDNode *N1C = isConstOrConstSplat(N1);
>>>> EVT VT = N->getValueType(0);
>>>> 
>>>> // fold vector ops
>>>> @@ -1985,25 +1990,15 @@ SDValue DAGCombiner::visitSDIV(SDNode *N
>>>>                       N0, N1);
>>>> }
>>>> 
>>>> -  const APInt *Divisor = nullptr;
>>>> -  if (N1C) {
>>>> -    Divisor = &N1C->getAPIntValue();
>>>> -  } else if (N1.getValueType().isVector() &&
>>>> -             N1->getOpcode() == ISD::BUILD_VECTOR) {
>>>> -    BuildVectorSDNode *BV = cast<BuildVectorSDNode>(N->getOperand(1));
>>>> -    if (ConstantSDNode *C = BV->getConstantSplatValue())
>>>> -      Divisor = &C->getAPIntValue();
>>>> -  }
>>>> -
>>>> // fold (sdiv X, pow2) -> simple ops after legalize
>>>> -  if (Divisor && !!*Divisor &&
>>>> -      (Divisor->isPowerOf2() || (-*Divisor).isPowerOf2())) {
>>>> +  if (N1C && !N1C->isNullValue() && (N1C->getAPIntValue().isPowerOf2() ||
>>>> +                                     (-N1C->getAPIntValue()).isPowerOf2())) {
>>>>  // If dividing by powers of two is cheap, then don't perform the following
>>>>  // fold.
>>>>  if (TLI.isPow2DivCheap())
>>>>    return SDValue();
>>>> 
>>>> -    unsigned lg2 = Divisor->countTrailingZeros();
>>>> +    unsigned lg2 = N1C->getAPIntValue().countTrailingZeros();
>>>> 
>>>>  // Splat the sign bit into the register
>>>>  SDValue SGN =
>>>> @@ -2025,7 +2020,7 @@ SDValue DAGCombiner::visitSDIV(SDNode *N
>>>> 
>>>>  // If we're dividing by a positive value, we're done.  Otherwise, we must
>>>>  // negate the result.
>>>> -    if (Divisor->isNonNegative())
>>>> +    if (N1C->getAPIntValue().isNonNegative())
>>>>    return SRA;
>>>> 
>>>>  AddToWorkList(SRA.getNode());
>>>> @@ -2034,7 +2029,7 @@ SDValue DAGCombiner::visitSDIV(SDNode *N
>>>> 
>>>> // if integer divide is expensive and we satisfy the requirements, emit an
>>>> // alternate sequence.
>>>> -  if ((N1C || N1->getOpcode() == ISD::BUILD_VECTOR) && !TLI.isIntDivCheap()) {
>>>> +  if (N1C && !TLI.isIntDivCheap()) {
>>>>  SDValue Op = BuildSDIV(N);
>>>>  if (Op.getNode()) return Op;
>>>> }
>>>> @@ -2052,8 +2047,8 @@ SDValue DAGCombiner::visitSDIV(SDNode *N
>>>> SDValue DAGCombiner::visitUDIV(SDNode *N) {
>>>> SDValue N0 = N->getOperand(0);
>>>> SDValue N1 = N->getOperand(1);
>>>> -  ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0.getNode());
>>>> -  ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1.getNode());
>>>> +  ConstantSDNode *N0C = isConstOrConstSplat(N0);
>>>> +  ConstantSDNode *N1C = isConstOrConstSplat(N1);
>>>> EVT VT = N->getValueType(0);
>>>> 
>>>> // fold vector ops
>>>> @@ -2086,7 +2081,7 @@ SDValue DAGCombiner::visitUDIV(SDNode *N
>>>>  }
>>>> }
>>>> // fold (udiv x, c) -> alternate
>>>> -  if ((N1C || N1->getOpcode() == ISD::BUILD_VECTOR) && !TLI.isIntDivCheap()) {
>>>> +  if (N1C && !TLI.isIntDivCheap()) {
>>>>  SDValue Op = BuildUDIV(N);
>>>>  if (Op.getNode()) return Op;
>>>> }
>>>> @@ -2104,8 +2099,8 @@ SDValue DAGCombiner::visitUDIV(SDNode *N
>>>> SDValue DAGCombiner::visitSREM(SDNode *N) {
>>>> SDValue N0 = N->getOperand(0);
>>>> SDValue N1 = N->getOperand(1);
>>>> -  ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0);
>>>> -  ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1);
>>>> +  ConstantSDNode *N0C = isConstOrConstSplat(N0);
>>>> +  ConstantSDNode *N1C = isConstOrConstSplat(N1);
>>>> EVT VT = N->getValueType(0);
>>>> 
>>>> // fold (srem c1, c2) -> c1%c2
>>>> @@ -2146,8 +2141,8 @@ SDValue DAGCombiner::visitSREM(SDNode *N
>>>> SDValue DAGCombiner::visitUREM(SDNode *N) {
>>>> SDValue N0 = N->getOperand(0);
>>>> SDValue N1 = N->getOperand(1);
>>>> -  ConstantSDNode *N0C = dyn_cast<ConstantSDNode>(N0);
>>>> -  ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1);
>>>> +  ConstantSDNode *N0C = isConstOrConstSplat(N0);
>>>> +  ConstantSDNode *N1C = isConstOrConstSplat(N1);
>>>> EVT VT = N->getValueType(0);
>>>> 
>>>> // fold (urem c1, c2) -> c1%c2
>>>> @@ -11187,28 +11182,20 @@ SDValue DAGCombiner::SimplifySetCC(EVT V
>>>> /// multiplying by a magic number.  See:
>>>> /// <http://the.wall.riscom.net/books/proc/ppc/cwg/code2.html>
>>>> SDValue DAGCombiner::BuildSDIV(SDNode *N) {
>>>> -  const APInt *Divisor;
>>>> -  if (N->getValueType(0).isVector()) {
>>>> -    // Handle splat vectors.
>>>> -    BuildVectorSDNode *BV = cast<BuildVectorSDNode>(N->getOperand(1));
>>>> -    if (ConstantSDNode *C = BV->getConstantSplatValue())
>>>> -      Divisor = &C->getAPIntValue();
>>>> -    else
>>>> -      return SDValue();
>>>> -  } else {
>>>> -    Divisor = &cast<ConstantSDNode>(N->getOperand(1))->getAPIntValue();
>>>> -  }
>>>> +  ConstantSDNode *C = isConstOrConstSplat(N->getOperand(1));
>>>> +  if (!C)
>>>> +    return SDValue();
>>>> 
>>>> // Avoid division by zero.
>>>> -  if (!*Divisor)
>>>> +  if (!C->getAPIntValue())
>>>>  return SDValue();
>>>> 
>>>> std::vector<SDNode*> Built;
>>>> -  SDValue S = TLI.BuildSDIV(N, *Divisor, DAG, LegalOperations, &Built);
>>>> +  SDValue S =
>>>> +      TLI.BuildSDIV(N, C->getAPIntValue(), DAG, LegalOperations, &Built);
>>>> 
>>>> -  for (std::vector<SDNode*>::iterator ii = Built.begin(), ee = Built.end();
>>>> -       ii != ee; ++ii)
>>>> -    AddToWorkList(*ii);
>>>> +  for (SDNode *N : Built)
>>>> +    AddToWorkList(N);
>>>> return S;
>>>> }
>>>> 
>>>> @@ -11217,28 +11204,20 @@ SDValue DAGCombiner::BuildSDIV(SDNode *N
>>>> /// multiplying by a magic number.  See:
>>>> /// <http://the.wall.riscom.net/books/proc/ppc/cwg/code2.html>
>>>> SDValue DAGCombiner::BuildUDIV(SDNode *N) {
>>>> -  const APInt *Divisor;
>>>> -  if (N->getValueType(0).isVector()) {
>>>> -    // Handle splat vectors.
>>>> -    BuildVectorSDNode *BV = cast<BuildVectorSDNode>(N->getOperand(1));
>>>> -    if (ConstantSDNode *C = BV->getConstantSplatValue())
>>>> -      Divisor = &C->getAPIntValue();
>>>> -    else
>>>> -      return SDValue();
>>>> -  } else {
>>>> -    Divisor = &cast<ConstantSDNode>(N->getOperand(1))->getAPIntValue();
>>>> -  }
>>>> +  ConstantSDNode *C = isConstOrConstSplat(N->getOperand(1));
>>>> +  if (!C)
>>>> +    return SDValue();
>>>> 
>>>> // Avoid division by zero.
>>>> -  if (!*Divisor)
>>>> +  if (!C->getAPIntValue())
>>>>  return SDValue();
>>>> 
>>>> std::vector<SDNode*> Built;
>>>> -  SDValue S = TLI.BuildUDIV(N, *Divisor, DAG, LegalOperations, &Built);
>>>> +  SDValue S =
>>>> +      TLI.BuildUDIV(N, C->getAPIntValue(), DAG, LegalOperations, &Built);
>>>> 
>>>> -  for (std::vector<SDNode*>::iterator ii = Built.begin(), ee = Built.end();
>>>> -       ii != ee; ++ii)
>>>> -    AddToWorkList(*ii);
>>>> +  for (SDNode *N : Built)
>>>> +    AddToWorkList(N);
>>>> return S;
>>>> }
>>>> 
>>>> 
>>>> Modified: llvm/trunk/test/CodeGen/X86/vector-idiv.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-idiv.ll?rev=207338&r1=207337&r2=207338&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/vector-idiv.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/vector-idiv.ll Sat Apr 26 18:09:49 2014
>>>> @@ -151,3 +151,38 @@ define <8 x i32> @test9(<8 x i32> %a) {
>>>> ; AVX: vpsrad $2
>>>> ; AVX: vpadd
>>>> }
>>>> +
>>>> +define <8 x i32> @test10(<8 x i32> %a) {
>>>> +  %rem = urem <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>
>>>> +  ret <8 x i32> %rem
>>>> +
>>>> +; AVX-LABEL: test10:
>>>> +; AVX: vpermd
>>>> +; AVX: vpmuludq
>>>> +; AVX: vshufps	$-35
>>>> +; AVX: vpmuludq
>>>> +; AVX: vshufps	$-35
>>>> +; AVX: vpsubd
>>>> +; AVX: vpsrld $1
>>>> +; AVX: vpadd
>>>> +; AVX: vpsrld $2
>>>> +; AVX: vpmulld
>>>> +}
>>>> +
>>>> +define <8 x i32> @test11(<8 x i32> %a) {
>>>> +  %rem = srem <8 x i32> %a, <i32 7, i32 7, i32 7, i32 7,i32 7, i32 7, i32 7, i32 7>
>>>> +  ret <8 x i32> %rem
>>>> +
>>>> +; AVX-LABEL: test11:
>>>> +; AVX: vpermd
>>>> +; AVX: vpmuldq
>>>> +; AVX: vshufps	$-35
>>>> +; AVX: vpmuldq
>>>> +; AVX: vshufps	$-35
>>>> +; AVX: vpshufd	$-40
>>>> +; AVX: vpadd
>>>> +; AVX: vpsrld $31
>>>> +; AVX: vpsrad $2
>>>> +; AVX: vpadd
>>>> +; AVX: vpmulld
>>>> +}
>>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> 
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140711/96854216/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lower_mullohi.patch
Type: application/octet-stream
Size: 4335 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140711/96854216/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140711/96854216/attachment-0001.html>