[llvm-commits] [llvm] r166716 - in /llvm/trunk: lib/Transforms/Vectorize/BBVectorize.cpp test/Transforms/BBVectorize/loop1.ll test/Transforms/BBVectorize/simple.ll

David Tweed david.tweed at arm.com
Fri Oct 26 08:55:05 PDT 2012


The "--targets=arm" probably has something to do with it, since if I try the tests on an ARM machine configured without it, the tests pass. However, it doesn't look like the program is straight-aborting; it's running to completion and not performing the expected transformation. The relevant snipped segments from

http://lab.llvm.org:8011/builders/clang-native-arm-cortex-a9/builds/3392/steps/check-all/logs/stdio

are

******************** TEST 'LLVM :: Transforms/BBVectorize/simple.ll' FAILED ********************
Script:
--
/home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/Release+Asserts/bin/opt < /home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/test/Transforms/BBVectorize/simple.ll -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | /home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/Release+Asserts/bin/FileCheck /home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/test/Transforms/BBVectorize/simple.ll
/home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/Release+Asserts/bin/opt < /home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/test/Transforms/BBVectorize/simple.ll -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7 -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | /home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/Release+Asserts/bin/FileCheck /home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/test/Transforms/BBVectorize/simple.ll -check-prefix=CHECK-TI
--
Exit Code: 1
Command Output (stderr):
--
<stdin>:79:8: error: CHECK-TI-NOT: string occurred!
 %X1 = sub <16 x i8> %X1.v.i0, %X1.v.i1
       ^
/home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/test/Transforms/BBVectorize/simple.ll:154:30: note: CHECK-TI-NOT: pattern specified here
; CHECK-TI-NOT: sub <16 x i8>
                             ^
--

********************

******************** TEST 'LLVM :: Transforms/LoopVectorize/cost-model.ll' FAILED ********************
Script:
--
/home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/Release+Asserts/bin/opt < /home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/test/Transforms/LoopVectorize/cost-model.ll  -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S | /home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/Release+Asserts/bin/FileCheck /home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/test/Transforms/LoopVectorize/cost-model.ll
--
Exit Code: 1
Command Output (stderr):
--
/home/buildslave/slave_as-bldslv2/clang-native-arm-cortex-a9/llvm/test/Transforms/LoopVectorize/cost-model.ll:12:9: error: expected string not found in input
;CHECK: <4 x i32>
        ^
<stdin>:10:26: note: scanning from here
define void @cost_model_1() nounwind uwtable noinline ssp {
                         ^
<stdin>:17:39: note: possible intended match here
 %arrayidx = getelementptr inbounds [2048 x i32]* @c, i64 0, i64 %0
                                      ^
--

********************
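
For reference, the pair-profitability rule described in the commit log quoted below can be sketched roughly as follows. This is a minimal standalone sketch, not the code from r166716: PairCosts, isPairProfitable and exampleChecks are hypothetical names, the costs are made-up numbers, and a null pointer stands in for "no target information available" (the role the VTTI pointer plays in the patch). One plausible reading, and only a guess, is that on a build without the X86 backend the -mtriple=x86_64 RUN lines end up on the no-target-information path, so no pair is rejected on cost and the CHECK-TI expectations no longer hold.

```cpp
#include <cassert>

// Hypothetical stand-in for the three costs a target cost model would report.
struct PairCosts {
  unsigned ICost; // first scalar instruction
  unsigned JCost; // second scalar instruction
  unsigned VCost; // fused vector instruction
};

// Returns true if the pair may be fused and stores the savings in CostSavings.
// Costs == nullptr models a build with no target information: the cost check
// is skipped and the savings stay at 0.
bool isPairProfitable(const PairCosts *Costs, int &CostSavings) {
  CostSavings = 0;
  if (!Costs)
    return true;
  unsigned ScalarCost = Costs->ICost + Costs->JCost;
  if (Costs->VCost > ScalarCost)
    return false; // the vector operation is more expensive: reject the pair
  CostSavings = static_cast<int>(ScalarCost - Costs->VCost);
  return true; // equal-cost pairs are still accepted, with savings of 0
}

// Usage with made-up costs (assumptions, not real target numbers).
void exampleChecks() {
  int Savings = 0;
  PairCosts Costly = {1, 1, 3};
  assert(!isPairProfitable(&Costly, Savings));                // rejected
  assert(isPairProfitable(nullptr, Savings) && Savings == 0); // no target info
}
```

With target information the expensive pair is dropped; without it the same pair is kept, which would match a vector sub showing up where CHECK-TI-NOT forbids it.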

-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov] 
Sent: 26 October 2012 15:45
To: David Tweed
Cc: llvm-commits at cs.uiuc.edu; Nadav Rotem
Subject: Re: [llvm-commits] [llvm] r166716 - in /llvm/trunk: lib/Transforms/Vectorize/BBVectorize.cpp test/Transforms/BBVectorize/loop1.ll test/Transforms/BBVectorize/simple.ll

Dave,

If I had to guess, we probably need to make sure that the relevant backend is built in before running the tests. The X86 target is not built on the ARM bots, right? And there is no way to use REQUIRES for a particular RUN line, right?

Thanks again,
Hal

----- Original Message -----
> From: "David Tweed" <david.tweed at arm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, llvm-commits at cs.uiuc.edu
> Sent: Friday, October 26, 2012 3:52:39 AM
> Subject: RE: [llvm-commits] [llvm] r166716 - in /llvm/trunk: lib/Transforms/Vectorize/BBVectorize.cpp
> test/Transforms/BBVectorize/loop1.ll test/Transforms/BBVectorize/simple.ll
> 
> Hi,
> 
> I've noticed that both on my ARM test machine and the public ARM buildbots
> these tests BBVectorize/simple.ll and cost-model.ll have started to appear
> as failures. From the error messages it's not exactly clear why.
> 
> Thanks,
> Dave
> 
> 
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu
> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel
> Sent: 25 October 2012 22:12
> To: llvm-commits at cs.uiuc.edu
> Subject: [llvm-commits] [llvm] r166716 - in /llvm/trunk:
> lib/Transforms/Vectorize/BBVectorize.cpp
> test/Transforms/BBVectorize/loop1.ll
> test/Transforms/BBVectorize/simple.ll
> 
> Author: hfinkel
> Date: Thu Oct 25 16:12:23 2012
> New Revision: 166716
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=166716&view=rev
> Log:
> Begin incorporating target information into BBVectorize.
> 
> This is the first of several steps to incorporate information from the new
> TargetTransformInfo infrastructure into BBVectorize. Two things are done
> here:
> 
>  1. Target information is used to determine if it is profitable to fuse two
>     instructions. This means that the cost of the vector operation must not
>     be more expensive than the cost of the two original operations. Pairs that
>     are not profitable are no longer considered (because current cost information
>     is incomplete, for intrinsics for example, equal-cost pairs are still
>     considered).
> 
>  2. The 'cost savings' computed for the profitability check are also used to
>     rank the DAGs that represent the potential vectorization plans. Specifically,
>     for nodes of non-trivial depth, the cost savings is used as the node
>     weight.
> 
> The next step will be to incorporate the shuffle costs into the DAG weighting;
> this will give the edges of the DAG weights as well. Once that is done, when
> target information is available, we should be able to dispense with the
> depth heuristic.
> 
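
The ranking described in point 2 of the log can be sketched in isolation. This is a simplified, self-contained sketch rather than the patch itself: TreeNode, effectiveSize, isBetterTree and exampleRanking are hypothetical names, and each node simply carries its own depth factor and cost savings instead of looking the savings up in a map as the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node of a pruned pairing tree.
struct TreeNode {
  std::size_t DepthFactor; // 0 for nodes of trivial depth
  int CostSavings;         // savings recorded by the profitability check
};

// With target information, sum the cost savings of nodes of non-trivial
// depth; without it, fall back to summing the depth factors.
int effectiveSize(const std::vector<TreeNode> &Tree, bool HaveTargetInfo) {
  int EffSize = 0;
  for (std::size_t i = 0; i < Tree.size(); ++i) {
    if (HaveTargetInfo) {
      if (Tree[i].DepthFactor)
        EffSize += Tree[i].CostSavings;
    } else {
      EffSize += static_cast<int>(Tree[i].DepthFactor);
    }
  }
  return EffSize;
}

// A tree replaces the current best only if it is deep enough, has a positive
// effective size, and beats the best effective size seen so far.
bool isBetterTree(std::size_t MaxDepth, std::size_t ReqChainDepth,
                  int EffSize, int BestEffSize) {
  return MaxDepth >= ReqChainDepth && EffSize > 0 && EffSize > BestEffSize;
}

// Usage with made-up values (assumptions, not taken from a real run).
void exampleRanking() {
  std::vector<TreeNode> Tree;
  TreeNode A = {1, 2}, B = {0, 5}, C = {1, 1};
  Tree.push_back(A);
  Tree.push_back(B);
  Tree.push_back(C);
  assert(effectiveSize(Tree, true) == 3);  // B has depth factor 0: ignored
  assert(effectiveSize(Tree, false) == 2); // depth factors only
}
```

Note that a tree whose total savings are zero is never selected under this rule, even if it meets the required chain depth.
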
> Modified:
>     llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp
>     llvm/trunk/test/Transforms/BBVectorize/loop1.ll
>     llvm/trunk/test/Transforms/BBVectorize/simple.ll
> 
> Modified: llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp?rev=166716&r1=166715&r2=166716&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp (original)
> +++ llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp Thu Oct 25 16:12:23 2012
> @@ -43,12 +43,17 @@
>  #include "llvm/Support/raw_ostream.h"
>  #include "llvm/Support/ValueHandle.h"
>  #include "llvm/DataLayout.h"
> +#include "llvm/TargetTransformInfo.h"
>  #include "llvm/Transforms/Utils/Local.h"
>  #include "llvm/Transforms/Vectorize.h"
>  #include <algorithm>
>  #include <map>
>  using namespace llvm;
>  
> +static cl::opt<bool>
> +IgnoreTargetInfo("bb-vectorize-ignore-target-info", cl::init(false),
> +  cl::Hidden, cl::desc("Ignore target information"));
> +
>  static cl::opt<unsigned>
>  ReqChainDepth("bb-vectorize-req-chain-depth", cl::init(6), cl::Hidden,
>    cl::desc("The required chain depth for vectorization"));
> @@ -181,9 +186,13 @@
>        DT = &P->getAnalysis<DominatorTree>();
>        SE = &P->getAnalysis<ScalarEvolution>();
>        TD = P->getAnalysisIfAvailable<DataLayout>();
> +      TTI = IgnoreTargetInfo ? 0 :
> +        P->getAnalysisIfAvailable<TargetTransformInfo>();
> +      VTTI = TTI ? TTI->getVectorTargetTransformInfo() : 0;
>      }
>  
>      typedef std::pair<Value *, Value *> ValuePair;
> +    typedef std::pair<ValuePair, int> ValuePairWithCost;
>      typedef std::pair<ValuePair, size_t> ValuePairWithDepth;
>      typedef std::pair<ValuePair, ValuePair> VPPair; // A ValuePair
>      pair
>      typedef std::pair<std::multimap<Value *, Value *>::iterator,
> @@ -196,6 +205,8 @@
>      DominatorTree *DT;
>      ScalarEvolution *SE;
>      DataLayout *TD;
> +    TargetTransformInfo *TTI;
> +    const VectorTargetTransformInfo *VTTI;
>  
>      // FIXME: const correct?
>  
> @@ -204,6 +215,7 @@
>      bool getCandidatePairs(BasicBlock &BB,
>                         BasicBlock::iterator &Start,
>                         std::multimap<Value *, Value *>
>                         &CandidatePairs,
> +                       DenseMap<ValuePair, int> &CandidatePairCostSavings,
>                         std::vector<Value *> &PairableInsts, bool
> NonPow2Len);
>  
>      void computeConnectedPairs(std::multimap<Value *, Value *>
> &CandidatePairs,
> @@ -216,6 +228,7 @@
>                         DenseSet<ValuePair> &PairableInstUsers);
>  
>      void choosePairs(std::multimap<Value *, Value *>
>      &CandidatePairs,
> +                        DenseMap<ValuePair, int> &CandidatePairCostSavings,
>                          std::vector<Value *> &PairableInsts,
>                          std::multimap<ValuePair, ValuePair>
> &ConnectedPairs,
>                          DenseSet<ValuePair> &PairableInstUsers,
> @@ -228,7 +241,8 @@
>      bool isInstVectorizable(Instruction *I, bool
>      &IsSimpleLoadStore);
>  
>      bool areInstsCompatible(Instruction *I, Instruction *J,
> -                       bool IsSimpleLoadStore, bool NonPow2Len);
> +                       bool IsSimpleLoadStore, bool NonPow2Len,
> +                       int &CostSavings);
>  
>      bool trackUsesOfI(DenseSet<Value *> &Users,
>                        AliasSetTracker &WriteSet, Instruction *I,
> @@ -270,13 +284,14 @@
>  
>      void findBestTreeFor(
>                        std::multimap<Value *, Value *>
>                        &CandidatePairs,
> +                      DenseMap<ValuePair, int> &CandidatePairCostSavings,
>                        std::vector<Value *> &PairableInsts,
>                        std::multimap<ValuePair, ValuePair>
>                        &ConnectedPairs,
>                        DenseSet<ValuePair> &PairableInstUsers,
>                        std::multimap<ValuePair, ValuePair>
> &PairableInstUserMap,
>                        DenseMap<Value *, Value *> &ChosenPairs,
>                        DenseSet<ValuePair> &BestTree, size_t
>                        &BestMaxDepth,
> -                      size_t &BestEffSize, VPIteratorPair ChoiceRange,
> +                      int &BestEffSize, VPIteratorPair ChoiceRange,
>                        bool UseCycleCheck);
>  
>      Value *getReplacementPointerInput(LLVMContext& Context,
>      Instruction *I,
> @@ -339,13 +354,16 @@
>          return false;
>        }
>  
> +      DEBUG(if (VTTI) dbgs() << "BBV: using target information\n");
> +
>        bool changed = false;
>        // Iterate a sufficient number of times to merge types of size
>        1 bit,
>        // then 2 bits, then 4, etc. up to half of the target vector
>        width of
> the
>        // target vector register.
>        unsigned n = 1;
>        for (unsigned v = 2;
> -           v <= Config.VectorBits && (!Config.MaxIter || n <= Config.MaxIter);
> +           (VTTI || v <= Config.VectorBits) &&
> +           (!Config.MaxIter || n <= Config.MaxIter);
>             v *= 2, ++n) {
>          DEBUG(dbgs() << "BBV: fusing loop #" << n <<
>                " for " << BB.getName() << " in " <<
> @@ -375,6 +393,9 @@
>        DT = &getAnalysis<DominatorTree>();
>        SE = &getAnalysis<ScalarEvolution>();
>        TD = getAnalysisIfAvailable<DataLayout>();
> +      TTI = IgnoreTargetInfo ? 0 :
> +        getAnalysisIfAvailable<TargetTransformInfo>();
> +      VTTI = TTI ? TTI->getVectorTargetTransformInfo() : 0;
>  
>        return vectorizeBB(BB);
>      }
> @@ -427,6 +448,10 @@
>          T2 = cast<CastInst>(I)->getSrcTy();
>        else
>          T2 = T1;
> +
> +      if (SelectInst *SI = dyn_cast<SelectInst>(I)) {
> +        T2 = SI->getCondition()->getType();
> +      }
>      }
>  
>      // Returns the weight associated with the provided value. A
>      chain of
> @@ -465,18 +490,25 @@
>      // directly after J.
>      bool getPairPtrInfo(Instruction *I, Instruction *J,
>          Value *&IPtr, Value *&JPtr, unsigned &IAlignment, unsigned
> &JAlignment,
> +        unsigned &IAddressSpace, unsigned &JAddressSpace,
>          int64_t &OffsetInElmts) {
>        OffsetInElmts = 0;
> -      if (isa<LoadInst>(I)) {
> -        IPtr = cast<LoadInst>(I)->getPointerOperand();
> -        JPtr = cast<LoadInst>(J)->getPointerOperand();
> -        IAlignment = cast<LoadInst>(I)->getAlignment();
> -        JAlignment = cast<LoadInst>(J)->getAlignment();
> +      if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
> +        LoadInst *LJ = cast<LoadInst>(J);
> +        IPtr = LI->getPointerOperand();
> +        JPtr = LJ->getPointerOperand();
> +        IAlignment = LI->getAlignment();
> +        JAlignment = LJ->getAlignment();
> +        IAddressSpace = LI->getPointerAddressSpace();
> +        JAddressSpace = LJ->getPointerAddressSpace();
>        } else {
> -        IPtr = cast<StoreInst>(I)->getPointerOperand();
> -        JPtr = cast<StoreInst>(J)->getPointerOperand();
> -        IAlignment = cast<StoreInst>(I)->getAlignment();
> -        JAlignment = cast<StoreInst>(J)->getAlignment();
> +        StoreInst *SI = cast<StoreInst>(I), *SJ = cast<StoreInst>(J);
> +        IPtr = SI->getPointerOperand();
> +        JPtr = SJ->getPointerOperand();
> +        IAlignment = SI->getAlignment();
> +        JAlignment = SJ->getAlignment();
> +        IAddressSpace = SI->getPointerAddressSpace();
> +        JAddressSpace = SJ->getPointerAddressSpace();
>        }
>  
>        const SCEV *IPtrSCEV = SE->getSCEV(IPtr);
> @@ -562,7 +594,9 @@
>      do {
>        std::vector<Value *> PairableInsts;
>        std::multimap<Value *, Value *> CandidatePairs;
> +      DenseMap<ValuePair, int> CandidatePairCostSavings;
>        ShouldContinue = getCandidatePairs(BB, Start, CandidatePairs,
> +                                         CandidatePairCostSavings,
>                                           PairableInsts, NonPow2Len);
>        if (PairableInsts.empty()) continue;
>  
> @@ -590,7 +624,8 @@
>        // variables.
>  
>        DenseMap<Value *, Value *> ChosenPairs;
> -      choosePairs(CandidatePairs, PairableInsts, ConnectedPairs,
> +      choosePairs(CandidatePairs, CandidatePairCostSavings,
> +        PairableInsts, ConnectedPairs,
>          PairableInstUsers, ChosenPairs);
>  
>        if (ChosenPairs.empty()) continue;
> @@ -679,15 +714,22 @@
>          !(VectorType::isValidElementType(T2) || T2->isVectorTy()))
>        return false;
>  
> -    if (T1->getScalarSizeInBits() == 1 && T2->getScalarSizeInBits() == 1) {
> +    if (T1->getScalarSizeInBits() == 1) {
>        if (!Config.VectorizeBools)
>          return false;
>      } else {
> -      if (!Config.VectorizeInts
> -          && (T1->isIntOrIntVectorTy() || T2->isIntOrIntVectorTy()))
> +      if (!Config.VectorizeInts && T1->isIntOrIntVectorTy())
>          return false;
>      }
> -
> +
> +    if (T2->getScalarSizeInBits() == 1) {
> +      if (!Config.VectorizeBools)
> +        return false;
> +    } else {
> +      if (!Config.VectorizeInts && T2->isIntOrIntVectorTy())
> +        return false;
> +    }
> +
>      if (!Config.VectorizeFloats
>          && (T1->isFPOrFPVectorTy() || T2->isFPOrFPVectorTy()))
>        return false;
> @@ -703,8 +745,8 @@
>           T2->getScalarType()->isPointerTy()))
>        return false;
>  
> -    if (T1->getPrimitiveSizeInBits() >= Config.VectorBits ||
> -        T2->getPrimitiveSizeInBits() >= Config.VectorBits)
> +    if (!VTTI && (T1->getPrimitiveSizeInBits() >= Config.VectorBits ||
> +                  T2->getPrimitiveSizeInBits() >= Config.VectorBits))
>        return false;
>  
>      return true;
> @@ -715,10 +757,13 @@
>    // that I has already been determined to be vectorizable and that
>    J is
> not
>    // in the use tree of I.
>    bool BBVectorize::areInstsCompatible(Instruction *I, Instruction
>    *J,
> -                       bool IsSimpleLoadStore, bool NonPow2Len) {
> +                       bool IsSimpleLoadStore, bool NonPow2Len,
> +                       int &CostSavings) {
>      DEBUG(if (DebugInstructionExamination) dbgs() << "BBV: looking
>      at " <<
> *I <<
>                       " <-> " << *J << "\n");
>  
> +    CostSavings = 0;
> +
>      // Loads and stores can be merged if they have different
>      alignments,
>      // but are otherwise the same.
>      if (!J->isSameOperationAs(I,
>      Instruction::CompareIgnoringAlignment |
> @@ -731,38 +776,62 @@
>      unsigned MaxTypeBits = std::max(
>        IT1->getPrimitiveSizeInBits() + JT1->getPrimitiveSizeInBits(),
>        IT2->getPrimitiveSizeInBits() + JT2->getPrimitiveSizeInBits());
> -    if (MaxTypeBits > Config.VectorBits)
> +    if (!VTTI && MaxTypeBits > Config.VectorBits)
>        return false;
>  
>      // FIXME: handle addsub-type operations!
>  
>      if (IsSimpleLoadStore) {
>        Value *IPtr, *JPtr;
> -      unsigned IAlignment, JAlignment;
> +      unsigned IAlignment, JAlignment, IAddressSpace, JAddressSpace;
>        int64_t OffsetInElmts = 0;
>        if (getPairPtrInfo(I, J, IPtr, JPtr, IAlignment, JAlignment,
> +            IAddressSpace, JAddressSpace,
>              OffsetInElmts) && abs64(OffsetInElmts) == 1) {
> -        if (Config.AlignedOnly) {
> -          Type *aTypeI = isa<StoreInst>(I) ?
> -            cast<StoreInst>(I)->getValueOperand()->getType() : I->getType();
> -          Type *aTypeJ = isa<StoreInst>(J) ?
> -            cast<StoreInst>(J)->getValueOperand()->getType() : J->getType();
> +        unsigned BottomAlignment = IAlignment;
> +        if (OffsetInElmts < 0) BottomAlignment = JAlignment;
>  
> +        Type *aTypeI = isa<StoreInst>(I) ?
> +          cast<StoreInst>(I)->getValueOperand()->getType() : I->getType();
> +        Type *aTypeJ = isa<StoreInst>(J) ?
> +          cast<StoreInst>(J)->getValueOperand()->getType() : J->getType();
> +        Type *VType = getVecTypeForPair(aTypeI, aTypeJ);
> +
> +        if (Config.AlignedOnly) {
>            // An aligned load or store is possible only if the instruction
>            // with the lower offset has an alignment suitable for the
>            // vector type.
>  
> -          unsigned BottomAlignment = IAlignment;
> -          if (OffsetInElmts < 0) BottomAlignment = JAlignment;
> -
> -          Type *VType = getVecTypeForPair(aTypeI, aTypeJ);
>            unsigned VecAlignment = TD->getPrefTypeAlignment(VType);
>            if (BottomAlignment < VecAlignment)
>              return false;
>          }
> +
> +        if (VTTI) {
> +          unsigned ICost = VTTI->getMemoryOpCost(I->getOpcode(), I->getType(),
> +                                                 IAlignment, IAddressSpace);
> +          unsigned JCost = VTTI->getMemoryOpCost(J->getOpcode(), J->getType(),
> +                                                 JAlignment, JAddressSpace);
> +          unsigned VCost = VTTI->getMemoryOpCost(I->getOpcode(), VType,
> +                                                 BottomAlignment,
> +                                                 IAddressSpace);
> +          if (VCost > ICost + JCost)
> +            return false;
> +          CostSavings = ICost + JCost - VCost;
> +        }
>        } else {
>          return false;
>        }
> +    } else if (VTTI) {
> +      unsigned ICost = VTTI->getInstrCost(I->getOpcode(), IT1, IT2);
> +      unsigned JCost = VTTI->getInstrCost(J->getOpcode(), JT1, JT2);
> +      Type *VT1 = getVecTypeForPair(IT1, JT1),
> +           *VT2 = getVecTypeForPair(IT2, JT2);
> +      unsigned VCost = VTTI->getInstrCost(I->getOpcode(), VT1, VT2);
> +
> +      if (VCost > ICost + JCost)
> +        return false;
> +      CostSavings = ICost + JCost - VCost;
>      }
>  
>      // The powi intrinsic is special because only the first argument
>      is
> @@ -845,6 +914,7 @@
>    bool BBVectorize::getCandidatePairs(BasicBlock &BB,
>                         BasicBlock::iterator &Start,
>                         std::multimap<Value *, Value *>
>                         &CandidatePairs,
> +                       DenseMap<ValuePair, int> &CandidatePairCostSavings,
>                         std::vector<Value *> &PairableInsts, bool
> NonPow2Len) {
>      BasicBlock::iterator E = BB.end();
>      if (Start == E) return false;
> @@ -881,7 +951,9 @@
>  
>          // J does not use I, and comes before the first use of I, so
>          it can
> be
>          // merged with I if the instructions are compatible.
> -        if (!areInstsCompatible(I, J, IsSimpleLoadStore, NonPow2Len)) continue;
> +        int CostSavings;
> +        if (!areInstsCompatible(I, J, IsSimpleLoadStore, NonPow2Len,
> +            CostSavings)) continue;
>  
>          // J is a candidate for merging with I.
>          if (!PairableInsts.size() ||
> @@ -890,6 +962,9 @@
>          }
>  
>          CandidatePairs.insert(ValuePair(I, J));
> +        if (VTTI)
> +          CandidatePairCostSavings.insert(ValuePairWithCost(ValuePair(I, J),
> +                                                            CostSavings));
>  
>          // The next call to this function must start after the last
> instruction
>          // selected during this invocation.
> @@ -899,7 +974,8 @@
>          }
>  
>          DEBUG(if (DebugCandidateSelection) dbgs() << "BBV: candidate pair "
> -                     << *I << " <-> " << *J << "\n");
> +                     << *I << " <-> " << *J << " (cost savings: " <<
> +                     CostSavings << ")\n");
>  
>          // If we have already found too many pairs, break here and
>          this
> function
>          // will be called again starting after the last instruction
> selected
> @@ -1353,13 +1429,14 @@
>    // pairs, given the choice of root pairs as an iterator range.
>    void BBVectorize::findBestTreeFor(
>                        std::multimap<Value *, Value *>
>                        &CandidatePairs,
> +                      DenseMap<ValuePair, int> &CandidatePairCostSavings,
>                        std::vector<Value *> &PairableInsts,
>                        std::multimap<ValuePair, ValuePair>
>                        &ConnectedPairs,
>                        DenseSet<ValuePair> &PairableInstUsers,
>                        std::multimap<ValuePair, ValuePair>
> &PairableInstUserMap,
>                        DenseMap<Value *, Value *> &ChosenPairs,
>                        DenseSet<ValuePair> &BestTree, size_t
>                        &BestMaxDepth,
> -                      size_t &BestEffSize, VPIteratorPair ChoiceRange,
> +                      int &BestEffSize, VPIteratorPair ChoiceRange,
>                        bool UseCycleCheck) {
>      for (std::multimap<Value *, Value *>::iterator J =
>      ChoiceRange.first;
>           J != ChoiceRange.second; ++J) {
> @@ -1409,17 +1486,26 @@
>                     PairableInstUsers, PairableInstUserMap, ChosenPairs, Tree,
>                     PrunedTree, *J, UseCycleCheck);
>  
> -      size_t EffSize = 0;
> -      for (DenseSet<ValuePair>::iterator S = PrunedTree.begin(),
> -           E = PrunedTree.end(); S != E; ++S)
> -        EffSize += getDepthFactor(S->first);
> +      int EffSize = 0;
> +      if (VTTI) {
> +        for (DenseSet<ValuePair>::iterator S = PrunedTree.begin(),
> +             E = PrunedTree.end(); S != E; ++S) {
> +          if (getDepthFactor(S->first))
> +            EffSize += CandidatePairCostSavings.find(*S)->second;
> +        }
> +      } else {
> +        for (DenseSet<ValuePair>::iterator S = PrunedTree.begin(),
> +             E = PrunedTree.end(); S != E; ++S)
> +          EffSize += (int) getDepthFactor(S->first);
> +      }
>  
>        DEBUG(if (DebugPairSelection)
>               dbgs() << "BBV: found pruned Tree for pair {"
>               << *J->first << " <-> " << *J->second << "} of depth " <<
>               MaxDepth << " and size " << PrunedTree.size() <<
>              " (effective size: " << EffSize << ")\n");
> -      if (MaxDepth >= Config.ReqChainDepth && EffSize > BestEffSize) {
> +      if (MaxDepth >= Config.ReqChainDepth &&
> +          EffSize > 0 && EffSize > BestEffSize) {
>          BestMaxDepth = MaxDepth;
>          BestEffSize = EffSize;
>          BestTree = PrunedTree;
> @@ -1431,6 +1517,7 @@
>    // that will be fused into vector instructions.
>    void BBVectorize::choosePairs(
>                        std::multimap<Value *, Value *>
>                        &CandidatePairs,
> +                      DenseMap<ValuePair, int> &CandidatePairCostSavings,
>                        std::vector<Value *> &PairableInsts,
>                        std::multimap<ValuePair, ValuePair>
>                        &ConnectedPairs,
>                        DenseSet<ValuePair> &PairableInstUsers,
> @@ -1447,9 +1534,11 @@
>        VPIteratorPair ChoiceRange = CandidatePairs.equal_range(*I);
>  
>        // The best pair to choose and its tree:
> -      size_t BestMaxDepth = 0, BestEffSize = 0;
> +      size_t BestMaxDepth = 0;
> +      int BestEffSize = 0;
>        DenseSet<ValuePair> BestTree;
> -      findBestTreeFor(CandidatePairs, PairableInsts, ConnectedPairs,
> +      findBestTreeFor(CandidatePairs, CandidatePairCostSavings,
> +                      PairableInsts, ConnectedPairs,
>                        PairableInstUsers, PairableInstUserMap,
>                        ChosenPairs,
>                        BestTree, BestMaxDepth, BestEffSize,
>                        ChoiceRange,
>                        UseCycleCheck);
> @@ -1505,12 +1594,13 @@
>                       Instruction *I, Instruction *J, unsigned o,
>                       bool FlipMemInputs) {
>      Value *IPtr, *JPtr;
> -    unsigned IAlignment, JAlignment;
> +    unsigned IAlignment, JAlignment, IAddressSpace, JAddressSpace;
>      int64_t OffsetInElmts;
>  
>      // Note: the analysis might fail here, that is why FlipMemInputs
>      has
>      // been precomputed (OffsetInElmts must be unused here).
>      (void) getPairPtrInfo(I, J, IPtr, JPtr, IAlignment, JAlignment,
> +                          IAddressSpace, JAddressSpace,
>                            OffsetInElmts);
>  
>      // The pointer value is taken to be the one with the lowest
>      offset.
> @@ -2212,9 +2302,10 @@
>          continue;
>  
>        Value *IPtr, *JPtr;
> -      unsigned IAlignment, JAlignment;
> +      unsigned IAlignment, JAlignment, IAddressSpace, JAddressSpace;
>        int64_t OffsetInElmts;
>        if (!getPairPtrInfo(I, J, IPtr, JPtr, IAlignment, JAlignment,
> +                          IAddressSpace, JAddressSpace,
>                            OffsetInElmts) || abs64(OffsetInElmts) !=
>                            1)
>          llvm_unreachable("Pre-fusion pointer analysis failed");
>  
> 
> Modified: llvm/trunk/test/Transforms/BBVectorize/loop1.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/BBVectorize/loop1.ll?rev=166716&r1=166715&r2=166716&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/BBVectorize/loop1.ll (original)
> +++ llvm/trunk/test/Transforms/BBVectorize/loop1.ll Thu Oct 25 16:12:23 2012
> @@ -1,8 +1,11 @@
>  target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
>  target triple = "x86_64-unknown-linux-gnu"
>  ; RUN: opt < %s -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s
> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s
>  ; RUN: opt < %s -basicaa -loop-unroll -unroll-threshold=45 -unroll-allow-partial -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s -check-prefix=CHECK-UNRL
> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -basicaa -loop-unroll -unroll-threshold=45 -unroll-allow-partial -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s -check-prefix=CHECK-UNRL
>  ; The second check covers the use of alias analysis (with loop unrolling).
> +; Both checks are run with and without target information.
>  
>  define void @test1(double* noalias %out, double* noalias %in1, double* noalias %in2) nounwind uwtable {
>  entry:
> 
> Modified: llvm/trunk/test/Transforms/BBVectorize/simple.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/BBVectorize/simple.ll?rev=166716&r1=166715&r2=166716&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/BBVectorize/simple.ll (original)
> +++ llvm/trunk/test/Transforms/BBVectorize/simple.ll Thu Oct 25 16:12:23 2012
> @@ -1,5 +1,6 @@
>  target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
>  ; RUN: opt < %s -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s
> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s -check-prefix=CHECK-TI
>  
>  ; Basic depth-3 chain
>  define double @test1(double %A1, double %A2, double %B1, double %B2) {
> @@ -23,6 +24,9 @@
>  ; CHECK: %R = fmul double %Z1.v.r1, %Z1.v.r2
>  	ret double %R
>  ; CHECK: ret double %R
> +; CHECK-TI: @test1
> +; CHECK-TI: fsub <2 x double>
> +; CHECK-TI: ret double
>  }
>  
>  ; Basic depth-3 chain (last pair permuted)
> @@ -146,6 +150,9 @@
>  ; CHECK: %R = mul <8 x i8> %Q1.v.r1, %Q1.v.r2
>  	ret <8 x i8> %R
>  ; CHECK: ret <8 x i8> %R
> +; CHECK-TI: @test6
> +; CHECK-TI-NOT: sub <16 x i8>
> +; CHECK-TI: ret <8 x i8>
>  }
>  
>  
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
> 
> 
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory