[llvm-commits] [llvm] r166716 - in /llvm/trunk: lib/Transforms/Vectorize/BBVectorize.cpp test/Transforms/BBVectorize/loop1.ll test/Transforms/BBVectorize/simple.ll
Hal Finkel
hfinkel at anl.gov
Fri Oct 26 12:18:29 PDT 2012
----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Jim Grosbach" <grosbach at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, llvm-commits at cs.uiuc.edu
> Sent: Friday, October 26, 2012 1:50:29 PM
> Subject: Re: [llvm-commits] [llvm] r166716 - in /llvm/trunk: lib/Transforms/Vectorize/BBVectorize.cpp
> test/Transforms/BBVectorize/loop1.ll test/Transforms/BBVectorize/simple.ll
>
> I spoke with Daniel and also read Jim's email now. I agree that this
> is the way to go. I will commit the directory structure to the loop
> vectorizer tests soon.
I'll do the same for the BBVectorize tests...
-Hal
>
> Thanks,
> Nadav
>
> On Oct 26, 2012, at 11:16 AM, Jim Grosbach <grosbach at apple.com>
> wrote:
>
> >
> > On Oct 26, 2012, at 7:45 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> >
> >> Dave,
> >>
> >> If I had to guess, we probably need to make sure that the relevant
> >> backend is built in to run the tests. The X86 target is not built
> >> on the ARM bots, right? There is no way to REQUIRE for a
> >> particular RUN line, right?
> >
> > Right. That's typically handled by a target-specific sub-directory
> > in the tests. See the lit config files in the subdirectories
> > test/CodeGen and test/MC for examples.
> >
> > -Jim
> >
> >>
> >> Thanks again,
> >> Hal
> >>
> >> ----- Original Message -----
> >>> From: "David Tweed" <david.tweed at arm.com>
> >>> To: "Hal Finkel" <hfinkel at anl.gov>, llvm-commits at cs.uiuc.edu
> >>> Sent: Friday, October 26, 2012 3:52:39 AM
> >>> Subject: RE: [llvm-commits] [llvm] r166716 - in /llvm/trunk:
> >>> lib/Transforms/Vectorize/BBVectorize.cpp
> >>> test/Transforms/BBVectorize/loop1.ll
> >>> test/Transforms/BBVectorize/simple.ll
> >>>
> >>> Hi,
> >>>
> >>> I've noticed that both on my ARM test machine and the public ARM
> >>> buildbots these tests BBVectorize/simple.ll and cost-model.ll have
> >>> started to appear as failures. From the error messages it's not
> >>> exactly clear why.
> >>>
> >>> Thanks,
> >>> Dave
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: llvm-commits-bounces at cs.uiuc.edu
> >>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel
> >>> Sent: 25 October 2012 22:12
> >>> To: llvm-commits at cs.uiuc.edu
> >>> Subject: [llvm-commits] [llvm] r166716 - in /llvm/trunk:
> >>> lib/Transforms/Vectorize/BBVectorize.cpp
> >>> test/Transforms/BBVectorize/loop1.ll
> >>> test/Transforms/BBVectorize/simple.ll
> >>>
> >>> Author: hfinkel
> >>> Date: Thu Oct 25 16:12:23 2012
> >>> New Revision: 166716
> >>>
> >>> URL: http://llvm.org/viewvc/llvm-project?rev=166716&view=rev
> >>> Log:
> >>> Begin incorporating target information into BBVectorize.
> >>>
> >>> This is the first of several steps to incorporate information from
> >>> the new TargetTransformInfo infrastructure into BBVectorize. Two
> >>> things are done here:
> >>>
> >>> 1. Target information is used to determine if it is profitable to
> >>> fuse two instructions. This means that the cost of the vector
> >>> operation must not be more expensive than the cost of the two
> >>> original operations. Pairs that are not profitable are no longer
> >>> considered (because current cost information is incomplete, for
> >>> intrinsics for example, equal-cost pairs are still considered).
> >>>
> >>> 2. The 'cost savings' computed for the profitability check are also
> >>> used to rank the DAGs that represent the potential vectorization
> >>> plans. Specifically, for nodes of non-trivial depth, the cost
> >>> savings is used as the node weight.
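
The two rules in the log above can be sketched in isolation. This is only an illustration under stated assumptions, not the code from the patch: PairCosts, PlanNode, pairCostSavings and effectiveSize are hypothetical names, and the cost numbers are supplied by the caller rather than queried from VectorTargetTransformInfo.

```cpp
#include <vector>

// Hypothetical stand-in for the three costs the target would report.
struct PairCosts {
  unsigned ICost; // cost of the first scalar instruction
  unsigned JCost; // cost of the second scalar instruction
  unsigned VCost; // cost of the fused vector instruction
};

// Rule 1: a pair is rejected only when the vector operation costs more
// than the two scalar operations together. Equal-cost pairs are kept and
// report a savings of zero.
bool pairCostSavings(const PairCosts &C, int &CostSavings) {
  CostSavings = 0;
  if (C.VCost > C.ICost + C.JCost)
    return false;
  CostSavings = static_cast<int>(C.ICost + C.JCost - C.VCost);
  return true;
}

// Hypothetical node of a candidate vectorization plan.
struct PlanNode {
  unsigned DepthFactor; // zero for nodes of trivial depth
  int CostSavings;      // savings recorded for this pair by rule 1
};

// Rule 2: the weight of a plan is the sum of the savings of its nodes
// that have a non-zero depth factor.
int effectiveSize(const std::vector<PlanNode> &Plan) {
  int EffSize = 0;
  for (const PlanNode &N : Plan)
    if (N.DepthFactor)
      EffSize += N.CostSavings;
  return EffSize;
}
```

With these definitions, scalar costs of 2 and 2 against a vector cost of 3 give a savings of 1, while scalar costs of 1 and 1 against a vector cost of 3 are rejected. A plan is only worth choosing when its summed weight is positive and larger than the best weight seen so far.
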
> >>>
> >>> The next step will be to incorporate the shuffle costs into the DAG
> >>> weighting; this will give the edges of the DAG weights as well. Once
> >>> that is done, when target information is available, we should be
> >>> able to dispense with the depth heuristic.
> >>>
> >>> Modified:
> >>> llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp
> >>> llvm/trunk/test/Transforms/BBVectorize/loop1.ll
> >>> llvm/trunk/test/Transforms/BBVectorize/simple.ll
> >>>
> >>> Modified: llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp?rev=166716&r1=166715&r2=166716&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp (original)
> >>> +++ llvm/trunk/lib/Transforms/Vectorize/BBVectorize.cpp Thu Oct 25 16:12:23 2012
> >>> @@ -43,12 +43,17 @@
> >>> #include "llvm/Support/raw_ostream.h"
> >>> #include "llvm/Support/ValueHandle.h"
> >>> #include "llvm/DataLayout.h"
> >>> +#include "llvm/TargetTransformInfo.h"
> >>> #include "llvm/Transforms/Utils/Local.h"
> >>> #include "llvm/Transforms/Vectorize.h"
> >>> #include <algorithm>
> >>> #include <map>
> >>> using namespace llvm;
> >>>
> >>> +static cl::opt<bool>
> >>> +IgnoreTargetInfo("bb-vectorize-ignore-target-info", cl::init(false),
> >>> + cl::Hidden, cl::desc("Ignore target information"));
> >>> +
> >>> static cl::opt<unsigned>
> >>> ReqChainDepth("bb-vectorize-req-chain-depth", cl::init(6), cl::Hidden,
> >>> cl::desc("The required chain depth for vectorization"));
> >>> @@ -181,9 +186,13 @@
> >>> DT = &P->getAnalysis<DominatorTree>();
> >>> SE = &P->getAnalysis<ScalarEvolution>();
> >>> TD = P->getAnalysisIfAvailable<DataLayout>();
> >>> + TTI = IgnoreTargetInfo ? 0 :
> >>> + P->getAnalysisIfAvailable<TargetTransformInfo>();
> >>> + VTTI = TTI ? TTI->getVectorTargetTransformInfo() : 0;
> >>> }
> >>>
> >>> typedef std::pair<Value *, Value *> ValuePair;
> >>> + typedef std::pair<ValuePair, int> ValuePairWithCost;
> >>> typedef std::pair<ValuePair, size_t> ValuePairWithDepth;
> >>> typedef std::pair<ValuePair, ValuePair> VPPair; // A ValuePair pair
> >>> typedef std::pair<std::multimap<Value *, Value *>::iterator,
> >>> @@ -196,6 +205,8 @@
> >>> DominatorTree *DT;
> >>> ScalarEvolution *SE;
> >>> DataLayout *TD;
> >>> + TargetTransformInfo *TTI;
> >>> + const VectorTargetTransformInfo *VTTI;
> >>>
> >>> // FIXME: const correct?
> >>>
> >>> @@ -204,6 +215,7 @@
> >>> bool getCandidatePairs(BasicBlock &BB,
> >>> BasicBlock::iterator &Start,
> >>> std::multimap<Value *, Value *> &CandidatePairs,
> >>> + DenseMap<ValuePair, int> &CandidatePairCostSavings,
> >>> std::vector<Value *> &PairableInsts, bool NonPow2Len);
> >>>
> >>> void computeConnectedPairs(std::multimap<Value *, Value *> &CandidatePairs,
> >>> @@ -216,6 +228,7 @@
> >>> DenseSet<ValuePair> &PairableInstUsers);
> >>>
> >>> void choosePairs(std::multimap<Value *, Value *> &CandidatePairs,
> >>> + DenseMap<ValuePair, int> &CandidatePairCostSavings,
> >>> std::vector<Value *> &PairableInsts,
> >>> std::multimap<ValuePair, ValuePair> &ConnectedPairs,
> >>> DenseSet<ValuePair> &PairableInstUsers,
> >>> @@ -228,7 +241,8 @@
> >>> bool isInstVectorizable(Instruction *I, bool &IsSimpleLoadStore);
> >>>
> >>> bool areInstsCompatible(Instruction *I, Instruction *J,
> >>> - bool IsSimpleLoadStore, bool NonPow2Len);
> >>> + bool IsSimpleLoadStore, bool NonPow2Len,
> >>> + int &CostSavings);
> >>>
> >>> bool trackUsesOfI(DenseSet<Value *> &Users,
> >>> AliasSetTracker &WriteSet, Instruction *I,
> >>> @@ -270,13 +284,14 @@
> >>>
> >>> void findBestTreeFor(
> >>> std::multimap<Value *, Value *> &CandidatePairs,
> >>> + DenseMap<ValuePair, int> &CandidatePairCostSavings,
> >>> std::vector<Value *> &PairableInsts,
> >>> std::multimap<ValuePair, ValuePair> &ConnectedPairs,
> >>> DenseSet<ValuePair> &PairableInstUsers,
> >>> std::multimap<ValuePair, ValuePair> &PairableInstUserMap,
> >>> DenseMap<Value *, Value *> &ChosenPairs,
> >>> DenseSet<ValuePair> &BestTree, size_t &BestMaxDepth,
> >>> - size_t &BestEffSize, VPIteratorPair ChoiceRange,
> >>> + int &BestEffSize, VPIteratorPair ChoiceRange,
> >>> bool UseCycleCheck);
> >>>
> >>> Value *getReplacementPointerInput(LLVMContext& Context, Instruction *I,
> >>> @@ -339,13 +354,16 @@
> >>> return false;
> >>> }
> >>>
> >>> + DEBUG(if (VTTI) dbgs() << "BBV: using target information\n");
> >>> +
> >>> bool changed = false;
> >>> // Iterate a sufficient number of times to merge types of size 1 bit,
> >>> // then 2 bits, then 4, etc. up to half of the target vector width of the
> >>> // target vector register.
> >>> unsigned n = 1;
> >>> for (unsigned v = 2;
> >>> - v <= Config.VectorBits && (!Config.MaxIter || n <= Config.MaxIter);
> >>> + (VTTI || v <= Config.VectorBits) &&
> >>> + (!Config.MaxIter || n <= Config.MaxIter);
> >>> v *= 2, ++n) {
> >>> DEBUG(dbgs() << "BBV: fusing loop #" << n <<
> >>> " for " << BB.getName() << " in " <<
> >>> @@ -375,6 +393,9 @@
> >>> DT = &getAnalysis<DominatorTree>();
> >>> SE = &getAnalysis<ScalarEvolution>();
> >>> TD = getAnalysisIfAvailable<DataLayout>();
> >>> + TTI = IgnoreTargetInfo ? 0 :
> >>> + getAnalysisIfAvailable<TargetTransformInfo>();
> >>> + VTTI = TTI ? TTI->getVectorTargetTransformInfo() : 0;
> >>>
> >>> return vectorizeBB(BB);
> >>> }
> >>> @@ -427,6 +448,10 @@
> >>> T2 = cast<CastInst>(I)->getSrcTy();
> >>> else
> >>> T2 = T1;
> >>> +
> >>> + if (SelectInst *SI = dyn_cast<SelectInst>(I)) {
> >>> + T2 = SI->getCondition()->getType();
> >>> + }
> >>> }
> >>>
> >>> // Returns the weight associated with the provided value. A chain of
> >>> @@ -465,18 +490,25 @@
> >>> // directly after J.
> >>> bool getPairPtrInfo(Instruction *I, Instruction *J,
> >>> Value *&IPtr, Value *&JPtr, unsigned &IAlignment, unsigned &JAlignment,
> >>> + unsigned &IAddressSpace, unsigned &JAddressSpace,
> >>> int64_t &OffsetInElmts) {
> >>> OffsetInElmts = 0;
> >>> - if (isa<LoadInst>(I)) {
> >>> - IPtr = cast<LoadInst>(I)->getPointerOperand();
> >>> - JPtr = cast<LoadInst>(J)->getPointerOperand();
> >>> - IAlignment = cast<LoadInst>(I)->getAlignment();
> >>> - JAlignment = cast<LoadInst>(J)->getAlignment();
> >>> + if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
> >>> + LoadInst *LJ = cast<LoadInst>(J);
> >>> + IPtr = LI->getPointerOperand();
> >>> + JPtr = LJ->getPointerOperand();
> >>> + IAlignment = LI->getAlignment();
> >>> + JAlignment = LJ->getAlignment();
> >>> + IAddressSpace = LI->getPointerAddressSpace();
> >>> + JAddressSpace = LJ->getPointerAddressSpace();
> >>> } else {
> >>> - IPtr = cast<StoreInst>(I)->getPointerOperand();
> >>> - JPtr = cast<StoreInst>(J)->getPointerOperand();
> >>> - IAlignment = cast<StoreInst>(I)->getAlignment();
> >>> - JAlignment = cast<StoreInst>(J)->getAlignment();
> >>> + StoreInst *SI = cast<StoreInst>(I), *SJ = cast<StoreInst>(J);
> >>> + IPtr = SI->getPointerOperand();
> >>> + JPtr = SJ->getPointerOperand();
> >>> + IAlignment = SI->getAlignment();
> >>> + JAlignment = SJ->getAlignment();
> >>> + IAddressSpace = SI->getPointerAddressSpace();
> >>> + JAddressSpace = SJ->getPointerAddressSpace();
> >>> }
> >>>
> >>> const SCEV *IPtrSCEV = SE->getSCEV(IPtr);
> >>> @@ -562,7 +594,9 @@
> >>> do {
> >>> std::vector<Value *> PairableInsts;
> >>> std::multimap<Value *, Value *> CandidatePairs;
> >>> + DenseMap<ValuePair, int> CandidatePairCostSavings;
> >>> ShouldContinue = getCandidatePairs(BB, Start, CandidatePairs,
> >>> + CandidatePairCostSavings,
> >>> PairableInsts, NonPow2Len);
> >>> if (PairableInsts.empty()) continue;
> >>>
> >>> @@ -590,7 +624,8 @@
> >>> // variables.
> >>>
> >>> DenseMap<Value *, Value *> ChosenPairs;
> >>> - choosePairs(CandidatePairs, PairableInsts, ConnectedPairs,
> >>> + choosePairs(CandidatePairs, CandidatePairCostSavings,
> >>> + PairableInsts, ConnectedPairs,
> >>> PairableInstUsers, ChosenPairs);
> >>>
> >>> if (ChosenPairs.empty()) continue;
> >>> @@ -679,15 +714,22 @@
> >>> !(VectorType::isValidElementType(T2) || T2->isVectorTy()))
> >>> return false;
> >>>
> >>> - if (T1->getScalarSizeInBits() == 1 && T2->getScalarSizeInBits() == 1) {
> >>> + if (T1->getScalarSizeInBits() == 1) {
> >>> if (!Config.VectorizeBools)
> >>> return false;
> >>> } else {
> >>> - if (!Config.VectorizeInts
> >>> - && (T1->isIntOrIntVectorTy() || T2->isIntOrIntVectorTy()))
> >>> + if (!Config.VectorizeInts && T1->isIntOrIntVectorTy())
> >>> return false;
> >>> }
> >>> -
> >>> +
> >>> + if (T2->getScalarSizeInBits() == 1) {
> >>> + if (!Config.VectorizeBools)
> >>> + return false;
> >>> + } else {
> >>> + if (!Config.VectorizeInts && T2->isIntOrIntVectorTy())
> >>> + return false;
> >>> + }
> >>> +
> >>> if (!Config.VectorizeFloats
> >>> && (T1->isFPOrFPVectorTy() || T2->isFPOrFPVectorTy()))
> >>> return false;
> >>> @@ -703,8 +745,8 @@
> >>> T2->getScalarType()->isPointerTy()))
> >>> return false;
> >>>
> >>> - if (T1->getPrimitiveSizeInBits() >= Config.VectorBits ||
> >>> - T2->getPrimitiveSizeInBits() >= Config.VectorBits)
> >>> + if (!VTTI && (T1->getPrimitiveSizeInBits() >= Config.VectorBits ||
> >>> + T2->getPrimitiveSizeInBits() >= Config.VectorBits))
> >>> return false;
> >>>
> >>> return true;
> >>> @@ -715,10 +757,13 @@
> >>> // that I has already been determined to be vectorizable and that J is not
> >>> // in the use tree of I.
> >>> bool BBVectorize::areInstsCompatible(Instruction *I, Instruction *J,
> >>> - bool IsSimpleLoadStore, bool NonPow2Len) {
> >>> + bool IsSimpleLoadStore, bool NonPow2Len,
> >>> + int &CostSavings) {
> >>> DEBUG(if (DebugInstructionExamination) dbgs() << "BBV: looking at " << *I <<
> >>> " <-> " << *J << "\n");
> >>>
> >>> + CostSavings = 0;
> >>> +
> >>> // Loads and stores can be merged if they have different alignments,
> >>> // but are otherwise the same.
> >>> if (!J->isSameOperationAs(I, Instruction::CompareIgnoringAlignment |
> >>> @@ -731,38 +776,62 @@
> >>> unsigned MaxTypeBits = std::max(
> >>> IT1->getPrimitiveSizeInBits() + JT1->getPrimitiveSizeInBits(),
> >>> IT2->getPrimitiveSizeInBits() + JT2->getPrimitiveSizeInBits());
> >>> - if (MaxTypeBits > Config.VectorBits)
> >>> + if (!VTTI && MaxTypeBits > Config.VectorBits)
> >>> return false;
> >>>
> >>> // FIXME: handle addsub-type operations!
> >>>
> >>> if (IsSimpleLoadStore) {
> >>> Value *IPtr, *JPtr;
> >>> - unsigned IAlignment, JAlignment;
> >>> + unsigned IAlignment, JAlignment, IAddressSpace, JAddressSpace;
> >>> int64_t OffsetInElmts = 0;
> >>> if (getPairPtrInfo(I, J, IPtr, JPtr, IAlignment, JAlignment,
> >>> + IAddressSpace, JAddressSpace,
> >>> OffsetInElmts) && abs64(OffsetInElmts) == 1) {
> >>> - if (Config.AlignedOnly) {
> >>> - Type *aTypeI = isa<StoreInst>(I) ?
> >>> - cast<StoreInst>(I)->getValueOperand()->getType() : I->getType();
> >>> - Type *aTypeJ = isa<StoreInst>(J) ?
> >>> - cast<StoreInst>(J)->getValueOperand()->getType() : J->getType();
> >>> + unsigned BottomAlignment = IAlignment;
> >>> + if (OffsetInElmts < 0) BottomAlignment = JAlignment;
> >>>
> >>> + Type *aTypeI = isa<StoreInst>(I) ?
> >>> + cast<StoreInst>(I)->getValueOperand()->getType() : I->getType();
> >>> + Type *aTypeJ = isa<StoreInst>(J) ?
> >>> + cast<StoreInst>(J)->getValueOperand()->getType() : J->getType();
> >>> + Type *VType = getVecTypeForPair(aTypeI, aTypeJ);
> >>> +
> >>> + if (Config.AlignedOnly) {
> >>> // An aligned load or store is possible only if the instruction
> >>> // with the lower offset has an alignment suitable for the
> >>> // vector type.
> >>>
> >>> - unsigned BottomAlignment = IAlignment;
> >>> - if (OffsetInElmts < 0) BottomAlignment = JAlignment;
> >>> -
> >>> - Type *VType = getVecTypeForPair(aTypeI, aTypeJ);
> >>> unsigned VecAlignment = TD->getPrefTypeAlignment(VType);
> >>> if (BottomAlignment < VecAlignment)
> >>> return false;
> >>> }
> >>> +
> >>> + if (VTTI) {
> >>> + unsigned ICost = VTTI->getMemoryOpCost(I->getOpcode(), I->getType(),
> >>> + IAlignment, IAddressSpace);
> >>> + unsigned JCost = VTTI->getMemoryOpCost(J->getOpcode(), J->getType(),
> >>> + JAlignment, JAddressSpace);
> >>> + unsigned VCost = VTTI->getMemoryOpCost(I->getOpcode(), VType,
> >>> + BottomAlignment,
> >>> + IAddressSpace);
> >>> + if (VCost > ICost + JCost)
> >>> + return false;
> >>> + CostSavings = ICost + JCost - VCost;
> >>> + }
> >>> } else {
> >>> return false;
> >>> }
> >>> + } else if (VTTI) {
> >>> + unsigned ICost = VTTI->getInstrCost(I->getOpcode(), IT1, IT2);
> >>> + unsigned JCost = VTTI->getInstrCost(J->getOpcode(), JT1, JT2);
> >>> + Type *VT1 = getVecTypeForPair(IT1, JT1),
> >>> + *VT2 = getVecTypeForPair(IT2, JT2);
> >>> + unsigned VCost = VTTI->getInstrCost(I->getOpcode(), VT1, VT2);
> >>> +
> >>> + if (VCost > ICost + JCost)
> >>> + return false;
> >>> + CostSavings = ICost + JCost - VCost;
> >>> }
> >>>
> >>> // The powi intrinsic is special because only the first argument is
> >>> @@ -845,6 +914,7 @@
> >>> bool BBVectorize::getCandidatePairs(BasicBlock &BB,
> >>> BasicBlock::iterator &Start,
> >>> std::multimap<Value *, Value *> &CandidatePairs,
> >>> + DenseMap<ValuePair, int> &CandidatePairCostSavings,
> >>> std::vector<Value *> &PairableInsts, bool NonPow2Len) {
> >>> BasicBlock::iterator E = BB.end();
> >>> if (Start == E) return false;
> >>> @@ -881,7 +951,9 @@
> >>>
> >>> // J does not use I, and comes before the first use of I, so it can be
> >>> // merged with I if the instructions are compatible.
> >>> - if (!areInstsCompatible(I, J, IsSimpleLoadStore, NonPow2Len)) continue;
> >>> + int CostSavings;
> >>> + if (!areInstsCompatible(I, J, IsSimpleLoadStore, NonPow2Len,
> >>> + CostSavings)) continue;
> >>>
> >>> // J is a candidate for merging with I.
> >>> if (!PairableInsts.size() ||
> >>> @@ -890,6 +962,9 @@
> >>> }
> >>>
> >>> CandidatePairs.insert(ValuePair(I, J));
> >>> + if (VTTI)
> >>> + CandidatePairCostSavings.insert(ValuePairWithCost(ValuePair(I, J),
> >>> + CostSavings));
> >>>
> >>> // The next call to this function must start after the last instruction
> >>> // selected during this invocation.
> >>> @@ -899,7 +974,8 @@
> >>> }
> >>>
> >>> DEBUG(if (DebugCandidateSelection) dbgs() << "BBV: candidate pair "
> >>> - << *I << " <-> " << *J << "\n");
> >>> + << *I << " <-> " << *J << " (cost savings: " <<
> >>> + CostSavings << ")\n");
> >>>
> >>> // If we have already found too many pairs, break here and this function
> >>> // will be called again starting after the last instruction selected
> >>> @@ -1353,13 +1429,14 @@
> >>> // pairs, given the choice of root pairs as an iterator range.
> >>> void BBVectorize::findBestTreeFor(
> >>> std::multimap<Value *, Value *> &CandidatePairs,
> >>> + DenseMap<ValuePair, int> &CandidatePairCostSavings,
> >>> std::vector<Value *> &PairableInsts,
> >>> std::multimap<ValuePair, ValuePair> &ConnectedPairs,
> >>> DenseSet<ValuePair> &PairableInstUsers,
> >>> std::multimap<ValuePair, ValuePair> &PairableInstUserMap,
> >>> DenseMap<Value *, Value *> &ChosenPairs,
> >>> DenseSet<ValuePair> &BestTree, size_t &BestMaxDepth,
> >>> - size_t &BestEffSize, VPIteratorPair ChoiceRange,
> >>> + int &BestEffSize, VPIteratorPair ChoiceRange,
> >>> bool UseCycleCheck) {
> >>> for (std::multimap<Value *, Value *>::iterator J = ChoiceRange.first;
> >>> J != ChoiceRange.second; ++J) {
> >>> @@ -1409,17 +1486,26 @@
> >>> PairableInstUsers, PairableInstUserMap, ChosenPairs, Tree,
> >>> PrunedTree, *J, UseCycleCheck);
> >>>
> >>> - size_t EffSize = 0;
> >>> - for (DenseSet<ValuePair>::iterator S = PrunedTree.begin(),
> >>> - E = PrunedTree.end(); S != E; ++S)
> >>> - EffSize += getDepthFactor(S->first);
> >>> + int EffSize = 0;
> >>> + if (VTTI) {
> >>> + for (DenseSet<ValuePair>::iterator S = PrunedTree.begin(),
> >>> + E = PrunedTree.end(); S != E; ++S) {
> >>> + if (getDepthFactor(S->first))
> >>> + EffSize += CandidatePairCostSavings.find(*S)->second;
> >>> + }
> >>> + } else {
> >>> + for (DenseSet<ValuePair>::iterator S = PrunedTree.begin(),
> >>> + E = PrunedTree.end(); S != E; ++S)
> >>> + EffSize += (int) getDepthFactor(S->first);
> >>> + }
> >>>
> >>> DEBUG(if (DebugPairSelection)
> >>> dbgs() << "BBV: found pruned Tree for pair {"
> >>> << *J->first << " <-> " << *J->second << "} of depth " <<
> >>> MaxDepth << " and size " << PrunedTree.size() <<
> >>> " (effective size: " << EffSize << ")\n");
> >>> - if (MaxDepth >= Config.ReqChainDepth && EffSize > BestEffSize) {
> >>> + if (MaxDepth >= Config.ReqChainDepth &&
> >>> + EffSize > 0 && EffSize > BestEffSize) {
> >>> BestMaxDepth = MaxDepth;
> >>> BestEffSize = EffSize;
> >>> BestTree = PrunedTree;
> >>> @@ -1431,6 +1517,7 @@
> >>> // that will be fused into vector instructions.
> >>> void BBVectorize::choosePairs(
> >>> std::multimap<Value *, Value *> &CandidatePairs,
> >>> + DenseMap<ValuePair, int> &CandidatePairCostSavings,
> >>> std::vector<Value *> &PairableInsts,
> >>> std::multimap<ValuePair, ValuePair> &ConnectedPairs,
> >>> DenseSet<ValuePair> &PairableInstUsers,
> >>> @@ -1447,9 +1534,11 @@
> >>> VPIteratorPair ChoiceRange = CandidatePairs.equal_range(*I);
> >>>
> >>> // The best pair to choose and its tree:
> >>> - size_t BestMaxDepth = 0, BestEffSize = 0;
> >>> + size_t BestMaxDepth = 0;
> >>> + int BestEffSize = 0;
> >>> DenseSet<ValuePair> BestTree;
> >>> - findBestTreeFor(CandidatePairs, PairableInsts, ConnectedPairs,
> >>> + findBestTreeFor(CandidatePairs, CandidatePairCostSavings,
> >>> + PairableInsts, ConnectedPairs,
> >>> PairableInstUsers, PairableInstUserMap, ChosenPairs,
> >>> BestTree, BestMaxDepth, BestEffSize, ChoiceRange,
> >>> UseCycleCheck);
> >>> @@ -1505,12 +1594,13 @@
> >>> Instruction *I, Instruction *J, unsigned o,
> >>> bool FlipMemInputs) {
> >>> Value *IPtr, *JPtr;
> >>> - unsigned IAlignment, JAlignment;
> >>> + unsigned IAlignment, JAlignment, IAddressSpace, JAddressSpace;
> >>> int64_t OffsetInElmts;
> >>>
> >>> // Note: the analysis might fail here, that is why FlipMemInputs has
> >>> // been precomputed (OffsetInElmts must be unused here).
> >>> (void) getPairPtrInfo(I, J, IPtr, JPtr, IAlignment, JAlignment,
> >>> + IAddressSpace, JAddressSpace,
> >>> OffsetInElmts);
> >>>
> >>> // The pointer value is taken to be the one with the lowest offset.
> >>> @@ -2212,9 +2302,10 @@
> >>> continue;
> >>>
> >>> Value *IPtr, *JPtr;
> >>> - unsigned IAlignment, JAlignment;
> >>> + unsigned IAlignment, JAlignment, IAddressSpace, JAddressSpace;
> >>> int64_t OffsetInElmts;
> >>> if (!getPairPtrInfo(I, J, IPtr, JPtr, IAlignment, JAlignment,
> >>> + IAddressSpace, JAddressSpace,
> >>> OffsetInElmts) || abs64(OffsetInElmts) != 1)
> >>> llvm_unreachable("Pre-fusion pointer analysis failed");
> >>>
> >>>
> >>> Modified: llvm/trunk/test/Transforms/BBVectorize/loop1.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/BBVectorize/loop1.ll?rev=166716&r1=166715&r2=166716&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/Transforms/BBVectorize/loop1.ll (original)
> >>> +++ llvm/trunk/test/Transforms/BBVectorize/loop1.ll Thu Oct 25 16:12:23 2012
> >>> @@ -1,8 +1,11 @@
> >>> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
> >>> target triple = "x86_64-unknown-linux-gnu"
> >>> ; RUN: opt < %s -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s
> >>> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s
> >>> ; RUN: opt < %s -basicaa -loop-unroll -unroll-threshold=45 -unroll-allow-partial -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s -check-prefix=CHECK-UNRL
> >>> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -basicaa -loop-unroll -unroll-threshold=45 -unroll-allow-partial -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s -check-prefix=CHECK-UNRL
> >>> ; The second check covers the use of alias analysis (with loop unrolling).
> >>> +; Both checks are run with and without target information.
> >>>
> >>> define void @test1(double* noalias %out, double* noalias %in1, double* noalias %in2) nounwind uwtable {
> >>> entry:
> >>>
> >>> Modified: llvm/trunk/test/Transforms/BBVectorize/simple.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/BBVectorize/simple.ll?rev=166716&r1=166715&r2=166716&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/Transforms/BBVectorize/simple.ll (original)
> >>> +++ llvm/trunk/test/Transforms/BBVectorize/simple.ll Thu Oct 25 16:12:23 2012
> >>> @@ -1,5 +1,6 @@
> >>> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
> >>> ; RUN: opt < %s -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s
> >>> +; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -bb-vectorize -bb-vectorize-req-chain-depth=3 -instcombine -gvn -S | FileCheck %s -check-prefix=CHECK-TI
> >>>
> >>> ; Basic depth-3 chain
> >>> define double @test1(double %A1, double %A2, double %B1, double %B2) {
> >>> @@ -23,6 +24,9 @@
> >>> ; CHECK: %R = fmul double %Z1.v.r1, %Z1.v.r2
> >>> ret double %R
> >>> ; CHECK: ret double %R
> >>> +; CHECK-TI: @test1
> >>> +; CHECK-TI: fsub <2 x double>
> >>> +; CHECK-TI: ret double
> >>> }
> >>>
> >>> ; Basic depth-3 chain (last pair permuted)
> >>> @@ -146,6 +150,9 @@
> >>> ; CHECK: %R = mul <8 x i8> %Q1.v.r1, %Q1.v.r2
> >>> ret <8 x i8> %R
> >>> ; CHECK: ret <8 x i8> %R
> >>> +; CHECK-TI: @test6
> >>> +; CHECK-TI-NOT: sub <16 x i8>
> >>> +; CHECK-TI: ret <8 x i8>
> >>> }
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> Hal Finkel
> >> Postdoctoral Appointee
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >
>
>
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory