[llvm] bd7949b - reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost

Roman Lebedev via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 25 08:52:47 PDT 2022


Ok, let me reword this then.

On Tue, Oct 25, 2022 at 6:13 PM Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote:
>
> [AMD Official Use Only - General]
>
> The patch is intended to fix the worst-case scenario where a chain of branch folds happens without considering the accumulated cost. Since the option `-bonus-inst-threshold` (default 1) is now compared against the accumulated cost, some previously folded branches may no longer be folded. This causes mixed results in benchmarks. The default value of `-bonus-inst-threshold` may need to be tuned to get the most out of this change, e.g., by changing it to 2 or 3.
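
To make the accumulation concrete, here is a toy model of the behaviour described above. It is a sketch, not the patch: it assumes a straight-line chain of blocks that each carry the same number of bonus instructions, `ToyCostTracker` and `foldChain` are hypothetical names, and blocks are plain strings rather than `BasicBlock` pointers. Like the patch, it records the cost before checking the threshold.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy stand-in for a per-block bonus-instruction tracker (hypothetical).
class ToyCostTracker {
  std::unordered_map<std::string, unsigned> NumBonusInsts;

public:
  void add(const std::string &Block, unsigned Count) {
    NumBonusInsts[Block] += Count;
  }
  unsigned get(const std::string &Block) const {
    auto It = NumBonusInsts.find(Block);
    return It == NumBonusInsts.end() ? 0 : It->second;
  }
};

// Walk a chain of NumBlocks blocks, each with BonusPerBlock bonus
// instructions, and try to fold every block into the current fold target.
// Returns how many folds were accepted.
unsigned foldChain(unsigned NumBlocks, unsigned BonusPerBlock,
                   unsigned Threshold, bool Accumulate) {
  assert(NumBlocks > 0 && "need at least one block to fold into");
  ToyCostTracker Tracker;
  unsigned Folds = 0;
  unsigned Target = 0; // index of the block currently absorbing successors
  for (unsigned B = 1; B < NumBlocks; ++B) {
    const std::string Name = "bb" + std::to_string(Target);
    unsigned Cost = BonusPerBlock; // old behaviour: each fold judged alone
    if (Accumulate) {
      Tracker.add(Name, BonusPerBlock); // new behaviour: cost piles up
      Cost = Tracker.get(Name);
    }
    if (Cost <= Threshold)
      ++Folds;
    else
      Target = B; // fold rejected: block B survives and becomes the target
  }
  return Folds;
}
```

With four blocks, one bonus instruction each, and a threshold of 1, `foldChain(4, 1, 1, false)` accepts all three folds, while `foldChain(4, 1, 1, true)` accepts only two. Raising the threshold to 3 makes the accumulating variant accept all three again.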

Looks like there wasn't a real consensus in that review either,
and the performance concern was raised and not fully addressed.
I'm guessing the fix is mainly targeting GPUs?
Perhaps we simply shouldn't speculate like that for them?

Anyways, please revert and move to an RFC. Thanks!

> Is there a way for me to run your benchmark to tune this option? Thanks.
Sure.
Clone https://github.com/darktable-org/rawspeed,
build it passing -DBUILD_BENCHMARKING=ON at CMake time,
fetch rsync://raw.pixls.us/data-unique/ elsewhere,
and see the rest of the magic in the first line of the report
I attached in my previous reply.

> Sam
Roman

> -----Original Message-----
> From: Roman Lebedev <lebedev.ri at gmail.com>
> Sent: Monday, October 24, 2022 6:17 PM
> To: Liu, Yaxun (Sam) <Yaxun.Liu at amd.com>
> Cc: llvm-commits at lists.llvm.org; Yaxun Liu <llvmlistbot at llvm.org>; Philip Reames <listmail at philipreames.com>; Nikita Popov <nikita.ppv at gmail.com>
> Subject: Re: [llvm] bd7949b - reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost
>
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> Some more perf numbers from me (attached); on that bench, there are 11 noticeable (>+1%) *statistically significant* regressions but only 4 similar improvements.
>
> On Mon, Oct 24, 2022 at 11:22 PM Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote:
> >
> > Also, if the middle-end folds the following example:
> >
> > /// E.g. folding
> >   ///  if (cond1) return false;
> >   ///  if (cond2) return false;
> >   ///  return true;
> >   /// into
> >   ///  if (cond1 | cond2) return false;
> >
> > How could the backend unfold this?
> >
> > Sam
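
For readers following along, the two shapes in the example above can be written out as ordinary C++. This is only an illustration under assumed conditions: `cond1`, `cond2`, and the counter `Cond2Evals` are made up for the sketch. It shows that both shapes return the same result, but the folded one always pays for `cond2`.

```cpp
#include <cassert>

static int Cond2Evals = 0; // how many times the second condition was computed

static bool cond1(int X) { return X < 0; }
static bool cond2(int X) {
  ++Cond2Evals;
  return X * X > 100;
}

// Original shape: cond2 is only computed when cond1 is false.
bool beforeFold(int X) {
  if (cond1(X))
    return false;
  if (cond2(X))
    return false;
  return true;
}

// Folded shape: both conditions are computed unconditionally and combined
// with a bitwise or, so there is a single branch.
bool afterFold(int X) {
  bool C1 = cond1(X);
  bool C2 = cond2(X);
  if (C1 | C2)
    return false;
  return true;
}
```

For a negative input such as `-5`, `beforeFold` leaves `Cond2Evals` untouched, while `afterFold` increments it. That extra work is what the thread calls a bonus instruction.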
> >
> > -----Original Message-----
> > From: Liu, Yaxun (Sam)
> > Sent: Monday, October 24, 2022 4:08 PM
> > To: Roman Lebedev <lebedev.ri at gmail.com>; llvm-commits at lists.llvm.org
> > Cc: Yaxun Liu <llvmlistbot at llvm.org>; Philip Reames
> > <listmail at philipreames.com>; Nikita Popov <nikita.ppv at gmail.com>
> > Subject: RE: [llvm] bd7949b - reland e5581df60a35 [SimplifyCFG]
> > accumulate bonus insts cost
> >
> > The original review is at
> > https://reviews.llvm.org/D132408
> >
> > The cost per BB is tracked by a ValueMap. If a BB is deleted, it is no longer tracked. Tracking of a merged BB only happens if the old BB is 'replaced all uses by' the new BB, which AFAIK does not happen for BBs.
> >
> > Sam
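
The tracking behaviour described above can be modelled with a small map that reacts to deletion and replacement events. This is a toy sketch only: `ToyTrackedMap`, `onDeleted`, and `onReplaced` are hypothetical names, keys are strings rather than `BasicBlock` pointers, and summing costs when the replacement key already has an entry is a choice made for this sketch, not a claim about `llvm::ValueMap`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy model of a map whose entries react to their key being deleted or
// replaced (hypothetical; not the LLVM class).
class ToyTrackedMap {
  std::unordered_map<std::string, unsigned> Costs;

public:
  void add(const std::string &Key, unsigned N) { Costs[Key] += N; }

  unsigned lookup(const std::string &Key) const {
    auto It = Costs.find(Key);
    return It == Costs.end() ? 0 : It->second;
  }

  // Key deleted: the entry is dropped, so its accumulated cost is forgotten.
  void onDeleted(const std::string &Key) { Costs.erase(Key); }

  // Old replaced everywhere by New: the entry moves to New.
  void onReplaced(const std::string &Old, const std::string &New) {
    auto It = Costs.find(Old);
    if (It == Costs.end())
      return;
    unsigned N = It->second;
    Costs.erase(It);
    Costs[New] += N; // sketch-only choice: sum on collision
  }
};
```

In this model a cost only survives a merge if the merge is reported through `onReplaced`. If the old block is simply deleted, `lookup` on the surviving block returns whatever that block had accumulated on its own.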
> >
> > -----Original Message-----
> > From: Roman Lebedev <lebedev.ri at gmail.com>
> > Sent: Monday, October 24, 2022 3:58 PM
> > To: llvm-commits at lists.llvm.org
> > Cc: Liu, Yaxun (Sam) <Yaxun.Liu at amd.com>; Yaxun Liu
> > <llvmlistbot at llvm.org>; Philip Reames <listmail at philipreames.com>;
> > Nikita Popov <nikita.ppv at gmail.com>
> > Subject: Re: [llvm] bd7949b - reland e5581df60a35 [SimplifyCFG]
> > accumulate bonus insts cost
> >
> > I was in a rather long AWOL when this was being reviewed, so post-commit it is.
> >
> > I'm really not comfortable with maintaining such global state, especially because I know for a fact that the existing state, LoopHeaders, is already pretty broken - what should happen when a block is deleted?
> > How do we know we are tracking the right block when they are merged?
> > Etc.
> >
> > Also, I was under a rather strong impression that we decided that aggressively flattening blocks like that is the right decision for the middle-end, and that the back-end will need to selectively (pun intended) undo this with actual performance characteristics in mind.
> >
> > If you don't agree, I would recommend proceeding through an RFC...
> >
> > (Also, this is missing a link to the review.)
> >
> >
> > Roman.
> >
> > On Mon, Oct 24, 2022 at 10:44 PM Yaxun Liu via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> > >
> > >
> > > Author: Yaxun (Sam) Liu
> > > Date: 2022-10-24T15:43:53-04:00
> > > New Revision: bd7949bcd86633bd4203b2ba6f891aea00fce4d1
> > >
> > > URL: https://github.com/llvm/llvm-project/commit/bd7949bcd86633bd4203b2ba6f891aea00fce4d1
> > > DIFF: https://github.com/llvm/llvm-project/commit/bd7949bcd86633bd4203b2ba6f891aea00fce4d1.diff
> > >
> > > LOG: reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost
> > >
> > > Fixed compile time increase due to always constructing LocalCostTracker.
> > > Now only construct LocalCostTracker when needed.
> > >
> > > Added:
> > >
> > >
> > > Modified:
> > >     llvm/include/llvm/Transforms/Utils/Local.h
> > >     llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
> > >     llvm/lib/Transforms/Utils/LoopSimplify.cpp
> > >     llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> > >     llvm/test/Transforms/LoopUnroll/peel-loop-inner.ll
> > >     llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
> > >     llvm/test/Transforms/SimplifyCFG/branch-fold-multiple.ll
> > >     llvm/test/Transforms/SimplifyCFG/branch-fold-threshold.ll
> > >
> > >     llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-two-preds-cost.ll
> > >
> > > Removed:
> > >
> > >
> > >
> > > ################################################################################
> > > diff  --git a/llvm/include/llvm/Transforms/Utils/Local.h b/llvm/include/llvm/Transforms/Utils/Local.h
> > > index 4db697c1ffcec..1397420b2950f 100644
> > > --- a/llvm/include/llvm/Transforms/Utils/Local.h
> > > +++ b/llvm/include/llvm/Transforms/Utils/Local.h
> > > @@ -16,6 +16,7 @@
> > >
> > >  #include "llvm/ADT/ArrayRef.h"
> > >  #include "llvm/IR/Dominators.h"
> > > +#include "llvm/IR/ValueMap.h"
> > >  #include "llvm/Support/CommandLine.h"
> > >  #include "llvm/Transforms/Utils/SimplifyCFGOptions.h"
> > >  #include <cstdint>
> > > @@ -164,6 +165,26 @@ bool TryToSimplifyUncondBranchFromEmptyBlock(BasicBlock *BB,
> > >  /// values, but instcombine orders them so it usually won't matter.
> > >  bool EliminateDuplicatePHINodes(BasicBlock *BB);
> > >
> > > +/// Class to track cost of simplify CFG transformations.
> > > +class SimplifyCFGCostTracker {
> > > +  /// Number of bonus instructions due to folding branches into predecessors.
> > > +  /// E.g. folding
> > > +  ///  if (cond1) return false;
> > > +  ///  if (cond2) return false;
> > > +  ///  return true;
> > > +  /// into
> > > +  ///  if (cond1 | cond2) return false;
> > > +  ///  return true;
> > > +  /// In this case cond2 is always executed whereas originally it may be
> > > +  /// evicted due to early exit of cond1. 'cond2' is called bonus instructions
> > > +  /// and such bonus instructions could accumulate for unrolled loops, therefore
> > > +  /// use a value map to accumulate their costs across transformations.
> > > +  ValueMap<BasicBlock *, unsigned> NumBonusInsts;
> > > +
> > > +public:
> > > +  void updateNumBonusInsts(BasicBlock *Parent, unsigned InstCount);
> > > +  unsigned getNumBonusInsts(BasicBlock *Parent);
> > > +};
> > >  /// This function is used to do simplification of a CFG.  For example, it
> > >  /// adjusts branches to branches to eliminate the extra hop, it eliminates
> > >  /// unreachable basic blocks, and does other peephole optimization of the CFG.
> > > @@ -174,7 +195,8 @@ extern cl::opt<bool> RequireAndPreserveDomTree;
> > >  bool simplifyCFG(BasicBlock *BB, const TargetTransformInfo &TTI,
> > >                   DomTreeUpdater *DTU = nullptr,
> > >                   const SimplifyCFGOptions &Options = {},
> > > -                 ArrayRef<WeakVH> LoopHeaders = {});
> > > +                 ArrayRef<WeakVH> LoopHeaders = {},
> > > +                 SimplifyCFGCostTracker *CostTracker = nullptr);
> > >
> > >  /// This function is used to flatten a CFG. For example, it uses parallel-and
> > >  /// and parallel-or mode to collapse if-conditions and merge if-regions with
> > > @@ -184,7 +206,8 @@ bool FlattenCFG(BasicBlock *BB, AAResults *AA = nullptr);
> > >  /// If this basic block is ONLY a setcc and a branch, and if a predecessor
> > >  /// branches to us and one of our successors, fold the setcc into the
> > >  /// predecessor and use logical operations to pick the right destination.
> > > -bool FoldBranchToCommonDest(BranchInst *BI, llvm::DomTreeUpdater *DTU = nullptr,
> > > +bool FoldBranchToCommonDest(BranchInst *BI, SimplifyCFGCostTracker &CostTracker,
> > > +                            DomTreeUpdater *DTU = nullptr,
> > >                              MemorySSAUpdater *MSSAU = nullptr,
> > >                              const TargetTransformInfo *TTI = nullptr,
> > >                              unsigned BonusInstThreshold = 1);
> > >
> > > diff  --git a/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp b/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
> > > index fb2d812a186df..e2646eda06c54 100644
> > > --- a/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
> > > +++ b/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
> > > @@ -221,7 +221,8 @@ static bool tailMergeBlocksWithSimilarFunctionTerminators(Function &F,
> > >  /// iterating until no more changes are made.
> > >  static bool iterativelySimplifyCFG(Function &F, const TargetTransformInfo &TTI,
> > >                                     DomTreeUpdater *DTU,
> > > -                                   const SimplifyCFGOptions &Options) {
> > > +                                   const SimplifyCFGOptions &Options,
> > > +                                   SimplifyCFGCostTracker &CostTracker) {
> > >    bool Changed = false;
> > >    bool LocalChange = true;
> > >
> > > @@ -252,7 +253,7 @@ static bool iterativelySimplifyCFG(Function &F, const TargetTransformInfo &TTI,
> > >          while (BBIt != F.end() && DTU->isBBPendingDeletion(&*BBIt))
> > >            ++BBIt;
> > >        }
> > > -      if (simplifyCFG(&BB, TTI, DTU, Options, LoopHeaders)) {
> > > +      if (simplifyCFG(&BB, TTI, DTU, Options, LoopHeaders, &CostTracker)) {
> > >          LocalChange = true;
> > >          ++NumSimpl;
> > >        }
> > > @@ -266,11 +267,13 @@ static bool simplifyFunctionCFGImpl(Function &F, const TargetTransformInfo &TTI,
> > >                                      DominatorTree *DT,
> > >                                      const SimplifyCFGOptions &Options) {
> > >    DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
> > > +  SimplifyCFGCostTracker CostTracker;
> > >
> > >    bool EverChanged = removeUnreachableBlocks(F, DT ? &DTU : nullptr);
> > >    EverChanged |=
> > >        tailMergeBlocksWithSimilarFunctionTerminators(F, DT ? &DTU : nullptr);
> > > -  EverChanged |= iterativelySimplifyCFG(F, TTI, DT ? &DTU : nullptr, Options);
> > > +  EverChanged |=
> > > +      iterativelySimplifyCFG(F, TTI, DT ? &DTU : nullptr, Options, CostTracker);
> > >
> > >    // If neither pass changed anything, we're done.
> > >    if (!EverChanged) return false;
> > > @@ -284,7 +287,8 @@ static bool simplifyFunctionCFGImpl(Function &F, const TargetTransformInfo &TTI,
> > >      return true;
> > >
> > >    do {
> > > -    EverChanged = iterativelySimplifyCFG(F, TTI, DT ? &DTU : nullptr, Options);
> > > +    EverChanged = iterativelySimplifyCFG(F, TTI, DT ? &DTU : nullptr, Options,
> > > +                                         CostTracker);
> > >      EverChanged |= removeUnreachableBlocks(F, DT ? &DTU : nullptr);
> > >    } while (EverChanged);
> > >
> > >
> > > diff  --git a/llvm/lib/Transforms/Utils/LoopSimplify.cpp b/llvm/lib/Transforms/Utils/LoopSimplify.cpp
> > > index 8943b4bb651f6..13f1080ffc044 100644
> > > --- a/llvm/lib/Transforms/Utils/LoopSimplify.cpp
> > > +++ b/llvm/lib/Transforms/Utils/LoopSimplify.cpp
> > > @@ -480,6 +480,7 @@ static bool simplifyOneLoop(Loop *L, SmallVectorImpl<Loop *> &Worklist,
> > >                              DominatorTree *DT, LoopInfo *LI,
> > >                              ScalarEvolution *SE, AssumptionCache *AC,
> > >                              MemorySSAUpdater *MSSAU, bool PreserveLCSSA) {
> > > +  SimplifyCFGCostTracker CostTracker;
> > >    bool Changed = false;
> > >    if (MSSAU && VerifyMemorySSA)
> > >      MSSAU->getMemorySSA()->verifyMemorySSA();
> > > @@ -661,7 +662,7 @@ static bool simplifyOneLoop(Loop *L, SmallVectorImpl<Loop *> &Worklist,
> > >        // The block has now been cleared of all instructions except for
> > >        // a comparison and a conditional branch. SimplifyCFG may be able
> > >        // to fold it now.
> > > -      if (!FoldBranchToCommonDest(BI, /*DTU=*/nullptr, MSSAU))
> > > +      if (!FoldBranchToCommonDest(BI, CostTracker, /*DTU=*/nullptr, MSSAU))
> > >          continue;
> > >
> > >        // Success. The block is now dead, so remove it from the loop,
> > >
> > > diff  --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> > > index fcdd85838340d..7008f9b152f7f 100644
> > > --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> > > +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
> > > @@ -207,6 +207,21 @@ STATISTIC(NumInvokes,
> > >  STATISTIC(NumInvokesMerged, "Number of invokes that were merged together");
> > >  STATISTIC(NumInvokeSetsFormed, "Number of invoke sets that were formed");
> > >
> > > +namespace llvm {
> > > +
> > > +void SimplifyCFGCostTracker::updateNumBonusInsts(BasicBlock *BB,
> > > +                                                 unsigned InstCount) {
> > > +  auto Loc = NumBonusInsts.find(BB);
> > > +  if (Loc == NumBonusInsts.end())
> > > +    Loc = NumBonusInsts.insert({BB, 0}).first;
> > > +  Loc->second = Loc->second + InstCount;
> > > +}
> > > +unsigned SimplifyCFGCostTracker::getNumBonusInsts(BasicBlock *BB) {
> > > +  return NumBonusInsts.lookup(BB);
> > > +}
> > > +
> > > +} // namespace llvm
> > > +
> > >  namespace {
> > >
> > >  // The first field contains the value that the switch produces when a certain
> > > @@ -243,6 +258,10 @@ class SimplifyCFGOpt {
> > >    ArrayRef<WeakVH> LoopHeaders;
> > >    const SimplifyCFGOptions &Options;
> > >    bool Resimplify;
> > > +  // Accumulates number of bonus instructions due to merging basic blocks
> > > +  // of common destination.
> > > +  SimplifyCFGCostTracker *CostTracker;
> > > +  std::unique_ptr<SimplifyCFGCostTracker> LocalCostTracker;
> > >
> > >    Value *isValueEqualityComparison(Instruction *TI);
> > >    BasicBlock *GetValueEqualityComparisonCases(
> > > @@ -286,8 +305,15 @@ class SimplifyCFGOpt {
> > >  public:
> > >    SimplifyCFGOpt(const TargetTransformInfo &TTI, DomTreeUpdater *DTU,
> > >                   const DataLayout &DL, ArrayRef<WeakVH> LoopHeaders,
> > > -                 const SimplifyCFGOptions &Opts)
> > > +                 const SimplifyCFGOptions &Opts,
> > > +                 SimplifyCFGCostTracker *CostTracker_)
> > >        : TTI(TTI), DTU(DTU), DL(DL), LoopHeaders(LoopHeaders), Options(Opts) {
> > > +    // Cannot do this with member initializer list since LocalCostTracker is not
> > > +    // initialized there yet.
> > > +    CostTracker = CostTracker_
> > > +                      ? CostTracker_
> > > +                      : (LocalCostTracker.reset(new SimplifyCFGCostTracker()),
> > > +                         LocalCostTracker.get());
> > >      assert((!DTU || !DTU->hasPostDomTree()) &&
> > >             "SimplifyCFG is not yet capable of maintaining validity of a "
> > >             "PostDomTree, so don't ask for it.");
> > > @@ -3624,8 +3650,9 @@ static bool isVectorOp(Instruction &I) {
> > >  /// If this basic block is simple enough, and if a predecessor branches to us
> > >  /// and one of our successors, fold the block into the predecessor and use
> > >  /// logical operations to pick the right destination.
> > > -bool llvm::FoldBranchToCommonDest(BranchInst *BI, DomTreeUpdater *DTU,
> > > -                                  MemorySSAUpdater *MSSAU,
> > > +bool llvm::FoldBranchToCommonDest(BranchInst *BI,
> > > +                                  SimplifyCFGCostTracker &CostTracker,
> > > +                                  DomTreeUpdater *DTU, MemorySSAUpdater *MSSAU,
> > >                                    const TargetTransformInfo *TTI,
> > >                                    unsigned BonusInstThreshold) {
> > >    // If this block ends with an unconditional branch,
> > > @@ -3697,7 +3724,6 @@ bool llvm::FoldBranchToCommonDest(BranchInst *BI, DomTreeUpdater *DTU,
> > >    // as "bonus instructions", and only allow this transformation when the
> > >    // number of the bonus instructions we'll need to create when cloning into
> > >    // each predecessor does not exceed a certain threshold.
> > > -  unsigned NumBonusInsts = 0;
> > >    bool SawVectorOp = false;
> > >    const unsigned PredCount = Preds.size();
> > >    for (Instruction &I : *BB) {
> > > @@ -3716,12 +3742,13 @@ bool llvm::FoldBranchToCommonDest(BranchInst *BI, DomTreeUpdater *DTU,
> > >      // predecessor. Ignore free instructions.
> > >      if (!TTI || TTI->getInstructionCost(&I, CostKind) !=
> > >                      TargetTransformInfo::TCC_Free) {
> > > -      NumBonusInsts += PredCount;
> > > -
> > > -      // Early exits once we reach the limit.
> > > -      if (NumBonusInsts >
> > > -          BonusInstThreshold * BranchFoldToCommonDestVectorMultiplier)
> > > -        return false;
> > > +      for (auto PredBB : Preds) {
> > > +        CostTracker.updateNumBonusInsts(PredBB, PredCount);
> > > +        // Early exits once we reach the limit.
> > > +        if (CostTracker.getNumBonusInsts(PredBB) >
> > > +            BonusInstThreshold * BranchFoldToCommonDestVectorMultiplier)
> > > +          return false;
> > > +      }
> > >      }
> > >
> > >      auto IsBCSSAUse = [BB, &I](Use &U) {
> > > @@ -3735,10 +3762,12 @@ bool llvm::FoldBranchToCommonDest(BranchInst *BI, DomTreeUpdater *DTU,
> > >      if (!all_of(I.uses(), IsBCSSAUse))
> > >        return false;
> > >    }
> > > -  if (NumBonusInsts >
> > > -      BonusInstThreshold *
> > > -          (SawVectorOp ? BranchFoldToCommonDestVectorMultiplier : 1))
> > > -    return false;
> > > +  for (auto PredBB : Preds) {
> > > +    if (CostTracker.getNumBonusInsts(PredBB) >
> > > +        BonusInstThreshold *
> > > +            (SawVectorOp ? BranchFoldToCommonDestVectorMultiplier : 1))
> > > +      return false;
> > > +  }
> > >
> > >    // Ok, we have the budget. Perform the transformation.
> > >    for (BasicBlock *PredBlock : Preds) {
> > > @@ -6889,7 +6918,7 @@ bool SimplifyCFGOpt::simplifyUncondBranch(BranchInst *BI,
> > >    // branches to us and our successor, fold the comparison into the
> > >    // predecessor and use logical operations to update the incoming value
> > >    // for PHI nodes in common successor.
> > > -  if (FoldBranchToCommonDest(BI, DTU, /*MSSAU=*/nullptr, &TTI,
> > > +  if (FoldBranchToCommonDest(BI, *CostTracker, DTU, /*MSSAU=*/nullptr, &TTI,
> > >                               Options.BonusInstThreshold))
> > >      return requestResimplify();
> > >    return false;
> > > @@ -6958,7 +6987,7 @@ bool SimplifyCFGOpt::simplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder) {
> > >    // If this basic block is ONLY a compare and a branch, and if a predecessor
> > >    // branches to us and one of our successors, fold the comparison into the
> > >    // predecessor and use logical operations to pick the right destination.
> > > -  if (FoldBranchToCommonDest(BI, DTU, /*MSSAU=*/nullptr, &TTI,
> > > +  if (FoldBranchToCommonDest(BI, *CostTracker, DTU, /*MSSAU=*/nullptr, &TTI,
> > >                               Options.BonusInstThreshold))
> > >      return requestResimplify();
> > >
> > > @@ -7257,8 +7286,9 @@ bool SimplifyCFGOpt::run(BasicBlock *BB) {
> > >
> > >  bool llvm::simplifyCFG(BasicBlock *BB, const TargetTransformInfo &TTI,
> > >                         DomTreeUpdater *DTU, const SimplifyCFGOptions &Options,
> > > -                       ArrayRef<WeakVH> LoopHeaders) {
> > > +                       ArrayRef<WeakVH> LoopHeaders,
> > > +                       SimplifyCFGCostTracker *CostTracker) {
> > >    return SimplifyCFGOpt(TTI, DTU, BB->getModule()->getDataLayout(), LoopHeaders,
> > > -                        Options)
> > > +                        Options, CostTracker)
> > >        .run(BB);
> > >  }
> > >
> > > diff  --git a/llvm/test/Transforms/LoopUnroll/peel-loop-inner.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-inner.ll
> > > index fa39b77aae36a..40493b48dfe05 100644
> > > --- a/llvm/test/Transforms/LoopUnroll/peel-loop-inner.ll
> > > +++ b/llvm/test/Transforms/LoopUnroll/peel-loop-inner.ll
> > > @@ -1,5 +1,5 @@
> > >  ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
> > > -; RUN: opt < %s -S -passes='require<opt-remark-emit>,loop-unroll<peeling;no-runtime>,simplifycfg,instcombine' -unroll-force-peel-count=3 -verify-dom-info | FileCheck %s
> > > +; RUN: opt < %s -S -passes='require<opt-remark-emit>,loop-unroll<peeling;no-runtime>,simplifycfg<bonus-inst-threshold=3>,instcombine' -unroll-force-peel-count=3 -verify-dom-info | FileCheck %s
> > >
> > >  define void @basic(i32 %K, i32 %N) {
> > >  ; CHECK-LABEL: @basic(
> > >
> > > diff  --git a/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll b/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
> > > index 05ef21d5a1123..c126dbcd6ca96 100644
> > > --- a/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
> > > +++ b/llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
> > > @@ -1,5 +1,5 @@
> > >  ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
> > > -; RUN: opt -O2 -S < %s | FileCheck %s
> > > +; RUN: opt -bonus-inst-threshold=4 -O2 -S < %s | FileCheck %s
> > >
> > >  target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
> > >  target triple = "x86_64--"
> > >
> > > diff  --git a/llvm/test/Transforms/SimplifyCFG/branch-fold-multiple.ll b/llvm/test/Transforms/SimplifyCFG/branch-fold-multiple.ll
> > > index fd400632a5916..f332cf82b5573 100644
> > > --- a/llvm/test/Transforms/SimplifyCFG/branch-fold-multiple.ll
> > > +++ b/llvm/test/Transforms/SimplifyCFG/branch-fold-multiple.ll
> > > @@ -3,9 +3,12 @@
> > >
> > >  %struct.S = type { [4 x i32] }
> > >
> > > -; Check the second, third, and fourth basic blocks are folded into
> > > -; the first basic block since each has one bonus intruction, which
> > > -; does not exceed the default bouns instruction threshold of 1.
> > > +; Check the second basic block is folded into the first basic block
> > > +; since it has one bonus intruction. The third basic block is not
> > > +; folded into the first basic block since the accumulated bonus
> > > +; instructions will exceed the default threshold of 1. The fourth basic
> > > +; block is foled into the third basic block since the accumulated
> > > +; bonus instruction cost is 1.
> > >
> > >  define i1 @test1(i32 %0, i32 %1, i32 %2, i32 %3) {
> > >  ; CHECK-LABEL: @test1(
> > > @@ -15,14 +18,18 @@ define i1 @test1(i32 %0, i32 %1, i32 %2, i32 %3) {
> > >  ; CHECK-NEXT:    [[MUL1:%.*]] = mul i32 [[TMP1:%.*]], [[TMP1]]
> > >  ; CHECK-NEXT:    [[CMP2_1:%.*]] = icmp sgt i32 [[MUL1]], 0
> > >  ; CHECK-NEXT:    [[OR_COND:%.*]] = select i1 [[CMP2]], i1 true, i1 [[CMP2_1]]
> > > +; CHECK-NEXT:    br i1 [[OR_COND]], label [[CLEANUP:%.*]], label [[FOR_COND_1:%.*]]
> > > +; CHECK:       for.cond.1:
> > >  ; CHECK-NEXT:    [[MUL2:%.*]] = mul i32 [[TMP2:%.*]], [[TMP2]]
> > >  ; CHECK-NEXT:    [[CMP2_2:%.*]] = icmp sgt i32 [[MUL2]], 0
> > > -; CHECK-NEXT:    [[OR_COND1:%.*]] = select i1 [[OR_COND]], i1 true, i1 [[CMP2_2]]
> > >  ; CHECK-NEXT:    [[MUL3:%.*]] = mul i32 [[TMP3:%.*]], [[TMP3]]
> > >  ; CHECK-NEXT:    [[CMP2_3:%.*]] = icmp sgt i32 [[MUL3]], 0
> > > -; CHECK-NEXT:    [[OR_COND2:%.*]] = select i1 [[OR_COND1]], i1 true, i1 [[CMP2_3]]
> > > -; CHECK-NEXT:    [[SPEC_SELECT:%.*]] = select i1 [[OR_COND2]], i1 false, i1 true
> > > -; CHECK-NEXT:    ret i1 [[SPEC_SELECT]]
> > > +; CHECK-NEXT:    [[OR_COND1:%.*]] = select i1 [[CMP2_2]], i1 true, i1 [[CMP2_3]]
> > > +; CHECK-NEXT:    [[SPEC_SELECT:%.*]] = select i1 [[OR_COND1]], i1 false, i1 true
> > > +; CHECK-NEXT:    br label [[CLEANUP]]
> > > +; CHECK:       cleanup:
> > > +; CHECK-NEXT:    [[CMP:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[SPEC_SELECT]], [[FOR_COND_1]] ]
> > > +; CHECK-NEXT:    ret i1 [[CMP]]
> > >  ;
> > >  entry:
> > >    %mul0 = mul i32 %0, %0
> > >
> > > diff  --git a/llvm/test/Transforms/SimplifyCFG/branch-fold-threshold.ll b/llvm/test/Transforms/SimplifyCFG/branch-fold-threshold.ll
> > > index c1fd267aef93f..0482fa57227f6 100644
> > > --- a/llvm/test/Transforms/SimplifyCFG/branch-fold-threshold.ll
> > > +++ b/llvm/test/Transforms/SimplifyCFG/branch-fold-threshold.ll
> > > @@ -1,9 +1,9 @@
> > >  ; RUN: opt %s -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S | FileCheck %s --check-prefix=NORMAL
> > > -; RUN: opt %s -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S -bonus-inst-threshold=2 | FileCheck %s --check-prefix=AGGRESSIVE
> > > -; RUN: opt %s -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S -bonus-inst-threshold=4 | FileCheck %s --check-prefix=WAYAGGRESSIVE
> > > +; RUN: opt %s -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S -bonus-inst-threshold=3 | FileCheck %s --check-prefix=AGGRESSIVE
> > > +; RUN: opt %s -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S -bonus-inst-threshold=6 | FileCheck %s --check-prefix=WAYAGGRESSIVE
> > >  ; RUN: opt %s -passes=simplifycfg -S | FileCheck %s --check-prefix=NORMAL
> > > -; RUN: opt %s -passes='simplifycfg<bonus-inst-threshold=2>' -S | FileCheck %s --check-prefix=AGGRESSIVE
> > > -; RUN: opt %s -passes='simplifycfg<bonus-inst-threshold=4>' -S | FileCheck %s --check-prefix=WAYAGGRESSIVE
> > > +; RUN: opt %s -passes='simplifycfg<bonus-inst-threshold=3>' -S | FileCheck %s --check-prefix=AGGRESSIVE
> > > +; RUN: opt %s -passes='simplifycfg<bonus-inst-threshold=6>' -S | FileCheck %s --check-prefix=WAYAGGRESSIVE
> > >
> > >  define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d, i32* %input) {
> > >  ; NORMAL-LABEL: @foo(
> > >
> > > diff  --git a/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-two-preds-cost.ll b/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-two-preds-cost.ll
> > > index 71b8ef5c7612c..6b8ebd9054dc6 100644
> > > --- a/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-two-preds-cost.ll
> > > +++ b/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-two-preds-cost.ll
> > > @@ -1,6 +1,6 @@
> > >  ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
> > >  ; RUN: opt < %s -S -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=1 | FileCheck --check-prefixes=ALL,THR1 %s
> > > -; RUN: opt < %s -S -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=2 | FileCheck --check-prefixes=ALL,THR2 %s
> > > +; RUN: opt < %s -S -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=3 | FileCheck --check-prefixes=ALL,THR2 %s
> > >
> > >  declare void @sideeffect0()
> > >  declare void @sideeffect1()
> > > @@ -10,7 +10,7 @@ declare i1 @gen1()
> > >
> > >  ; Here we'd want to duplicate %v3_adj into two predecessors,
> > >  ; but -bonus-inst-threshold=1 says that we can only clone it into one.
> > > -; With -bonus-inst-threshold=2 we can clone it into both though.
> > > +; With -bonus-inst-threshold=3 we can clone it into both though.
> > >  define void @two_preds_with_extra_op(i8 %v0, i8 %v1, i8 %v2, i8 %v3) {
> > >  ; THR1-LABEL: @two_preds_with_extra_op(
> > >  ; THR1-NEXT:  entry:
> > >
> > >
> > >
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at lists.llvm.org
> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

