<div dir="ltr">Philip, Carrot, I think this discussion should be moved to the original patch review thread which had some context that is missing here...<div><br></div><div>Over there, Eli already suggested that this would be better done late, and pointed out that we actually *already* have a pass that tries to do exactly this late. Also, I've just written up concrete benchmarks there which this patch very significantly regresses (>2x regression for me).</div><div><br></div><div>Posting here mostly to try to corral the discussion in a single place and let others know that we root-caused real performance issues to this patch as well.</div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Dec 22, 2017 at 2:34 PM Carrot Wei via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">It will be a too big hammer to write a new pass for this task, unless<br>

we can see significant benefits.<br>

Current method is a natural extension to existing if-conversion heuristic.<br>

<br>

On Fri, Dec 22, 2017 at 1:22 PM, Philip Reames<br>

<<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>> wrote:<br>

> It's entirely possible to write a target independent transform or lowing<br>

> over MI.  Did you look at that option?<br>

><br>

><br>

><br>

> On 12/22/2017 12:11 PM, Carrot Wei wrote:<br>

>><br>

>> Interesting transform!<br>

>> But it still generates a basic block contains a long dependence chain,<br>

>> so it has similar performance as current cmov method.<br>

>><br>

>> For the reverse transform of select in later pass, efriedma has<br>

>> similar comment. This optimization can impact multiple platforms, at<br>

>> least we have observed it on both x86 and ppc. So it should be handled<br>

>> in a target independent pass.<br>

>><br>

>> On Fri, Dec 22, 2017 at 11:38 AM, Philip Reames<br>

>> <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>> wrote:<br>

>>><br>

>>> Reading through your test cases, I noticed a case where I think this<br>

>>> change<br>

>>> leads to an unprofitable result.  I'm not objecting to the change - it<br>

>>> overall seems reasonable - but maybe there's a way to improve here?<br>

>>><br>

>>> Specifically, this test:<br>

>>><br>

>>><br>

>>> +define i64 @test2(i64** %pp, i64* %p) {<br>

>>> +entry:<br>

>>> +  %0 = load i64*, i64** %pp, align 8<br>

>>> +  %1 = load i64, i64* %0, align 8<br>

>>> +  %cmp = icmp slt i64 %1, 0<br>

>>> +  %pint = ptrtoint i64* %p to i64<br>

>>> +  br i1 %cmp, label %cond.true, label %cond.false<br>

>>> +<br>

>>> +cond.true:<br>

>>> +  %p1 = add i64 %pint, 8<br>

>>> +  br label %cond.end<br>

>>> +<br>

>>> +cond.false:<br>

>>> +  %p2 = add i64 %pint, 16<br>

>>> +  br label %cond.end<br>

>>> +<br>

>>> +cond.end:<br>

>>> +  %p3 = phi i64 [%p1, %cond.true], [%p2, %cond.false]<br>

>>> +  %ptr = inttoptr i64 %p3 to i64*<br>

>>> +  %val = load i64, i64* %ptr, align 8<br>

>>> +  ret i64 %val<br>

>>> +<br>

>>> +; CHECK-LABEL: @test2<br>

>>> +; CHECK-NOT: select<br>

>>> +}<br>

>>><br>

>>> Using the select form, this can be rewritten as:<br>

>>><br>

>>> +define i64 @test2(i64** %pp, i64* %p) {<br>

>>> +entry:<br>

>>> +  %0 = load i64*, i64** %pp, align 8<br>

>>> +  %1 = load i64, i64* %0, align 8<br>

>>> +  %cmp = icmp slt i64 %1, 0<br>

>>> +  %pint = ptrtoint i64* %p to i64<br>

>>>     %cmp.ext = zext i1 %cmp to i64<br>

>>>     %shift = shl i64 8, i64 %cmp.ext<br>

>>>     %p3 = add i64 %ptr, %shift<br>

>>> +  %ptr = inttoptr i64 %p3 to i64*<br>

>>> +  %val = load i64, i64* %ptr, align 8<br>

>>> +  ret i64 %val<br>

>>> }<br>

>>><br>

>>> And then this whole sequence becomes an addressing mode on x86:<br>

>>><br>

>>>     %shift = shl i64 8, i64 %cmp.ext<br>

>>>     %p3 = add i64 %ptr, %shift<br>

>>> +  %ptr = inttoptr i64 %p3 to i64*<br>

>>> +  %val = load i64, i64* %ptr, align 8<br>

>>><br>

>>> Out of curiosity, did you consider trying to improve the lowering of<br>

>>> select<br>

>>> instead?  It seems like the cost model you use here would let you make<br>

>>> pretty reasonable choices to convert the select back to a branch if<br>

>>> needed.<br>

>>><br>

>>> Philip<br>

>>><br>

>>><br>

>>> On 12/22/2017 10:54 AM, Guozhi Wei via llvm-commits wrote:<br>

>>>><br>

>>>> Author: carrot<br>

>>>> Date: Fri Dec 22 10:54:04 2017<br>

>>>> New Revision: 321377<br>

>>>><br>

>>>> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=321377&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=321377&view=rev</a><br>

>>>> Log:<br>

>>>> [SimplifyCFG] Don't do if-conversion if there is a long dependence chain<br>

>>>><br>

>>>> If after if-conversion, most of the instructions in this new BB<br>

>>>> construct<br>

>>>> a long and slow dependence chain, it may be slower than cmp/branch, even<br>

>>>> if<br>

>>>> the branch has a high miss rate, because the control dependence is<br>

>>>> transformed into data dependence, and control dependence can be<br>

>>>> speculated,<br>

>>>> and thus, the second part can execute in parallel with the first part on<br>

>>>> modern OOO processor.<br>

>>>><br>

>>>> This patch checks for the long dependence chain, and give up<br>

>>>> if-conversion<br>

>>>> if find one.<br>

>>>><br>

>>>> Differential Revision: <a href="https://reviews.llvm.org/D39352" rel="noreferrer" target="_blank">https://reviews.llvm.org/D39352</a><br>

>>>><br>

>>>><br>

>>>> Added:<br>

>>>>       llvm/trunk/test/Transforms/SimplifyCFG/X86/if-conversion.ll<br>

>>>> Modified:<br>

>>>>       llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h<br>

>>>>       llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h<br>

>>>>       llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h<br>

>>>>       llvm/trunk/lib/Analysis/TargetTransformInfo.cpp<br>

>>>>       llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp<br>

>>>><br>

>>>> Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h<br>

>>>> URL:<br>

>>>><br>

>>>> <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h?rev=321377&r1=321376&r2=321377&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h?rev=321377&r1=321376&r2=321377&view=diff</a><br>

>>>><br>

>>>><br>

>>>> ==============================================================================<br>

>>>> --- llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h (original)<br>

>>>> +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h Fri Dec 22<br>

>>>> 10:54:04 2017<br>

>>>> @@ -646,6 +646,9 @@ public:<br>

>>>>      /// \brief Additional properties of an operand's values.<br>

>>>>      enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };<br>

>>>>    +  /// \return True if target can execute instructions out of order.<br>

>>>> +  bool isOutOfOrder() const;<br>

>>>> +<br>

>>>>      /// \return The number of scalar or vector registers that the<br>

>>>> target<br>

>>>> has.<br>

>>>>      /// If 'Vectors' is true, it returns the number of vector<br>

>>>> registers.<br>

>>>> If it is<br>

>>>>      /// set to false, it returns the number of scalar registers.<br>

>>>> @@ -1018,6 +1021,7 @@ public:<br>

>>>>                                Type *Ty) = 0;<br>

>>>>      virtual int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const<br>

>>>> APInt<br>

>>>> &Imm,<br>

>>>>                                Type *Ty) = 0;<br>

>>>> +  virtual bool isOutOfOrder() const = 0;<br>

>>>>      virtual unsigned getNumberOfRegisters(bool Vector) = 0;<br>

>>>>      virtual unsigned getRegisterBitWidth(bool Vector) const = 0;<br>

>>>>      virtual unsigned getMinVectorRegisterBitWidth() = 0;<br>

>>>> @@ -1295,6 +1299,9 @@ public:<br>

>>>>                        Type *Ty) override {<br>

>>>>        return Impl.getIntImmCost(IID, Idx, Imm, Ty);<br>

>>>>      }<br>

>>>> +  bool isOutOfOrder() const override {<br>

>>>> +    return Impl.isOutOfOrder();<br>

>>>> +  }<br>

>>>>      unsigned getNumberOfRegisters(bool Vector) override {<br>

>>>>        return Impl.getNumberOfRegisters(Vector);<br>

>>>>      }<br>

>>>><br>

>>>> Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h<br>

>>>> URL:<br>

>>>><br>

>>>> <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h?rev=321377&r1=321376&r2=321377&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h?rev=321377&r1=321376&r2=321377&view=diff</a><br>

>>>><br>

>>>><br>

>>>> ==============================================================================<br>

>>>> --- llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h<br>

>>>> (original)<br>

>>>> +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h Fri Dec<br>

>>>> 22<br>

>>>> 10:54:04 2017<br>

>>>> @@ -337,6 +337,8 @@ public:<br>

>>>>        return TTI::TCC_Free;<br>

>>>>      }<br>

>>>>    +  bool isOutOfOrder() const { return false; }<br>

>>>> +<br>

>>>>      unsigned getNumberOfRegisters(bool Vector) { return 8; }<br>

>>>>        unsigned getRegisterBitWidth(bool Vector) const { return 32; }<br>

>>>><br>

>>>> Modified: llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h<br>

>>>> URL:<br>

>>>><br>

>>>> <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h?rev=321377&r1=321376&r2=321377&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h?rev=321377&r1=321376&r2=321377&view=diff</a><br>

>>>><br>

>>>><br>

>>>> ==============================================================================<br>

>>>> --- llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h (original)<br>

>>>> +++ llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h Fri Dec 22 10:54:04<br>

>>>> 2017<br>

>>>> @@ -402,6 +402,10 @@ public:<br>

>>>>        return BaseT::getInstructionLatency(I);<br>

>>>>      }<br>

>>>>    +  bool isOutOfOrder() const {<br>

>>>> +    return getST()->getSchedModel().isOutOfOrder();<br>

>>>> +  }<br>

>>>> +<br>

>>>>      /// @}<br>

>>>>        /// \name Vector TTI Implementations<br>

>>>><br>

>>>> Modified: llvm/trunk/lib/Analysis/TargetTransformInfo.cpp<br>

>>>> URL:<br>

>>>><br>

>>>> <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/TargetTransformInfo.cpp?rev=321377&r1=321376&r2=321377&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/TargetTransformInfo.cpp?rev=321377&r1=321376&r2=321377&view=diff</a><br>

>>>><br>

>>>><br>

>>>> ==============================================================================<br>

>>>> --- llvm/trunk/lib/Analysis/TargetTransformInfo.cpp (original)<br>

>>>> +++ llvm/trunk/lib/Analysis/TargetTransformInfo.cpp Fri Dec 22 10:54:04<br>

>>>> 2017<br>

>>>> @@ -314,6 +314,10 @@ int TargetTransformInfo::getIntImmCost(I<br>

>>>>      return Cost;<br>

>>>>    }<br>

>>>>    +bool TargetTransformInfo::isOutOfOrder() const {<br>

>>>> +  return TTIImpl->isOutOfOrder();<br>

>>>> +}<br>

>>>> +<br>

>>>>    unsigned TargetTransformInfo::getNumberOfRegisters(bool Vector) const<br>

>>>> {<br>

>>>>      return TTIImpl->getNumberOfRegisters(Vector);<br>

>>>>    }<br>

>>>><br>

>>>> Modified: llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp<br>

>>>> URL:<br>

>>>><br>

>>>> <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=321377&r1=321376&r2=321377&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=321377&r1=321376&r2=321377&view=diff</a><br>

>>>><br>

>>>><br>

>>>> ==============================================================================<br>

>>>> --- llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp (original)<br>

>>>> +++ llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp Fri Dec 22 10:54:04<br>

>>>> 2017<br>

>>>> @@ -127,6 +127,16 @@ static cl::opt<unsigned> MaxSpeculationD<br>

>>>>        cl::desc("Limit maximum recursion depth when calculating costs of<br>

>>>> "<br>

>>>>                 "speculatively executed instructions"));<br>

>>>>    +static cl::opt<unsigned> DependenceChainLatency(<br>

>>>> +    "dependence-chain-latency", cl::Hidden, cl::init(8),<br>

>>>> +    cl::desc("Limit the maximum latency of dependence chain containing<br>

>>>> cmp "<br>

>>>> +             "for if conversion"));<br>

>>>> +<br>

>>>> +static cl::opt<unsigned> SmallBBSize(<br>

>>>> +    "small-bb-size", cl::Hidden, cl::init(40),<br>

>>>> +    cl::desc("Check dependence chain latency only in basic block<br>

>>>> smaller<br>

>>>> than "<br>

>>>> +             "this number"));<br>

>>>> +<br>

>>>>    STATISTIC(NumBitMaps, "Number of switch instructions turned into<br>

>>>> bitmaps");<br>

>>>>    STATISTIC(NumLinearMaps,<br>

>>>>              "Number of switch instructions turned into linear<br>

>>>> mapping");<br>

>>>> @@ -395,6 +405,166 @@ static bool DominatesMergePoint(Value *V<br>

>>>>      return true;<br>

>>>>    }<br>

>>>>    +/// Estimate the code size of the specified BB.<br>

>>>> +static unsigned CountBBCodeSize(BasicBlock *BB,<br>

>>>> +                                const TargetTransformInfo &TTI) {<br>

>>>> +  unsigned Size = 0;<br>

>>>> +  for (auto II = BB->begin(); !isa<TerminatorInst>(II); ++II)<br>

>>>> +    Size += TTI.getInstructionCost(&(*II),<br>

>>>> TargetTransformInfo::TCK_CodeSize);<br>

>>>> +  return Size;<br>

>>>> +}<br>

>>>> +<br>

>>>> +/// Find out the latency of the longest dependence chain in the BB if<br>

>>>> +/// LongestChain is true, or the dependence chain containing the<br>

>>>> compare<br>

>>>> +/// instruction feeding the block's conditional branch.<br>

>>>> +static unsigned FindDependenceChainLatency(BasicBlock *BB,<br>

>>>> +                            DenseMap<Instruction *, unsigned><br>

>>>> &Instructions,<br>

>>>> +                            const TargetTransformInfo &TTI,<br>

>>>> +                            bool LongestChain) {<br>

>>>> +  unsigned MaxLatency = 0;<br>

>>>> +<br>

>>>> +  BasicBlock::iterator II;<br>

>>>> +  for (II = BB->begin(); !isa<TerminatorInst>(II); ++II) {<br>

>>>> +    unsigned Latency = 0;<br>

>>>> +    for (unsigned O = 0, E = II->getNumOperands(); O != E; ++O) {<br>

>>>> +      Instruction *Op = dyn_cast<Instruction>(II->getOperand(O));<br>

>>>> +      if (Op && Instructions.count(Op)) {<br>

>>>> +        auto OpLatency = Instructions[Op];<br>

>>>> +        if (OpLatency > Latency)<br>

>>>> +          Latency = OpLatency;<br>

>>>> +      }<br>

>>>> +    }<br>

>>>> +    Latency += TTI.getInstructionCost(&(*II),<br>

>>>> TargetTransformInfo::TCK_Latency);<br>

>>>> +    Instructions[&(*II)] = Latency;<br>

>>>> +<br>

>>>> +    if (Latency > MaxLatency)<br>

>>>> +      MaxLatency = Latency;<br>

>>>> +  }<br>

>>>> +<br>

>>>> +  if (LongestChain)<br>

>>>> +    return MaxLatency;<br>

>>>> +<br>

>>>> +  // The length of the dependence chain containing the compare<br>

>>>> instruction is<br>

>>>> +  // wanted, so the terminator must be a BranchInst.<br>

>>>> +  assert(isa<BranchInst>(II));<br>

>>>> +  BranchInst* Br = cast<BranchInst>(II);<br>

>>>> +  Instruction *Cmp = dyn_cast<Instruction>(Br->getCondition());<br>

>>>> +  if (Cmp && Instructions.count(Cmp))<br>

>>>> +    return Instructions[Cmp];<br>

>>>> +  else<br>

>>>> +    return 0;<br>

>>>> +}<br>

>>>> +<br>

>>>> +/// Instructions in BB2 may depend on instructions in BB1, and<br>

>>>> instructions<br>

>>>> +/// in BB1 may have users in BB2. If the last (in terms of latency)<br>

>>>> such<br>

>>>> kind<br>

>>>> +/// of instruction in BB1 is I, then the instructions after I can be<br>

>>>> executed<br>

>>>> +/// in parallel with instructions in BB2.<br>

>>>> +/// This function returns the latency of I.<br>

>>>> +static unsigned LatencyAdjustment(BasicBlock *BB1, BasicBlock *BB2,<br>

>>>> +                        BasicBlock *IfBlock1, BasicBlock *IfBlock2,<br>

>>>> +                        DenseMap<Instruction *, unsigned><br>

>>>> &BB1Instructions) {<br>

>>>> +  unsigned LastLatency = 0;<br>

>>>> +  SmallVector<Instruction *, 16> Worklist;<br>

>>>> +  BasicBlock::iterator II;<br>

>>>> +  for (II = BB2->begin(); !isa<TerminatorInst>(II); ++II) {<br>

>>>> +    if (PHINode *PN = dyn_cast<PHINode>(II)) {<br>

>>>> +      // Look for users in BB2.<br>

>>>> +      bool InBBUser = false;<br>

>>>> +      for (User *U : PN->users()) {<br>

>>>> +        if (cast<Instruction>(U)->getParent() == BB2) {<br>

>>>> +          InBBUser = true;<br>

>>>> +          break;<br>

>>>> +        }<br>

>>>> +      }<br>

>>>> +      // No such user, we don't care about this instruction and its<br>

>>>> operands.<br>

>>>> +      if (!InBBUser)<br>

>>>> +        break;<br>

>>>> +    }<br>

>>>> +    Worklist.push_back(&(*II));<br>

>>>> +  }<br>

>>>> +<br>

>>>> +  while (!Worklist.empty()) {<br>

>>>> +    Instruction *I = Worklist.pop_back_val();<br>

>>>> +    for (unsigned O = 0, E = I->getNumOperands(); O != E; ++O) {<br>

>>>> +      if (Instruction *Op = dyn_cast<Instruction>(I->getOperand(O))) {<br>

>>>> +        if (Op->getParent() == IfBlock1 || Op->getParent() == IfBlock2)<br>

>>>> +          Worklist.push_back(Op);<br>

>>>> +        else if (Op->getParent() == BB1 && BB1Instructions.count(Op)) {<br>

>>>> +          if (BB1Instructions[Op] > LastLatency)<br>

>>>> +            LastLatency = BB1Instructions[Op];<br>

>>>> +        }<br>

>>>> +      }<br>

>>>> +    }<br>

>>>> +  }<br>

>>>> +<br>

>>>> +  return LastLatency;<br>

>>>> +}<br>

>>>> +<br>

>>>> +/// If after if conversion, most of the instructions in this new BB<br>

>>>> construct a<br>

>>>> +/// long and slow dependence chain, it may be slower than cmp/branch,<br>

>>>> even<br>

>>>> +/// if the branch has a high miss rate, because the control dependence<br>

>>>> is<br>

>>>> +/// transformed into data dependence, and control dependence can be<br>

>>>> speculated,<br>

>>>> +/// and thus, the second part can execute in parallel with the first<br>

>>>> part<br>

>>>> on<br>

>>>> +/// modern OOO processor.<br>

>>>> +///<br>

>>>> +/// To check this condition, this function finds the length of the<br>

>>>> dependence<br>

>>>> +/// chain in BB1 (only the part that can be executed in parallel with<br>

>>>> code after<br>

>>>> +/// branch in BB2) containing cmp, and if the length is longer than a<br>

>>>> threshold,<br>

>>>> +/// don't perform if conversion.<br>

>>>> +///<br>

>>>> +/// BB1, BB2, IfBlock1 and IfBlock2 are candidate BBs for if<br>

>>>> conversion.<br>

>>>> +/// SpeculationSize contains the code size of IfBlock1 and IfBlock2.<br>

>>>> +static bool FindLongDependenceChain(BasicBlock *BB1, BasicBlock *BB2,<br>

>>>> +                             BasicBlock *IfBlock1, BasicBlock<br>

>>>> *IfBlock2,<br>

>>>> +                             unsigned SpeculationSize,<br>

>>>> +                             const TargetTransformInfo &TTI) {<br>

>>>> +  // Accumulated latency of each instruction in their BBs.<br>

>>>> +  DenseMap<Instruction *, unsigned> BB1Instructions;<br>

>>>> +  DenseMap<Instruction *, unsigned> BB2Instructions;<br>

>>>> +<br>

>>>> +  if (!TTI.isOutOfOrder())<br>

>>>> +    return false;<br>

>>>> +<br>

>>>> +  unsigned NewBBSize = CountBBCodeSize(BB1, TTI) + CountBBCodeSize(BB2,<br>

>>>> TTI)<br>

>>>> +                         + SpeculationSize;<br>

>>>> +<br>

>>>> +  // We check small BB only since it is more difficult to find<br>

>>>> unrelated<br>

>>>> +  // instructions to fill functional units in a small BB.<br>

>>>> +  if (NewBBSize > SmallBBSize)<br>

>>>> +    return false;<br>

>>>> +<br>

>>>> +  auto BB1Chain =<br>

>>>> +         FindDependenceChainLatency(BB1, BB1Instructions, TTI, false);<br>

>>>> +  auto BB2Chain =<br>

>>>> +         FindDependenceChainLatency(BB2, BB2Instructions, TTI, true);<br>

>>>> +<br>

>>>> +  // If there are many unrelated instructions in the new BB, there will<br>

>>>> be<br>

>>>> +  // other instructions for the processor to issue regardless of the<br>

>>>> length<br>

>>>> +  // of this new dependence chain.<br>

>>>> +  // Modern processors can issue 3 or more instructions in each cycle.<br>

>>>> But in<br>

>>>> +  // real world applications, an IPC of 2 is already very good for<br>

>>>> non-loop<br>

>>>> +  // code with small basic blocks. Higher IPC is usually found in<br>

>>>> programs with<br>

>>>> +  // small kernel. So IPC of 2 is more reasonable for most<br>

>>>> applications.<br>

>>>> +  if ((BB1Chain + BB2Chain) * 2 <= NewBBSize)<br>

>>>> +    return false;<br>

>>>> +<br>

>>>> +  // We only care about part of the dependence chain in BB1 that can be<br>

>>>> +  // executed in parallel with BB2, so adjust the latency.<br>

>>>> +  BB1Chain -=<br>

>>>> +      LatencyAdjustment(BB1, BB2, IfBlock1, IfBlock2, BB1Instructions);<br>

>>>> +<br>

>>>> +  // Correctly predicted branch instruction can skip the dependence<br>

>>>> chain<br>

>>>> in<br>

>>>> +  // BB1, but misprediction has a penalty, so only when the dependence<br>

>>>> chain is<br>

>>>> +  // longer than DependenceChainLatency, then branch is better than<br>

>>>> select.<br>

>>>> +  // Besides misprediction penalty, the threshold value<br>

>>>> DependenceChainLatency<br>

>>>> +  // also depends on branch misprediction rate, taken branch latency<br>

>>>> and<br>

>>>> cmov<br>

>>>> +  // latency.<br>

>>>> +  if (BB1Chain >= DependenceChainLatency)<br>

>>>> +    return true;<br>

>>>> +<br>

>>>> +  return false;<br>

>>>> +}<br>

>>>> +<br>

>>>>    /// Extract ConstantInt from value, looking through IntToPtr<br>

>>>>    /// and PointerNullValue. Return NULL if value is not a constant int.<br>

>>>>    static ConstantInt *GetConstantInt(Value *V, const DataLayout &DL) {<br>

>>>> @@ -2044,6 +2214,11 @@ static bool SpeculativelyExecuteBB(Branc<br>

>>>>      if (!HaveRewritablePHIs && !(HoistCondStores &&<br>

>>>> SpeculatedStoreValue))<br>

>>>>        return false;<br>

>>>>    +  // Don't do if conversion for long dependence chain.<br>

>>>> +  if (FindLongDependenceChain(BB, EndBB, ThenBB, nullptr,<br>

>>>> +                              CountBBCodeSize(ThenBB, TTI), TTI))<br>

>>>> +    return false;<br>

>>>> +<br>

>>>>      // If we get here, we can hoist the instruction and if-convert.<br>

>>>>      DEBUG(dbgs() << "SPECULATIVELY EXECUTING BB" << *ThenBB << "\n";);<br>

>>>>    @@ -2351,6 +2526,10 @@ static bool FoldTwoEntryPHINode(PHINode<br>

>>>>          }<br>

>>>>      }<br>

>>>>    +  if (FindLongDependenceChain(DomBlock, BB, IfBlock1, IfBlock2,<br>

>>>> +                              AggressiveInsts.size(), TTI))<br>

>>>> +    return false;<br>

>>>> +<br>

>>>>      DEBUG(dbgs() << "FOUND IF CONDITION!  " << *IfCond << "  T: "<br>

>>>>                   << IfTrue->getName() << "  F: " << IfFalse->getName()<br>

>>>> <<<br>

>>>> "\n");<br>

>>>><br>

>>>> Added: llvm/trunk/test/Transforms/SimplifyCFG/X86/if-conversion.ll<br>

>>>> URL:<br>

>>>><br>

>>>> <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/X86/if-conversion.ll?rev=321377&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/X86/if-conversion.ll?rev=321377&view=auto</a><br>

>>>><br>

>>>><br>

>>>> ==============================================================================<br>

>>>> --- llvm/trunk/test/Transforms/SimplifyCFG/X86/if-conversion.ll (added)<br>

>>>> +++ llvm/trunk/test/Transforms/SimplifyCFG/X86/if-conversion.ll Fri Dec<br>

>>>> 22<br>

>>>> 10:54:04 2017<br>

>>>> @@ -0,0 +1,231 @@<br>

>>>> +; RUN: opt < %s -simplifycfg -mtriple=x86_64-unknown-linux-gnu<br>

>>>> -mcpu=corei7 -S | FileCheck %s<br>

>>>> +; Avoid if-conversion if there is a long dependence chain.<br>

>>>> +<br>

>>>> +target datalayout =<br>

>>>><br>

>>>> "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"<br>

>>>> +<br>

>>>> +; The first several cases test FindLongDependenceChain returns true, so<br>

>>>> +; if-conversion is blocked.<br>

>>>> +<br>

>>>> +define i64 @test1(i64** %pp, i64* %p) {<br>

>>>> +entry:<br>

>>>> +  %0 = load i64*, i64** %pp, align 8<br>

>>>> +  %1 = load i64, i64* %0, align 8<br>

>>>> +  %cmp = icmp slt i64 %1, 0<br>

>>>> +  %pint = ptrtoint i64* %p to i64<br>

>>>> +  br i1 %cmp, label %cond.true, label %cond.false<br>

>>>> +<br>

>>>> +cond.true:<br>

>>>> +  %p1 = add i64 %pint, 8<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.false:<br>

>>>> +  %p2 = or i64 %pint, 16<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.end:<br>

>>>> +  %p3 = phi i64 [%p1, %cond.true], [%p2, %cond.false]<br>

>>>> +  %ptr = inttoptr i64 %p3 to i64*<br>

>>>> +  %val = load i64, i64* %ptr, align 8<br>

>>>> +  ret i64 %val<br>

>>>> +<br>

>>>> +; CHECK-NOT: select<br>

>>>> +}<br>

>>>> +<br>

>>>> +define i64 @test2(i64** %pp, i64* %p) {<br>

>>>> +entry:<br>

>>>> +  %0 = load i64*, i64** %pp, align 8<br>

>>>> +  %1 = load i64, i64* %0, align 8<br>

>>>> +  %cmp = icmp slt i64 %1, 0<br>

>>>> +  %pint = ptrtoint i64* %p to i64<br>

>>>> +  br i1 %cmp, label %cond.true, label %cond.false<br>

>>>> +<br>

>>>> +cond.true:<br>

>>>> +  %p1 = add i64 %pint, 8<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.false:<br>

>>>> +  %p2 = add i64 %pint, 16<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.end:<br>

>>>> +  %p3 = phi i64 [%p1, %cond.true], [%p2, %cond.false]<br>

>>>> +  %ptr = inttoptr i64 %p3 to i64*<br>

>>>> +  %val = load i64, i64* %ptr, align 8<br>

>>>> +  ret i64 %val<br>

>>>> +<br>

>>>> +; CHECK-LABEL: @test2<br>

>>>> +; CHECK-NOT: select<br>

>>>> +}<br>

>>>> +<br>

>>>> +; The following cases test FindLongDependenceChain returns false, so<br>

>>>> +; if-conversion will proceed.<br>

>>>> +<br>

>>>> +; Non trivial LatencyAdjustment.<br>

>>>> +define i64 @test3(i64** %pp, i64* %p) {<br>

>>>> +entry:<br>

>>>> +  %0 = load i64*, i64** %pp, align 8<br>

>>>> +  %1 = load i64, i64* %0, align 8<br>

>>>> +  %cmp = icmp slt i64 %1, 0<br>

>>>> +  %pint = ptrtoint i64* %p to i64<br>

>>>> +  br i1 %cmp, label %cond.true, label %cond.false<br>

>>>> +<br>

>>>> +cond.true:<br>

>>>> +  %p1 = add i64 %pint, 8<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.false:<br>

>>>> +  %p2 = or i64 %pint, 16<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.end:<br>

>>>> +  %p3 = phi i64 [%p1, %cond.true], [%p2, %cond.false]<br>

>>>> +  %p4 = add i64 %p3, %1<br>

>>>> +  %ptr = inttoptr i64 %p4 to i64*<br>

>>>> +  %val = load i64, i64* %ptr, align 8<br>

>>>> +  ret i64 %val<br>

>>>> +<br>

>>>> +; CHECK-LABEL: @test3<br>

>>>> +; CHECK: select<br>

>>>> +}<br>

>>>> +<br>

>>>> +; Short dependence chain.<br>

>>>> +define i64 @test4(i64* %pp, i64* %p) {<br>

>>>> +entry:<br>

>>>> +  %0 = load i64, i64* %pp, align 8<br>

>>>> +  %cmp = icmp slt i64 %0, 0<br>

>>>> +  %pint = ptrtoint i64* %p to i64<br>

>>>> +  br i1 %cmp, label %cond.true, label %cond.false<br>

>>>> +<br>

>>>> +cond.true:<br>

>>>> +  %p1 = add i64 %pint, 8<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.false:<br>

>>>> +  %p2 = or i64 %pint, 16<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.end:<br>

>>>> +  %p3 = phi i64 [%p1, %cond.true], [%p2, %cond.false]<br>

>>>> +  %ptr = inttoptr i64 %p3 to i64*<br>

>>>> +  %val = load i64, i64* %ptr, align 8<br>

>>>> +  ret i64 %val<br>

>>>> +<br>

>>>> +; CHECK-LABEL: @test4<br>

>>>> +; CHECK: select<br>

>>>> +}<br>

>>>> +<br>

>>>> +; High IPC.<br>

>>>> +define i64 @test5(i64** %pp, i64* %p) {<br>

>>>> +entry:<br>

>>>> +  %0 = load i64*, i64** %pp, align 8<br>

>>>> +  %1 = load i64, i64* %0, align 8<br>

>>>> +  %cmp = icmp slt i64 %1, 0<br>

>>>> +  %pint = ptrtoint i64* %p to i64<br>

>>>> +  %2 = add i64 %pint, 2<br>

>>>> +  %3 = add i64 %pint, 3<br>

>>>> +  %4 = or i64 %pint, 16<br>

>>>> +  %5 = and i64 %pint, 255<br>

>>>> +<br>

>>>> +  %6 = or i64 %2, 9<br>

>>>> +  %7 = and i64 %3, 255<br>

>>>> +  %8 = add i64 %4, 4<br>

>>>> +  %9 = add i64 %5, 5<br>

>>>> +<br>

>>>> +  %10 = add i64 %6, 2<br>

>>>> +  %11 = add i64 %7, 3<br>

>>>> +  %12 = add i64 %8, 4<br>

>>>> +  %13 = add i64 %9, 5<br>

>>>> +<br>

>>>> +  %14 = add i64 %10, 6<br>

>>>> +  %15 = add i64 %11, 7<br>

>>>> +  %16 = add i64 %12, 8<br>

>>>> +  %17 = add i64 %13, 9<br>

>>>> +<br>

>>>> +  %18 = add i64 %14, 10<br>

>>>> +  %19 = add i64 %15, 11<br>

>>>> +  %20 = add i64 %16, 12<br>

>>>> +  %21 = add i64 %17, 13<br>

>>>> +<br>

>>>> +  br i1 %cmp, label %cond.true, label %cond.false<br>

>>>> +<br>

>>>> +cond.true:<br>

>>>> +  %p1 = add i64 %pint, 8<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.false:<br>

>>>> +  %p2 = or i64 %pint, 16<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.end:<br>

>>>> +  %p3 = phi i64 [%p1, %cond.true], [%p2, %cond.false]<br>

>>>> +  %ptr = inttoptr i64 %p3 to i64*<br>

>>>> +  %val = load i64, i64* %ptr, align 8<br>

>>>> +<br>

>>>> +  ret i64 %val<br>

>>>> +<br>

>>>> +; CHECK-LABEL: @test5<br>

>>>> +; CHECK: select<br>

>>>> +}<br>

>>>> +<br>

>>>> +; Large BB size.<br>

>>>> +define i64 @test6(i64** %pp, i64* %p) {<br>

>>>> +entry:<br>

>>>> +  %0 = load i64*, i64** %pp, align 8<br>

>>>> +  %1 = load i64, i64* %0, align 8<br>

>>>> +  %cmp = icmp slt i64 %1, 0<br>

>>>> +  %pint = ptrtoint i64* %p to i64<br>

>>>> +  br i1 %cmp, label %cond.true, label %cond.false<br>

>>>> +<br>

>>>> +cond.true:<br>

>>>> +  %p1 = add i64 %pint, 8<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.false:<br>

>>>> +  %p2 = or i64 %pint, 16<br>

>>>> +  br label %cond.end<br>

>>>> +<br>

>>>> +cond.end:<br>

>>>> +  %p3 = phi i64 [%p1, %cond.true], [%p2, %cond.false]<br>

>>>> +  %ptr = inttoptr i64 %p3 to i64*<br>

>>>> +  %val = load i64, i64* %ptr, align 8<br>

>>>> +  %2 = add i64 %pint, 2<br>

>>>> +  %3 = add i64 %pint, 3<br>

>>>> +  %4 = add i64 %2, 4<br>

>>>> +  %5 = add i64 %3, 5<br>

>>>> +  %6 = add i64 %4, 6<br>

>>>> +  %7 = add i64 %5, 7<br>

>>>> +  %8 = add i64 %6, 6<br>

>>>> +  %9 = add i64 %7, 7<br>

>>>> +  %10 = add i64 %8, 6<br>

>>>> +  %11 = add i64 %9, 7<br>

>>>> +  %12 = add i64 %10, 6<br>

>>>> +  %13 = add i64 %11, 7<br>

>>>> +  %14 = add i64 %12, 6<br>

>>>> +  %15 = add i64 %13, 7<br>

>>>> +  %16 = add i64 %14, 6<br>

>>>> +  %17 = add i64 %15, 7<br>

>>>> +  %18 = add i64 %16, 6<br>

>>>> +  %19 = add i64 %17, 7<br>

>>>> +  %20 = add i64 %18, 6<br>

>>>> +  %21 = add i64 %19, 7<br>

>>>> +  %22 = add i64 %20, 6<br>

>>>> +  %23 = add i64 %21, 7<br>

>>>> +  %24 = add i64 %22, 6<br>

>>>> +  %25 = add i64 %23, 7<br>

>>>> +  %26 = add i64 %24, 6<br>

>>>> +  %27 = add i64 %25, 7<br>

>>>> +  %28 = add i64 %26, 6<br>

>>>> +  %29 = add i64 %27, 7<br>

>>>> +  %30 = add i64 %28, 6<br>

>>>> +  %31 = add i64 %29, 7<br>

>>>> +  %32 = add i64 %30, 8<br>

>>>> +  %33 = add i64 %31, 9<br>

>>>> +  %34 = add i64 %32, %33<br>

>>>> +  %35 = and i64 %34, 255<br>

>>>> +  %res = add i64 %val, %35<br>

>>>> +<br>

>>>> +  ret i64 %res<br>

>>>> +<br>

>>>> +; CHECK-LABEL: @test6<br>

>>>> +; CHECK: select<br>

>>>> +}<br>

>>>><br>

>>>><br>

>>>> _______________________________________________<br>

>>>> llvm-commits mailing list<br>

>>>> <a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>

>>>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>

>>><br>

>>><br>

>>><br>

><br>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>

</blockquote></div>