[llvm] r179957 - SimplifyCFG: If convert single conditional stores

Arnold Schwaighofer aschwaighofer at apple.com
Sat Apr 20 17:48:36 PDT 2013


Hi Shuxin,

We already get this transformation (eliminate the extra loads) even without my patch, GVN does this for us:
 
C-code excerpt:

      mc[k] = mpp[k-1]   + tpmm[k-1];
      if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
      if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;

Without my patch we already produce this:

  %79 = add nsw i32 %78, %76 << else value
  ...
  %85 = add nsw i32 %84, %82 << if value
  %86 = icmp sgt i32 %85, %79
  br i1 %86, label %87, label %88

; <label>:87                                      ; preds = %.lr.ph
  store i32 %85, i32* %80, align 4, !tbaa !0
  br label %88

; <label>:88                                      ; preds = %87, %.lr.ph
  %89 = phi i32 [ %85, %87 ], [ %79, %.lr.ph ]   << that is your eliminated load
  %90 = getelementptr inbounds i32* %59, i64 %74
  %91 = load i32* %90, align 4, !tbaa !0
  %92 = getelementptr inbounds i32* %32, i64 %74
  %93 = load i32* %92, align 4, !tbaa !0
  %94 = add nsw i32 %93, %91
  %95 = icmp sgt i32 %94, %89
  br i1 %95, label %96, label %97

Best,
Arnold

On Apr 20, 2013, at 7:24 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:

> I'm not going to discuss how to optimize/hack the famous loops in hmmer, I just focus on
> this transformation.
> 
> IMHO, this xform is better motivated by the subsequent loads, which is the case in hmmer.
> 


>  ---------------
>  if (cond)
>     store val to x ; // x is just a memory unit, dose not necessary has name.
> 
>    ... load from x
>  ---------------
> 
> =>
>   if (cond)
>     t = val
>   store x = t
> 
>   ... t
> 
>  ------------------
> 
>  by stripping off the condition of the store stmt, we can git rid of the load expr, which
> seems to be big save regardless the which combination of following conditions take place
>   a)
>      a1 the branch is biased (likely taken)
>      a2) the branch is biased (likely fall-thru)
>      a3) the branch is balanced
> 
>   b) the load-store unit of the underlying architecture has load-store forwarding capability.
> 
>  if b) dose not hold, it always win as we reduce a expensive load
>  for combination a1 + b, the load request dose not go to cache, however, we still suffer
>   from latency. (FIXME: load-store unit will blindly go through the pipeline).
>  ....
> 
> 
> 
> 
> 
> 
> On 4/20/13 2:42 PM, Arnold Schwaighofer wrote:
>> Author: arnolds
>> Date: Sat Apr 20 16:42:09 2013
>> New Revision: 179957
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=179957&view=rev
>> Log:
>> SimplifyCFG: If convert single conditional stores
>> 
>> This transformation will transform a conditional store with a preceeding
>> uncondtional store to the same location:
>> 
>>  a[i] =
>>  may-alias with a[i] load
>>  if (cond)
>>    a[i] = Y
>> 
>> into an unconditional store.
>> 
>>  a[i] = X
>>  may-alias with a[i] load
>>  tmp = cond ? Y : X;
>>  a[i] = tmp
>> 
>> We assume that on average the cost of a mispredicted branch is going to be
>> higher than the cost of a second store to the same location, and that the
>> secondary benefits of creating a bigger basic block for other optimizations to
>> work on outway the potential case were the branch would be correctly predicted
>> and the cost of the executing the second store would be noticably reflected in
>> performance.
>> 
>> hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
>> this change we are on par with gcc's performance (gcc also performs this
>> transformation). There was a 1.2 % performance improvement on a ARM swift chip.
>> Other tests in the test-suite+external seem to be mostly uninfluenced in my
>> experiments:
>> This optimization was triggered on 41 tests such that the executable was
>> different before/after the patch. Only 1 out of the 40 tests (dealII) was
>> reproducable below 100% (by about .4%). Given that hmmer benefits so much I
>> believe this to be a fair trade off.
>> 
>> I am going to watch performance numbers across the builtbots and will revert
>> this if anything unexpected comes up.
>> 
>> Added:
>>     llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>> Modified:
>>     llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>> 
>> Modified: llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=179957&r1=179956&r2=179957&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp (original)
>> +++ llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp Sat Apr 20 16:42:09 2013
>> @@ -59,6 +59,10 @@ static cl::opt<bool>
>>  SinkCommon("simplifycfg-sink-common", cl::Hidden, cl::init(true),
>>         cl::desc("Sink common instructions down to the end block"));
>>  +static cl::opt<bool>
>> +HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),
>> +       cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>> +
>>  STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
>>  STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>>  STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
>> @@ -1332,6 +1336,64 @@ static bool SinkThenElseCodeToEnd(Branch
>>    return Changed;
>>  }
>>  +/// \brief Determine if we can hoist sink a sole store instruction out of a
>> +/// conditional block.
>> +///
>> +/// We are looking for code like the following:
>> +///   BrBB:
>> +///     store i32 %add, i32* %arrayidx2
>> +///     ... // No other stores or function calls (we could be calling a memory
>> +///     ... // function).
>> +///     %cmp = icmp ult %x, %y
>> +///     br i1 %cmp, label %EndBB, label %ThenBB
>> +///   ThenBB:
>> +///     store i32 %add5, i32* %arrayidx2
>> +///     br label EndBB
>> +///   EndBB:
>> +///     ...
>> +///   We are going to transform this into:
>> +///   BrBB:
>> +///     store i32 %add, i32* %arrayidx2
>> +///     ... //
>> +///     %cmp = icmp ult %x, %y
>> +///     %add.add5 = select i1 %cmp, i32 %add, %add5
>> +///     store i32 %add.add5, i32* %arrayidx2
>> +///     ...
>> +///
>> +/// \return The pointer to the value of the previous store if the store can be
>> +///         hoisted into the predecessor block. 0 otherwise.
>> +Value *isSafeToSpeculateStore(Instruction *I, BasicBlock *BrBB,
>> +                              BasicBlock *StoreBB, BasicBlock *EndBB) {
>> +  StoreInst *StoreToHoist = dyn_cast<StoreInst>(I);
>> +  if (!StoreToHoist)
>> +    return 0;
>> +
>> +  Value *StorePtr = StoreToHoist->getPointerOperand();
>> +  StoreInst *PreviousStore = 0;
>> +
>> +  // Look for a store to the same pointer in BrBB.
>> +  unsigned MaxNumInstToLookAt = 10;
>> +  for (BasicBlock::reverse_iterator RI = BrBB->rbegin(),
>> +       RE = BrBB->rend(); RI != RE && (--MaxNumInstToLookAt); ++RI) {
>> +    Instruction *CurI = &*RI;
>> +
>> +    // Could be calling an instruction that effects memory like free().
>> +    if (CurI->mayHaveSideEffects() && !isa<StoreInst>(CurI))
>> +      return 0;
>> +
>> +    // Found the previous store make sure it stores to the same location.
>> +    if ((PreviousStore = dyn_cast<StoreInst>(CurI))) {
>> +      if (PreviousStore->getPointerOperand() == StorePtr)
>> +        // Found the previous store, return its value operand.
>> +        return PreviousStore->getValueOperand();
>> +      else
>> +        return 0; // Unknown store.
>> +    }
>> +  }
>> +
>> +  return 0;
>> +}
>> +
>>  /// \brief Speculate a conditional basic block flattening the CFG.
>>  ///
>>  /// Note that this is a very risky transform currently. Speculating
>> @@ -1395,6 +1457,8 @@ static bool SpeculativelyExecuteBB(Branc
>>    SmallDenseMap<Instruction *, unsigned, 4> SinkCandidateUseCounts;
>>      unsigned SpeculationCost = 0;
>> +  Value *SpeculatedStoreValue = 0;
>> +  StoreInst *SpeculatedStore = 0;
>>    for (BasicBlock::iterator BBI = ThenBB->begin(),
>>                              BBE = llvm::prior(ThenBB->end());
>>         BBI != BBE; ++BBI) {
>> @@ -1410,13 +1474,21 @@ static bool SpeculativelyExecuteBB(Branc
>>        return false;
>>        // Don't hoist the instruction if it's unsafe or expensive.
>> -    if (!isSafeToSpeculativelyExecute(I))
>> +    if (!isSafeToSpeculativelyExecute(I) &&
>> +        !(HoistCondStores &&
>> +          (SpeculatedStoreValue = isSafeToSpeculateStore(I, BB, ThenBB,
>> +                                                         EndBB))))
>>        return false;
>> -    if (ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>> +    if (!SpeculatedStoreValue &&
>> +        ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>>        return false;
>>  +    // Store the store speculation candidate.
>> +    if (SpeculatedStoreValue)
>> +      SpeculatedStore = cast<StoreInst>(I);
>> +
>>      // Do not hoist the instruction if any of its operands are defined but not
>> -    // used in this BB. The transformation will prevent the operand from
>> +    // used in BB. The transformation will prevent the operand from
>>      // being sunk into the use block.
>>      for (User::op_iterator i = I->op_begin(), e = I->op_end();
>>           i != e; ++i) {
>> @@ -1473,12 +1545,24 @@ static bool SpeculativelyExecuteBB(Branc
>>      // If there are no PHIs to process, bail early. This helps ensure idempotence
>>    // as well.
>> -  if (!HaveRewritablePHIs)
>> +  if (!HaveRewritablePHIs && !(HoistCondStores && SpeculatedStoreValue))
>>      return false;
>>      // If we get here, we can hoist the instruction and if-convert.
>>    DEBUG(dbgs() << "SPECULATIVELY EXECUTING BB" << *ThenBB << "\n";);
>>  +  // Insert a select of the value of the speculated store.
>> +  if (SpeculatedStoreValue) {
>> +    IRBuilder<true, NoFolder> Builder(BI);
>> +    Value *TrueV = SpeculatedStore->getValueOperand();
>> +    Value *FalseV = SpeculatedStoreValue;
>> +    if (Invert)
>> +      std::swap(TrueV, FalseV);
>> +    Value *S = Builder.CreateSelect(BrCond, TrueV, FalseV, TrueV->getName() +
>> +                                    "." + FalseV->getName());
>> +    SpeculatedStore->setOperand(0, S);
>> +  }
>> +
>>    // Hoist the instructions.
>>    BB->getInstList().splice(BI, ThenBB->getInstList(), ThenBB->begin(),
>>                             llvm::prior(ThenBB->end()));
>> 
>> Added: llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll?rev=179957&view=auto
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll (added)
>> +++ llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll Sat Apr 20 16:42:09 2013
>> @@ -0,0 +1,83 @@
>> +; RUN: opt -simplifycfg -S < %s | FileCheck %s
>> +
>> +define void @ifconvertstore(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>> +entry:
>> +  %arrayidx = getelementptr inbounds i32* %B, i64 0
>> +  %0 = load i32* %arrayidx, align 4
>> +  %add = add nsw i32 %0, %C
>> +  %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>> +
>> +; First store to the location.
>> +  store i32 %add, i32* %arrayidx2, align 4
>> +  %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>> +  %1 = load i32* %arrayidx4, align 4
>> +  %add5 = add nsw i32 %1, %D
>> +  %cmp6 = icmp sgt i32 %add5, %C
>> +  br i1 %cmp6, label %if.then, label %ret.end
>> +
>> +; Make sure we speculate stores like the following one. It is cheap compared to
>> +; a mispredicated branch.
>> +; CHECK: @ifconvertstore
>> +; CHECK: %add5.add = select i1 %cmp6, i32 %add5, i32 %add
>> +; CHECK: store i32 %add5.add, i32* %arrayidx2, align 4
>> +if.then:
>> +  store i32 %add5, i32* %arrayidx2, align 4
>> +  br label %ret.end
>> +
>> +ret.end:
>> +  ret void
>> +}
>> +
>> +define void @noifconvertstore1(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>> +entry:
>> +  %arrayidx = getelementptr inbounds i32* %B, i64 0
>> +  %0 = load i32* %arrayidx, align 4
>> +  %add = add nsw i32 %0, %C
>> +  %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>> +
>> +; Store to a different location.
>> +  store i32 %add, i32* %arrayidx, align 4
>> +  %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>> +  %1 = load i32* %arrayidx4, align 4
>> +  %add5 = add nsw i32 %1, %D
>> +  %cmp6 = icmp sgt i32 %add5, %C
>> +  br i1 %cmp6, label %if.then, label %ret.end
>> +
>> +; CHECK: @noifconvertstore1
>> +; CHECK-NOT: select
>> +if.then:
>> +  store i32 %add5, i32* %arrayidx2, align 4
>> +  br label %ret.end
>> +
>> +ret.end:
>> +  ret void
>> +}
>> +
>> +declare void @unknown_fun()
>> +
>> +define void @noifconvertstore2(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>> +entry:
>> +  %arrayidx = getelementptr inbounds i32* %B, i64 0
>> +  %0 = load i32* %arrayidx, align 4
>> +  %add = add nsw i32 %0, %C
>> +  %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>> +
>> +; First store to the location.
>> +  store i32 %add, i32* %arrayidx2, align 4
>> +  call void @unknown_fun()
>> +  %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>> +  %1 = load i32* %arrayidx4, align 4
>> +  %add5 = add nsw i32 %1, %D
>> +  %cmp6 = icmp sgt i32 %add5, %C
>> +  br i1 %cmp6, label %if.then, label %ret.end
>> +
>> +; CHECK: @noifconvertstore2
>> +; CHECK-NOT: select
>> +if.then:
>> +  store i32 %add5, i32* %arrayidx2, align 4
>> +  br label %ret.end
>> +
>> +ret.end:
>> +  ret void
>> +}
>> +
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




More information about the llvm-commits mailing list