[llvm] r179957 - SimplifyCFG: If convert single conditional stores

Arnold Schwaighofer aschwaighofer at apple.com
Sat Apr 20 18:04:38 PDT 2013


On Apr 20, 2013, at 7:54 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:

> It tail dup the load, which dosen't necessarily simplify the CFG.
> I'm from CFG simplification perspective.
> 

It does not perform select/if conversation of the store - hence my patch.

But the loads are already eliminated so I can't use them to motivate my patch - that is all I am saying.

> 
> On 4/20/13 5:48 PM, Arnold Schwaighofer wrote:
>> Hi Shuxin,
>> 
>> We already get this transformation (eliminate the extra loads) even without my patch, GVN does this for us:
>>  C-code excerpt:
>> 
>>       mc[k] = mpp[k-1]   + tpmm[k-1];
>>       if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
>>       if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
>> 
>> Without my patch we already produce this:
>> 
>>   %79 = add nsw i32 %78, %76 << else value
>>   ...
>>   %85 = add nsw i32 %84, %82 << if value
>>   %86 = icmp sgt i32 %85, %79
>>   br i1 %86, label %87, label %88
>> 
>> ; <label>:87                                      ; preds = %.lr.ph
>>   store i32 %85, i32* %80, align 4, !tbaa !0
>>   br label %88
>> 
>> ; <label>:88                                      ; preds = %87, %.lr.ph
>>   %89 = phi i32 [ %85, %87 ], [ %79, %.lr.ph ]   << that is your eliminated load
>>   %90 = getelementptr inbounds i32* %59, i64 %74
>>   %91 = load i32* %90, align 4, !tbaa !0
>>   %92 = getelementptr inbounds i32* %32, i64 %74
>>   %93 = load i32* %92, align 4, !tbaa !0
>>   %94 = add nsw i32 %93, %91
>>   %95 = icmp sgt i32 %94, %89
>>   br i1 %95, label %96, label %97
>> 
>> Best,
>> Arnold
>> 
>> On Apr 20, 2013, at 7:24 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>> 
>>> I'm not going to discuss how to optimize/hack the famous loops in hmmer, I just focus on
>>> this transformation.
>>> 
>>> IMHO, this xform is better motivated by the subsequent loads, which is the case in hmmer.
>>> 
>> 
>>>  ---------------
>>>  if (cond)
>>>     store val to x ; // x is just a memory unit, dose not necessary has name.
>>> 
>>>    ... load from x
>>>  ---------------
>>> 
>>> =>
>>>   if (cond)
>>>     t = val
>>>   store x = t
>>> 
>>>   ... t
>>> 
>>>  ------------------
>>> 
>>>  by stripping off the condition of the store stmt, we can git rid of the load expr, which
>>> seems to be big save regardless the which combination of following conditions take place
>>>   a)
>>>      a1 the branch is biased (likely taken)
>>>      a2) the branch is biased (likely fall-thru)
>>>      a3) the branch is balanced
>>> 
>>>   b) the load-store unit of the underlying architecture has load-store forwarding capability.
>>> 
>>>  if b) dose not hold, it always win as we reduce a expensive load
>>>  for combination a1 + b, the load request dose not go to cache, however, we still suffer
>>>   from latency. (FIXME: load-store unit will blindly go through the pipeline).
>>>  ....
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 4/20/13 2:42 PM, Arnold Schwaighofer wrote:
>>>> Author: arnolds
>>>> Date: Sat Apr 20 16:42:09 2013
>>>> New Revision: 179957
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=179957&view=rev
>>>> Log:
>>>> SimplifyCFG: If convert single conditional stores
>>>> 
>>>> This transformation will transform a conditional store with a preceeding
>>>> uncondtional store to the same location:
>>>> 
>>>>  a[i] =
>>>>  may-alias with a[i] load
>>>>  if (cond)
>>>>    a[i] = Y
>>>> 
>>>> into an unconditional store.
>>>> 
>>>>  a[i] = X
>>>>  may-alias with a[i] load
>>>>  tmp = cond ? Y : X;
>>>>  a[i] = tmp
>>>> 
>>>> We assume that on average the cost of a mispredicted branch is going to be
>>>> higher than the cost of a second store to the same location, and that the
>>>> secondary benefits of creating a bigger basic block for other optimizations to
>>>> work on outway the potential case were the branch would be correctly predicted
>>>> and the cost of the executing the second store would be noticably reflected in
>>>> performance.
>>>> 
>>>> hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
>>>> this change we are on par with gcc's performance (gcc also performs this
>>>> transformation). There was a 1.2 % performance improvement on a ARM swift chip.
>>>> Other tests in the test-suite+external seem to be mostly uninfluenced in my
>>>> experiments:
>>>> This optimization was triggered on 41 tests such that the executable was
>>>> different before/after the patch. Only 1 out of the 40 tests (dealII) was
>>>> reproducable below 100% (by about .4%). Given that hmmer benefits so much I
>>>> believe this to be a fair trade off.
>>>> 
>>>> I am going to watch performance numbers across the builtbots and will revert
>>>> this if anything unexpected comes up.
>>>> 
>>>> Added:
>>>>     llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>>>> Modified:
>>>>     llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>>>> 
>>>> Modified: llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=179957&r1=179956&r2=179957&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp (original)
>>>> +++ llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp Sat Apr 20 16:42:09 2013
>>>> @@ -59,6 +59,10 @@ static cl::opt<bool>
>>>>  SinkCommon("simplifycfg-sink-common", cl::Hidden, cl::init(true),
>>>>         cl::desc("Sink common instructions down to the end block"));
>>>>  +static cl::opt<bool>
>>>> +HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),
>>>> +       cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>>>> +
>>>>  STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
>>>>  STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>>>>  STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
>>>> @@ -1332,6 +1336,64 @@ static bool SinkThenElseCodeToEnd(Branch
>>>>    return Changed;
>>>>  }
>>>>  +/// \brief Determine if we can hoist sink a sole store instruction out of a
>>>> +/// conditional block.
>>>> +///
>>>> +/// We are looking for code like the following:
>>>> +///   BrBB:
>>>> +///     store i32 %add, i32* %arrayidx2
>>>> +///     ... // No other stores or function calls (we could be calling a memory
>>>> +///     ... // function).
>>>> +///     %cmp = icmp ult %x, %y
>>>> +///     br i1 %cmp, label %EndBB, label %ThenBB
>>>> +///   ThenBB:
>>>> +///     store i32 %add5, i32* %arrayidx2
>>>> +///     br label EndBB
>>>> +///   EndBB:
>>>> +///     ...
>>>> +///   We are going to transform this into:
>>>> +///   BrBB:
>>>> +///     store i32 %add, i32* %arrayidx2
>>>> +///     ... //
>>>> +///     %cmp = icmp ult %x, %y
>>>> +///     %add.add5 = select i1 %cmp, i32 %add, %add5
>>>> +///     store i32 %add.add5, i32* %arrayidx2
>>>> +///     ...
>>>> +///
>>>> +/// \return The pointer to the value of the previous store if the store can be
>>>> +///         hoisted into the predecessor block. 0 otherwise.
>>>> +Value *isSafeToSpeculateStore(Instruction *I, BasicBlock *BrBB,
>>>> +                              BasicBlock *StoreBB, BasicBlock *EndBB) {
>>>> +  StoreInst *StoreToHoist = dyn_cast<StoreInst>(I);
>>>> +  if (!StoreToHoist)
>>>> +    return 0;
>>>> +
>>>> +  Value *StorePtr = StoreToHoist->getPointerOperand();
>>>> +  StoreInst *PreviousStore = 0;
>>>> +
>>>> +  // Look for a store to the same pointer in BrBB.
>>>> +  unsigned MaxNumInstToLookAt = 10;
>>>> +  for (BasicBlock::reverse_iterator RI = BrBB->rbegin(),
>>>> +       RE = BrBB->rend(); RI != RE && (--MaxNumInstToLookAt); ++RI) {
>>>> +    Instruction *CurI = &*RI;
>>>> +
>>>> +    // Could be calling an instruction that effects memory like free().
>>>> +    if (CurI->mayHaveSideEffects() && !isa<StoreInst>(CurI))
>>>> +      return 0;
>>>> +
>>>> +    // Found the previous store make sure it stores to the same location.
>>>> +    if ((PreviousStore = dyn_cast<StoreInst>(CurI))) {
>>>> +      if (PreviousStore->getPointerOperand() == StorePtr)
>>>> +        // Found the previous store, return its value operand.
>>>> +        return PreviousStore->getValueOperand();
>>>> +      else
>>>> +        return 0; // Unknown store.
>>>> +    }
>>>> +  }
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>>  /// \brief Speculate a conditional basic block flattening the CFG.
>>>>  ///
>>>>  /// Note that this is a very risky transform currently. Speculating
>>>> @@ -1395,6 +1457,8 @@ static bool SpeculativelyExecuteBB(Branc
>>>>    SmallDenseMap<Instruction *, unsigned, 4> SinkCandidateUseCounts;
>>>>      unsigned SpeculationCost = 0;
>>>> +  Value *SpeculatedStoreValue = 0;
>>>> +  StoreInst *SpeculatedStore = 0;
>>>>    for (BasicBlock::iterator BBI = ThenBB->begin(),
>>>>                              BBE = llvm::prior(ThenBB->end());
>>>>         BBI != BBE; ++BBI) {
>>>> @@ -1410,13 +1474,21 @@ static bool SpeculativelyExecuteBB(Branc
>>>>        return false;
>>>>        // Don't hoist the instruction if it's unsafe or expensive.
>>>> -    if (!isSafeToSpeculativelyExecute(I))
>>>> +    if (!isSafeToSpeculativelyExecute(I) &&
>>>> +        !(HoistCondStores &&
>>>> +          (SpeculatedStoreValue = isSafeToSpeculateStore(I, BB, ThenBB,
>>>> +                                                         EndBB))))
>>>>        return false;
>>>> -    if (ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>>>> +    if (!SpeculatedStoreValue &&
>>>> +        ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>>>>        return false;
>>>>  +    // Store the store speculation candidate.
>>>> +    if (SpeculatedStoreValue)
>>>> +      SpeculatedStore = cast<StoreInst>(I);
>>>> +
>>>>      // Do not hoist the instruction if any of its operands are defined but not
>>>> -    // used in this BB. The transformation will prevent the operand from
>>>> +    // used in BB. The transformation will prevent the operand from
>>>>      // being sunk into the use block.
>>>>      for (User::op_iterator i = I->op_begin(), e = I->op_end();
>>>>           i != e; ++i) {
>>>> @@ -1473,12 +1545,24 @@ static bool SpeculativelyExecuteBB(Branc
>>>>      // If there are no PHIs to process, bail early. This helps ensure idempotence
>>>>    // as well.
>>>> -  if (!HaveRewritablePHIs)
>>>> +  if (!HaveRewritablePHIs && !(HoistCondStores && SpeculatedStoreValue))
>>>>      return false;
>>>>      // If we get here, we can hoist the instruction and if-convert.
>>>>    DEBUG(dbgs() << "SPECULATIVELY EXECUTING BB" << *ThenBB << "\n";);
>>>>  +  // Insert a select of the value of the speculated store.
>>>> +  if (SpeculatedStoreValue) {
>>>> +    IRBuilder<true, NoFolder> Builder(BI);
>>>> +    Value *TrueV = SpeculatedStore->getValueOperand();
>>>> +    Value *FalseV = SpeculatedStoreValue;
>>>> +    if (Invert)
>>>> +      std::swap(TrueV, FalseV);
>>>> +    Value *S = Builder.CreateSelect(BrCond, TrueV, FalseV, TrueV->getName() +
>>>> +                                    "." + FalseV->getName());
>>>> +    SpeculatedStore->setOperand(0, S);
>>>> +  }
>>>> +
>>>>    // Hoist the instructions.
>>>>    BB->getInstList().splice(BI, ThenBB->getInstList(), ThenBB->begin(),
>>>>                             llvm::prior(ThenBB->end()));
>>>> 
>>>> Added: llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll?rev=179957&view=auto
>>>> ==============================================================================
>>>> --- llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll (added)
>>>> +++ llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll Sat Apr 20 16:42:09 2013
>>>> @@ -0,0 +1,83 @@
>>>> +; RUN: opt -simplifycfg -S < %s | FileCheck %s
>>>> +
>>>> +define void @ifconvertstore(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>>> +entry:
>>>> +  %arrayidx = getelementptr inbounds i32* %B, i64 0
>>>> +  %0 = load i32* %arrayidx, align 4
>>>> +  %add = add nsw i32 %0, %C
>>>> +  %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>>> +
>>>> +; First store to the location.
>>>> +  store i32 %add, i32* %arrayidx2, align 4
>>>> +  %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>>> +  %1 = load i32* %arrayidx4, align 4
>>>> +  %add5 = add nsw i32 %1, %D
>>>> +  %cmp6 = icmp sgt i32 %add5, %C
>>>> +  br i1 %cmp6, label %if.then, label %ret.end
>>>> +
>>>> +; Make sure we speculate stores like the following one. It is cheap compared to
>>>> +; a mispredicated branch.
>>>> +; CHECK: @ifconvertstore
>>>> +; CHECK: %add5.add = select i1 %cmp6, i32 %add5, i32 %add
>>>> +; CHECK: store i32 %add5.add, i32* %arrayidx2, align 4
>>>> +if.then:
>>>> +  store i32 %add5, i32* %arrayidx2, align 4
>>>> +  br label %ret.end
>>>> +
>>>> +ret.end:
>>>> +  ret void
>>>> +}
>>>> +
>>>> +define void @noifconvertstore1(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>>> +entry:
>>>> +  %arrayidx = getelementptr inbounds i32* %B, i64 0
>>>> +  %0 = load i32* %arrayidx, align 4
>>>> +  %add = add nsw i32 %0, %C
>>>> +  %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>>> +
>>>> +; Store to a different location.
>>>> +  store i32 %add, i32* %arrayidx, align 4
>>>> +  %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>>> +  %1 = load i32* %arrayidx4, align 4
>>>> +  %add5 = add nsw i32 %1, %D
>>>> +  %cmp6 = icmp sgt i32 %add5, %C
>>>> +  br i1 %cmp6, label %if.then, label %ret.end
>>>> +
>>>> +; CHECK: @noifconvertstore1
>>>> +; CHECK-NOT: select
>>>> +if.then:
>>>> +  store i32 %add5, i32* %arrayidx2, align 4
>>>> +  br label %ret.end
>>>> +
>>>> +ret.end:
>>>> +  ret void
>>>> +}
>>>> +
>>>> +declare void @unknown_fun()
>>>> +
>>>> +define void @noifconvertstore2(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>>> +entry:
>>>> +  %arrayidx = getelementptr inbounds i32* %B, i64 0
>>>> +  %0 = load i32* %arrayidx, align 4
>>>> +  %add = add nsw i32 %0, %C
>>>> +  %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>>> +
>>>> +; First store to the location.
>>>> +  store i32 %add, i32* %arrayidx2, align 4
>>>> +  call void @unknown_fun()
>>>> +  %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>>> +  %1 = load i32* %arrayidx4, align 4
>>>> +  %add5 = add nsw i32 %1, %D
>>>> +  %cmp6 = icmp sgt i32 %add5, %C
>>>> +  br i1 %cmp6, label %if.then, label %ret.end
>>>> +
>>>> +; CHECK: @noifconvertstore2
>>>> +; CHECK-NOT: select
>>>> +if.then:
>>>> +  store i32 %add5, i32* %arrayidx2, align 4
>>>> +  br label %ret.end
>>>> +
>>>> +ret.end:
>>>> +  ret void
>>>> +}
>>>> +
>>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




More information about the llvm-commits mailing list