[llvm] r179957 - SimplifyCFG: If convert single conditional stores

Shuxin Yang shuxin.llvm at gmail.com
Sat Apr 20 17:54:32 PDT 2013


It tail-duplicates the load, which doesn't necessarily simplify the CFG.
I'm looking at this from a CFG-simplification perspective.


On 4/20/13 5:48 PM, Arnold Schwaighofer wrote:
> Hi Shuxin,
>
> We already get this transformation (eliminate the extra loads) even without my patch, GVN does this for us:
>   
> C-code excerpt:
>
>        mc[k] = mpp[k-1]   + tpmm[k-1];
>        if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
>        if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
>
> Without my patch we already produce this:
>
>    %79 = add nsw i32 %78, %76 << else value
>    ...
>    %85 = add nsw i32 %84, %82 << if value
>    %86 = icmp sgt i32 %85, %79
>    br i1 %86, label %87, label %88
>
> ; <label>:87                                      ; preds = %.lr.ph
>    store i32 %85, i32* %80, align 4, !tbaa !0
>    br label %88
>
> ; <label>:88                                      ; preds = %87, %.lr.ph
>    %89 = phi i32 [ %85, %87 ], [ %79, %.lr.ph ]   << that is your eliminated load
>    %90 = getelementptr inbounds i32* %59, i64 %74
>    %91 = load i32* %90, align 4, !tbaa !0
>    %92 = getelementptr inbounds i32* %32, i64 %74
>    %93 = load i32* %92, align 4, !tbaa !0
>    %94 = add nsw i32 %93, %91
>    %95 = icmp sgt i32 %94, %89
>    br i1 %95, label %96, label %97
>
> Best,
> Arnold
>
> On Apr 20, 2013, at 7:24 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
>> I'm not going to discuss how to optimize/hack the famous loops in hmmer; I will just focus on
>> this transformation.
>>
>> IMHO, this xform is better motivated by the subsequent loads, which is the case in hmmer.
>>
>
>>   ---------------
>>   if (cond)
>>      store val to x;  // x is just a memory location; it does not necessarily have a name.
>>
>>     ... load from x
>>   ---------------
>>
>> =>
>>    t = old value of x
>>    if (cond)
>>      t = val
>>    store x = t
>>
>>    ... use t
>>
>>   ------------------
>>
>>   By stripping the condition off the store statement, we can get rid of the load expression, which
>> seems to be a big saving regardless of which combination of the following conditions takes place:
>>    a) branch behavior:
>>       a1) the branch is biased (likely taken)
>>       a2) the branch is biased (likely fall-through)
>>       a3) the branch is balanced
>>
>>    b) the load-store unit of the underlying architecture has store-to-load forwarding capability.
>>
>>   If b) does not hold, it is always a win, since we remove an expensive load.
>>   For combination a1 + b, the load request does not go to the cache; however, we still suffer
>>    from the forwarding latency. (FIXME: the load-store unit will blindly go through the pipeline.)
>>   ....
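>>
>>   For illustration, here is a minimal C sketch of the pattern described above
>>   (hypothetical function and variable names, not the actual hmmer source):
>>
>>     /* Before: the store to a[i] is conditional, so the later use has to
>>      * load a[i] back from memory. */
>>     void before(int *a, int i, int val, int cond, int *out) {
>>       if (cond)
>>         a[i] = val;      /* conditional store */
>>       *out = a[i] + 1;   /* load from a[i] */
>>     }
>>
>>     /* After: the condition is folded into the stored value; the store becomes
>>      * unconditional and the later use reads the register t, not memory. */
>>     void after(int *a, int i, int val, int cond, int *out) {
>>       int t = a[i];      /* old value of a[i] */
>>       if (cond)
>>         t = val;
>>       a[i] = t;          /* unconditional store */
>>       *out = t + 1;      /* no reload of a[i] needed */
>>     }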
>>
>> On 4/20/13 2:42 PM, Arnold Schwaighofer wrote:
>>> Author: arnolds
>>> Date: Sat Apr 20 16:42:09 2013
>>> New Revision: 179957
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=179957&view=rev
>>> Log:
>>> SimplifyCFG: If convert single conditional stores
>>>
>>> This transformation converts a conditional store that is preceded by an
>>> unconditional store to the same location:
>>>
>>>   a[i] = X
>>>   ... load that may alias a[i] ...
>>>   if (cond)
>>>     a[i] = Y
>>>
>>> into an unconditional store.
>>>
>>>   a[i] = X
>>>   ... load that may alias a[i] ...
>>>   tmp = cond ? Y : X;
>>>   a[i] = tmp
>>>
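>>> At the C level, the hmmer-style pattern this targets looks roughly like the
>>> following (a simplified, hypothetical sketch; not the actual hmmer source):
>>>
>>>   /* Before: the second store to mc[k] is guarded by a data-dependent,
>>>    * hard-to-predict branch. */
>>>   void before(int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int k) {
>>>     int sc;
>>>     mc[k] = mpp[k-1] + tpmm[k-1];
>>>     if ((sc = ip[k-1] + tpim[k-1]) > mc[k])
>>>       mc[k] = sc;
>>>   }
>>>
>>>   /* After (roughly what the if-converted IR corresponds to): the value is
>>>    * selected and the second store becomes unconditional. */
>>>   void after(int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int k) {
>>>     int x = mpp[k-1] + tpmm[k-1];
>>>     mc[k] = x;                      /* first, unconditional store */
>>>     int sc = ip[k-1] + tpim[k-1];
>>>     mc[k] = (sc > x) ? sc : x;      /* select, then unconditional store */
>>>   }
>>>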
>>> We assume that, on average, the cost of a mispredicted branch is going to be
>>> higher than the cost of a second store to the same location, and that the
>>> secondary benefit of creating a bigger basic block for other optimizations to
>>> work on outweighs the case where the branch would be correctly predicted
>>> and the cost of executing the second store would be noticeably reflected in
>>> performance.
>>>
>>> hmmer's execution time improves by 30% on an iMac12,2 on the ref data sets. With
>>> this change we are on par with gcc's performance (gcc also performs this
>>> transformation). There was a 1.2% performance improvement on an ARM Swift chip.
>>> Other tests in the test-suite + externals seem to be mostly unaffected in my
>>> experiments:
>>> This optimization was triggered on 41 tests such that the executable was
>>> different before/after the patch. Only 1 of those tests (dealII) was
>>> reproducibly below 100% (by about 0.4%). Given that hmmer benefits so much, I
>>> believe this to be a fair trade-off.
>>>
>>> I am going to watch the performance numbers across the buildbots and will revert
>>> this if anything unexpected comes up.
>>>
>>> Added:
>>>      llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>>> Modified:
>>>      llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>>>
>>> Modified: llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=179957&r1=179956&r2=179957&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp Sat Apr 20 16:42:09 2013
>>> @@ -59,6 +59,10 @@ static cl::opt<bool>
>>>   SinkCommon("simplifycfg-sink-common", cl::Hidden, cl::init(true),
>>>          cl::desc("Sink common instructions down to the end block"));
>>>
>>> +static cl::opt<bool>
>>> +HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),
>>> +       cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>>> +
>>>   STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
>>>   STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>>>   STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
>>> @@ -1332,6 +1336,64 @@ static bool SinkThenElseCodeToEnd(Branch
>>>     return Changed;
>>>   }
>>>
>>> +/// \brief Determine if we can hoist sink a sole store instruction out of a
>>> +/// conditional block.
>>> +///
>>> +/// We are looking for code like the following:
>>> +///   BrBB:
>>> +///     store i32 %add, i32* %arrayidx2
>>> +///     ... // No other stores or function calls (we could be calling a memory
>>> +///     ... // function).
>>> +///     %cmp = icmp ult %x, %y
>>> +///     br i1 %cmp, label %EndBB, label %ThenBB
>>> +///   ThenBB:
>>> +///     store i32 %add5, i32* %arrayidx2
>>> +///     br label EndBB
>>> +///   EndBB:
>>> +///     ...
>>> +///   We are going to transform this into:
>>> +///   BrBB:
>>> +///     store i32 %add, i32* %arrayidx2
>>> +///     ... //
>>> +///     %cmp = icmp ult %x, %y
>>> +///     %add.add5 = select i1 %cmp, i32 %add, %add5
>>> +///     store i32 %add.add5, i32* %arrayidx2
>>> +///     ...
>>> +///
>>> +/// \return The pointer to the value of the previous store if the store can be
>>> +///         hoisted into the predecessor block. 0 otherwise.
>>> +Value *isSafeToSpeculateStore(Instruction *I, BasicBlock *BrBB,
>>> +                              BasicBlock *StoreBB, BasicBlock *EndBB) {
>>> +  StoreInst *StoreToHoist = dyn_cast<StoreInst>(I);
>>> +  if (!StoreToHoist)
>>> +    return 0;
>>> +
>>> +  Value *StorePtr = StoreToHoist->getPointerOperand();
>>> +  StoreInst *PreviousStore = 0;
>>> +
>>> +  // Look for a store to the same pointer in BrBB.
>>> +  unsigned MaxNumInstToLookAt = 10;
>>> +  for (BasicBlock::reverse_iterator RI = BrBB->rbegin(),
>>> +       RE = BrBB->rend(); RI != RE && (--MaxNumInstToLookAt); ++RI) {
>>> +    Instruction *CurI = &*RI;
>>> +
>>> +    // Could be calling an instruction that effects memory like free().
>>> +    if (CurI->mayHaveSideEffects() && !isa<StoreInst>(CurI))
>>> +      return 0;
>>> +
>>> +    // Found the previous store make sure it stores to the same location.
>>> +    if ((PreviousStore = dyn_cast<StoreInst>(CurI))) {
>>> +      if (PreviousStore->getPointerOperand() == StorePtr)
>>> +        // Found the previous store, return its value operand.
>>> +        return PreviousStore->getValueOperand();
>>> +      else
>>> +        return 0; // Unknown store.
>>> +    }
>>> +  }
>>> +
>>> +  return 0;
>>> +}
>>> +
>>>   /// \brief Speculate a conditional basic block flattening the CFG.
>>>   ///
>>>   /// Note that this is a very risky transform currently. Speculating
>>> @@ -1395,6 +1457,8 @@ static bool SpeculativelyExecuteBB(Branc
>>>     SmallDenseMap<Instruction *, unsigned, 4> SinkCandidateUseCounts;
>>>
>>>   unsigned SpeculationCost = 0;
>>> +  Value *SpeculatedStoreValue = 0;
>>> +  StoreInst *SpeculatedStore = 0;
>>>     for (BasicBlock::iterator BBI = ThenBB->begin(),
>>>                               BBE = llvm::prior(ThenBB->end());
>>>          BBI != BBE; ++BBI) {
>>> @@ -1410,13 +1474,21 @@ static bool SpeculativelyExecuteBB(Branc
>>>         return false;
>>>
>>>     // Don't hoist the instruction if it's unsafe or expensive.
>>> -    if (!isSafeToSpeculativelyExecute(I))
>>> +    if (!isSafeToSpeculativelyExecute(I) &&
>>> +        !(HoistCondStores &&
>>> +          (SpeculatedStoreValue = isSafeToSpeculateStore(I, BB, ThenBB,
>>> +                                                         EndBB))))
>>>         return false;
>>> -    if (ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>>> +    if (!SpeculatedStoreValue &&
>>> +        ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>>>         return false;
>>>
>>> +    // Store the store speculation candidate.
>>> +    if (SpeculatedStoreValue)
>>> +      SpeculatedStore = cast<StoreInst>(I);
>>> +
>>>       // Do not hoist the instruction if any of its operands are defined but not
>>> -    // used in this BB. The transformation will prevent the operand from
>>> +    // used in BB. The transformation will prevent the operand from
>>>       // being sunk into the use block.
>>>       for (User::op_iterator i = I->op_begin(), e = I->op_end();
>>>            i != e; ++i) {
>>> @@ -1473,12 +1545,24 @@ static bool SpeculativelyExecuteBB(Branc
>>>
>>>   // If there are no PHIs to process, bail early. This helps ensure idempotence
>>>     // as well.
>>> -  if (!HaveRewritablePHIs)
>>> +  if (!HaveRewritablePHIs && !(HoistCondStores && SpeculatedStoreValue))
>>>       return false;
>>>
>>>   // If we get here, we can hoist the instruction and if-convert.
>>>     DEBUG(dbgs() << "SPECULATIVELY EXECUTING BB" << *ThenBB << "\n";);
>>>
>>> +  // Insert a select of the value of the speculated store.
>>> +  if (SpeculatedStoreValue) {
>>> +    IRBuilder<true, NoFolder> Builder(BI);
>>> +    Value *TrueV = SpeculatedStore->getValueOperand();
>>> +    Value *FalseV = SpeculatedStoreValue;
>>> +    if (Invert)
>>> +      std::swap(TrueV, FalseV);
>>> +    Value *S = Builder.CreateSelect(BrCond, TrueV, FalseV, TrueV->getName() +
>>> +                                    "." + FalseV->getName());
>>> +    SpeculatedStore->setOperand(0, S);
>>> +  }
>>> +
>>>     // Hoist the instructions.
>>>     BB->getInstList().splice(BI, ThenBB->getInstList(), ThenBB->begin(),
>>>                              llvm::prior(ThenBB->end()));
>>>
>>> Added: llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll?rev=179957&view=auto
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll (added)
>>> +++ llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll Sat Apr 20 16:42:09 2013
>>> @@ -0,0 +1,83 @@
>>> +; RUN: opt -simplifycfg -S < %s | FileCheck %s
>>> +
>>> +define void @ifconvertstore(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>> +entry:
>>> +  %arrayidx = getelementptr inbounds i32* %B, i64 0
>>> +  %0 = load i32* %arrayidx, align 4
>>> +  %add = add nsw i32 %0, %C
>>> +  %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>> +
>>> +; First store to the location.
>>> +  store i32 %add, i32* %arrayidx2, align 4
>>> +  %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>> +  %1 = load i32* %arrayidx4, align 4
>>> +  %add5 = add nsw i32 %1, %D
>>> +  %cmp6 = icmp sgt i32 %add5, %C
>>> +  br i1 %cmp6, label %if.then, label %ret.end
>>> +
>>> +; Make sure we speculate stores like the following one. It is cheap compared to
>>> +; a mispredicated branch.
>>> +; CHECK: @ifconvertstore
>>> +; CHECK: %add5.add = select i1 %cmp6, i32 %add5, i32 %add
>>> +; CHECK: store i32 %add5.add, i32* %arrayidx2, align 4
>>> +if.then:
>>> +  store i32 %add5, i32* %arrayidx2, align 4
>>> +  br label %ret.end
>>> +
>>> +ret.end:
>>> +  ret void
>>> +}
>>> +
>>> +define void @noifconvertstore1(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>> +entry:
>>> +  %arrayidx = getelementptr inbounds i32* %B, i64 0
>>> +  %0 = load i32* %arrayidx, align 4
>>> +  %add = add nsw i32 %0, %C
>>> +  %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>> +
>>> +; Store to a different location.
>>> +  store i32 %add, i32* %arrayidx, align 4
>>> +  %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>> +  %1 = load i32* %arrayidx4, align 4
>>> +  %add5 = add nsw i32 %1, %D
>>> +  %cmp6 = icmp sgt i32 %add5, %C
>>> +  br i1 %cmp6, label %if.then, label %ret.end
>>> +
>>> +; CHECK: @noifconvertstore1
>>> +; CHECK-NOT: select
>>> +if.then:
>>> +  store i32 %add5, i32* %arrayidx2, align 4
>>> +  br label %ret.end
>>> +
>>> +ret.end:
>>> +  ret void
>>> +}
>>> +
>>> +declare void @unknown_fun()
>>> +
>>> +define void @noifconvertstore2(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>> +entry:
>>> +  %arrayidx = getelementptr inbounds i32* %B, i64 0
>>> +  %0 = load i32* %arrayidx, align 4
>>> +  %add = add nsw i32 %0, %C
>>> +  %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>> +
>>> +; First store to the location.
>>> +  store i32 %add, i32* %arrayidx2, align 4
>>> +  call void @unknown_fun()
>>> +  %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>> +  %1 = load i32* %arrayidx4, align 4
>>> +  %add5 = add nsw i32 %1, %D
>>> +  %cmp6 = icmp sgt i32 %add5, %C
>>> +  br i1 %cmp6, label %if.then, label %ret.end
>>> +
>>> +; CHECK: @noifconvertstore2
>>> +; CHECK-NOT: select
>>> +if.then:
>>> +  store i32 %add5, i32* %arrayidx2, align 4
>>> +  br label %ret.end
>>> +
>>> +ret.end:
>>> +  ret void
>>> +}
>>> +
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits



