[llvm] r179957 - SimplifyCFG: If convert single conditional stores
Arnold Schwaighofer
aschwaighofer at apple.com
Sat Apr 20 17:48:36 PDT 2013
Hi Shuxin,
We already get this transformation (eliminate the extra loads) even without my patch, GVN does this for us:
C-code excerpt:
mc[k] = mpp[k-1] + tpmm[k-1];
if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc;
if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
Without my patch we already produce this:
%79 = add nsw i32 %78, %76 << else value
...
%85 = add nsw i32 %84, %82 << if value
%86 = icmp sgt i32 %85, %79
br i1 %86, label %87, label %88
; <label>:87 ; preds = %.lr.ph
store i32 %85, i32* %80, align 4, !tbaa !0
br label %88
; <label>:88 ; preds = %87, %.lr.ph
%89 = phi i32 [ %85, %87 ], [ %79, %.lr.ph ] << that is your eliminated load
%90 = getelementptr inbounds i32* %59, i64 %74
%91 = load i32* %90, align 4, !tbaa !0
%92 = getelementptr inbounds i32* %32, i64 %74
%93 = load i32* %92, align 4, !tbaa !0
%94 = add nsw i32 %93, %91
%95 = icmp sgt i32 %94, %89
br i1 %95, label %96, label %97
Best,
Arnold
On Apr 20, 2013, at 7:24 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> I'm not going to discuss how to optimize/hack the famous loops in hmmer, I just focus on
> this transformation.
>
> IMHO, this xform is better motivated by the subsequent loads, which is the case in hmmer.
>
> ---------------
> if (cond)
> store val to x ; // x is just a memory unit, dose not necessary has name.
>
> ... load from x
> ---------------
>
> =>
> if (cond)
> t = val
> store x = t
>
> ... t
>
> ------------------
>
> by stripping off the condition of the store stmt, we can git rid of the load expr, which
> seems to be big save regardless the which combination of following conditions take place
> a)
> a1 the branch is biased (likely taken)
> a2) the branch is biased (likely fall-thru)
> a3) the branch is balanced
>
> b) the load-store unit of the underlying architecture has load-store forwarding capability.
>
> if b) dose not hold, it always win as we reduce a expensive load
> for combination a1 + b, the load request dose not go to cache, however, we still suffer
> from latency. (FIXME: load-store unit will blindly go through the pipeline).
> ....
>
>
>
>
>
>
> On 4/20/13 2:42 PM, Arnold Schwaighofer wrote:
>> Author: arnolds
>> Date: Sat Apr 20 16:42:09 2013
>> New Revision: 179957
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=179957&view=rev
>> Log:
>> SimplifyCFG: If convert single conditional stores
>>
>> This transformation will transform a conditional store with a preceeding
>> uncondtional store to the same location:
>>
>> a[i] =
>> may-alias with a[i] load
>> if (cond)
>> a[i] = Y
>>
>> into an unconditional store.
>>
>> a[i] = X
>> may-alias with a[i] load
>> tmp = cond ? Y : X;
>> a[i] = tmp
>>
>> We assume that on average the cost of a mispredicted branch is going to be
>> higher than the cost of a second store to the same location, and that the
>> secondary benefits of creating a bigger basic block for other optimizations to
>> work on outway the potential case were the branch would be correctly predicted
>> and the cost of the executing the second store would be noticably reflected in
>> performance.
>>
>> hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
>> this change we are on par with gcc's performance (gcc also performs this
>> transformation). There was a 1.2 % performance improvement on a ARM swift chip.
>> Other tests in the test-suite+external seem to be mostly uninfluenced in my
>> experiments:
>> This optimization was triggered on 41 tests such that the executable was
>> different before/after the patch. Only 1 out of the 40 tests (dealII) was
>> reproducable below 100% (by about .4%). Given that hmmer benefits so much I
>> believe this to be a fair trade off.
>>
>> I am going to watch performance numbers across the builtbots and will revert
>> this if anything unexpected comes up.
>>
>> Added:
>> llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>> Modified:
>> llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>>
>> Modified: llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=179957&r1=179956&r2=179957&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp (original)
>> +++ llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp Sat Apr 20 16:42:09 2013
>> @@ -59,6 +59,10 @@ static cl::opt<bool>
>> SinkCommon("simplifycfg-sink-common", cl::Hidden, cl::init(true),
>> cl::desc("Sink common instructions down to the end block"));
>> +static cl::opt<bool>
>> +HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),
>> + cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>> +
>> STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
>> STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
>> @@ -1332,6 +1336,64 @@ static bool SinkThenElseCodeToEnd(Branch
>> return Changed;
>> }
>> +/// \brief Determine if we can hoist sink a sole store instruction out of a
>> +/// conditional block.
>> +///
>> +/// We are looking for code like the following:
>> +/// BrBB:
>> +/// store i32 %add, i32* %arrayidx2
>> +/// ... // No other stores or function calls (we could be calling a memory
>> +/// ... // function).
>> +/// %cmp = icmp ult %x, %y
>> +/// br i1 %cmp, label %EndBB, label %ThenBB
>> +/// ThenBB:
>> +/// store i32 %add5, i32* %arrayidx2
>> +/// br label EndBB
>> +/// EndBB:
>> +/// ...
>> +/// We are going to transform this into:
>> +/// BrBB:
>> +/// store i32 %add, i32* %arrayidx2
>> +/// ... //
>> +/// %cmp = icmp ult %x, %y
>> +/// %add.add5 = select i1 %cmp, i32 %add, %add5
>> +/// store i32 %add.add5, i32* %arrayidx2
>> +/// ...
>> +///
>> +/// \return The pointer to the value of the previous store if the store can be
>> +/// hoisted into the predecessor block. 0 otherwise.
>> +Value *isSafeToSpeculateStore(Instruction *I, BasicBlock *BrBB,
>> + BasicBlock *StoreBB, BasicBlock *EndBB) {
>> + StoreInst *StoreToHoist = dyn_cast<StoreInst>(I);
>> + if (!StoreToHoist)
>> + return 0;
>> +
>> + Value *StorePtr = StoreToHoist->getPointerOperand();
>> + StoreInst *PreviousStore = 0;
>> +
>> + // Look for a store to the same pointer in BrBB.
>> + unsigned MaxNumInstToLookAt = 10;
>> + for (BasicBlock::reverse_iterator RI = BrBB->rbegin(),
>> + RE = BrBB->rend(); RI != RE && (--MaxNumInstToLookAt); ++RI) {
>> + Instruction *CurI = &*RI;
>> +
>> + // Could be calling an instruction that effects memory like free().
>> + if (CurI->mayHaveSideEffects() && !isa<StoreInst>(CurI))
>> + return 0;
>> +
>> + // Found the previous store make sure it stores to the same location.
>> + if ((PreviousStore = dyn_cast<StoreInst>(CurI))) {
>> + if (PreviousStore->getPointerOperand() == StorePtr)
>> + // Found the previous store, return its value operand.
>> + return PreviousStore->getValueOperand();
>> + else
>> + return 0; // Unknown store.
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> /// \brief Speculate a conditional basic block flattening the CFG.
>> ///
>> /// Note that this is a very risky transform currently. Speculating
>> @@ -1395,6 +1457,8 @@ static bool SpeculativelyExecuteBB(Branc
>> SmallDenseMap<Instruction *, unsigned, 4> SinkCandidateUseCounts;
>> unsigned SpeculationCost = 0;
>> + Value *SpeculatedStoreValue = 0;
>> + StoreInst *SpeculatedStore = 0;
>> for (BasicBlock::iterator BBI = ThenBB->begin(),
>> BBE = llvm::prior(ThenBB->end());
>> BBI != BBE; ++BBI) {
>> @@ -1410,13 +1474,21 @@ static bool SpeculativelyExecuteBB(Branc
>> return false;
>> // Don't hoist the instruction if it's unsafe or expensive.
>> - if (!isSafeToSpeculativelyExecute(I))
>> + if (!isSafeToSpeculativelyExecute(I) &&
>> + !(HoistCondStores &&
>> + (SpeculatedStoreValue = isSafeToSpeculateStore(I, BB, ThenBB,
>> + EndBB))))
>> return false;
>> - if (ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>> + if (!SpeculatedStoreValue &&
>> + ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>> return false;
>> + // Store the store speculation candidate.
>> + if (SpeculatedStoreValue)
>> + SpeculatedStore = cast<StoreInst>(I);
>> +
>> // Do not hoist the instruction if any of its operands are defined but not
>> - // used in this BB. The transformation will prevent the operand from
>> + // used in BB. The transformation will prevent the operand from
>> // being sunk into the use block.
>> for (User::op_iterator i = I->op_begin(), e = I->op_end();
>> i != e; ++i) {
>> @@ -1473,12 +1545,24 @@ static bool SpeculativelyExecuteBB(Branc
>> // If there are no PHIs to process, bail early. This helps ensure idempotence
>> // as well.
>> - if (!HaveRewritablePHIs)
>> + if (!HaveRewritablePHIs && !(HoistCondStores && SpeculatedStoreValue))
>> return false;
>> // If we get here, we can hoist the instruction and if-convert.
>> DEBUG(dbgs() << "SPECULATIVELY EXECUTING BB" << *ThenBB << "\n";);
>> + // Insert a select of the value of the speculated store.
>> + if (SpeculatedStoreValue) {
>> + IRBuilder<true, NoFolder> Builder(BI);
>> + Value *TrueV = SpeculatedStore->getValueOperand();
>> + Value *FalseV = SpeculatedStoreValue;
>> + if (Invert)
>> + std::swap(TrueV, FalseV);
>> + Value *S = Builder.CreateSelect(BrCond, TrueV, FalseV, TrueV->getName() +
>> + "." + FalseV->getName());
>> + SpeculatedStore->setOperand(0, S);
>> + }
>> +
>> // Hoist the instructions.
>> BB->getInstList().splice(BI, ThenBB->getInstList(), ThenBB->begin(),
>> llvm::prior(ThenBB->end()));
>>
>> Added: llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll?rev=179957&view=auto
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll (added)
>> +++ llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll Sat Apr 20 16:42:09 2013
>> @@ -0,0 +1,83 @@
>> +; RUN: opt -simplifycfg -S < %s | FileCheck %s
>> +
>> +define void @ifconvertstore(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>> +entry:
>> + %arrayidx = getelementptr inbounds i32* %B, i64 0
>> + %0 = load i32* %arrayidx, align 4
>> + %add = add nsw i32 %0, %C
>> + %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>> +
>> +; First store to the location.
>> + store i32 %add, i32* %arrayidx2, align 4
>> + %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>> + %1 = load i32* %arrayidx4, align 4
>> + %add5 = add nsw i32 %1, %D
>> + %cmp6 = icmp sgt i32 %add5, %C
>> + br i1 %cmp6, label %if.then, label %ret.end
>> +
>> +; Make sure we speculate stores like the following one. It is cheap compared to
>> +; a mispredicated branch.
>> +; CHECK: @ifconvertstore
>> +; CHECK: %add5.add = select i1 %cmp6, i32 %add5, i32 %add
>> +; CHECK: store i32 %add5.add, i32* %arrayidx2, align 4
>> +if.then:
>> + store i32 %add5, i32* %arrayidx2, align 4
>> + br label %ret.end
>> +
>> +ret.end:
>> + ret void
>> +}
>> +
>> +define void @noifconvertstore1(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>> +entry:
>> + %arrayidx = getelementptr inbounds i32* %B, i64 0
>> + %0 = load i32* %arrayidx, align 4
>> + %add = add nsw i32 %0, %C
>> + %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>> +
>> +; Store to a different location.
>> + store i32 %add, i32* %arrayidx, align 4
>> + %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>> + %1 = load i32* %arrayidx4, align 4
>> + %add5 = add nsw i32 %1, %D
>> + %cmp6 = icmp sgt i32 %add5, %C
>> + br i1 %cmp6, label %if.then, label %ret.end
>> +
>> +; CHECK: @noifconvertstore1
>> +; CHECK-NOT: select
>> +if.then:
>> + store i32 %add5, i32* %arrayidx2, align 4
>> + br label %ret.end
>> +
>> +ret.end:
>> + ret void
>> +}
>> +
>> +declare void @unknown_fun()
>> +
>> +define void @noifconvertstore2(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>> +entry:
>> + %arrayidx = getelementptr inbounds i32* %B, i64 0
>> + %0 = load i32* %arrayidx, align 4
>> + %add = add nsw i32 %0, %C
>> + %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>> +
>> +; First store to the location.
>> + store i32 %add, i32* %arrayidx2, align 4
>> + call void @unknown_fun()
>> + %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>> + %1 = load i32* %arrayidx4, align 4
>> + %add5 = add nsw i32 %1, %D
>> + %cmp6 = icmp sgt i32 %add5, %C
>> + br i1 %cmp6, label %if.then, label %ret.end
>> +
>> +; CHECK: @noifconvertstore2
>> +; CHECK-NOT: select
>> +if.then:
>> + store i32 %add5, i32* %arrayidx2, align 4
>> + br label %ret.end
>> +
>> +ret.end:
>> + ret void
>> +}
>> +
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list