[llvm] r179957 - SimplifyCFG: If convert single conditional stores
Shuxin Yang
shuxin.llvm at gmail.com
Sat Apr 20 17:54:32 PDT 2013
It tail dup the load, which dosen't necessarily simplify the CFG.
I'm from CFG simplification perspective.
On 4/20/13 5:48 PM, Arnold Schwaighofer wrote:
> Hi Shuxin,
>
> We already get this transformation (eliminate the extra loads) even without my patch, GVN does this for us:
>
> C-code excerpt:
>
> mc[k] = mpp[k-1] + tpmm[k-1];
> if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc;
> if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
>
> Without my patch we already produce this:
>
> %79 = add nsw i32 %78, %76 << else value
> ...
> %85 = add nsw i32 %84, %82 << if value
> %86 = icmp sgt i32 %85, %79
> br i1 %86, label %87, label %88
>
> ; <label>:87 ; preds = %.lr.ph
> store i32 %85, i32* %80, align 4, !tbaa !0
> br label %88
>
> ; <label>:88 ; preds = %87, %.lr.ph
> %89 = phi i32 [ %85, %87 ], [ %79, %.lr.ph ] << that is your eliminated load
> %90 = getelementptr inbounds i32* %59, i64 %74
> %91 = load i32* %90, align 4, !tbaa !0
> %92 = getelementptr inbounds i32* %32, i64 %74
> %93 = load i32* %92, align 4, !tbaa !0
> %94 = add nsw i32 %93, %91
> %95 = icmp sgt i32 %94, %89
> br i1 %95, label %96, label %97
>
> Best,
> Arnold
>
> On Apr 20, 2013, at 7:24 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
>> I'm not going to discuss how to optimize/hack the famous loops in hmmer, I just focus on
>> this transformation.
>>
>> IMHO, this xform is better motivated by the subsequent loads, which is the case in hmmer.
>>
>
>> ---------------
>> if (cond)
>> store val to x ; // x is just a memory unit, dose not necessary has name.
>>
>> ... load from x
>> ---------------
>>
>> =>
>> if (cond)
>> t = val
>> store x = t
>>
>> ... t
>>
>> ------------------
>>
>> by stripping off the condition of the store stmt, we can git rid of the load expr, which
>> seems to be big save regardless the which combination of following conditions take place
>> a)
>> a1 the branch is biased (likely taken)
>> a2) the branch is biased (likely fall-thru)
>> a3) the branch is balanced
>>
>> b) the load-store unit of the underlying architecture has load-store forwarding capability.
>>
>> if b) dose not hold, it always win as we reduce a expensive load
>> for combination a1 + b, the load request dose not go to cache, however, we still suffer
>> from latency. (FIXME: load-store unit will blindly go through the pipeline).
>> ....
>>
>>
>>
>>
>>
>>
>> On 4/20/13 2:42 PM, Arnold Schwaighofer wrote:
>>> Author: arnolds
>>> Date: Sat Apr 20 16:42:09 2013
>>> New Revision: 179957
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=179957&view=rev
>>> Log:
>>> SimplifyCFG: If convert single conditional stores
>>>
>>> This transformation will transform a conditional store with a preceeding
>>> uncondtional store to the same location:
>>>
>>> a[i] =
>>> may-alias with a[i] load
>>> if (cond)
>>> a[i] = Y
>>>
>>> into an unconditional store.
>>>
>>> a[i] = X
>>> may-alias with a[i] load
>>> tmp = cond ? Y : X;
>>> a[i] = tmp
>>>
>>> We assume that on average the cost of a mispredicted branch is going to be
>>> higher than the cost of a second store to the same location, and that the
>>> secondary benefits of creating a bigger basic block for other optimizations to
>>> work on outway the potential case were the branch would be correctly predicted
>>> and the cost of the executing the second store would be noticably reflected in
>>> performance.
>>>
>>> hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
>>> this change we are on par with gcc's performance (gcc also performs this
>>> transformation). There was a 1.2 % performance improvement on a ARM swift chip.
>>> Other tests in the test-suite+external seem to be mostly uninfluenced in my
>>> experiments:
>>> This optimization was triggered on 41 tests such that the executable was
>>> different before/after the patch. Only 1 out of the 40 tests (dealII) was
>>> reproducable below 100% (by about .4%). Given that hmmer benefits so much I
>>> believe this to be a fair trade off.
>>>
>>> I am going to watch performance numbers across the builtbots and will revert
>>> this if anything unexpected comes up.
>>>
>>> Added:
>>> llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>>> Modified:
>>> llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>>>
>>> Modified: llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=179957&r1=179956&r2=179957&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp Sat Apr 20 16:42:09 2013
>>> @@ -59,6 +59,10 @@ static cl::opt<bool>
>>> SinkCommon("simplifycfg-sink-common", cl::Hidden, cl::init(true),
>>> cl::desc("Sink common instructions down to the end block"));
>>> +static cl::opt<bool>
>>> +HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),
>>> + cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>>> +
>>> STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
>>> STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>>> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
>>> @@ -1332,6 +1336,64 @@ static bool SinkThenElseCodeToEnd(Branch
>>> return Changed;
>>> }
>>> +/// \brief Determine if we can hoist sink a sole store instruction out of a
>>> +/// conditional block.
>>> +///
>>> +/// We are looking for code like the following:
>>> +/// BrBB:
>>> +/// store i32 %add, i32* %arrayidx2
>>> +/// ... // No other stores or function calls (we could be calling a memory
>>> +/// ... // function).
>>> +/// %cmp = icmp ult %x, %y
>>> +/// br i1 %cmp, label %EndBB, label %ThenBB
>>> +/// ThenBB:
>>> +/// store i32 %add5, i32* %arrayidx2
>>> +/// br label EndBB
>>> +/// EndBB:
>>> +/// ...
>>> +/// We are going to transform this into:
>>> +/// BrBB:
>>> +/// store i32 %add, i32* %arrayidx2
>>> +/// ... //
>>> +/// %cmp = icmp ult %x, %y
>>> +/// %add.add5 = select i1 %cmp, i32 %add, %add5
>>> +/// store i32 %add.add5, i32* %arrayidx2
>>> +/// ...
>>> +///
>>> +/// \return The pointer to the value of the previous store if the store can be
>>> +/// hoisted into the predecessor block. 0 otherwise.
>>> +Value *isSafeToSpeculateStore(Instruction *I, BasicBlock *BrBB,
>>> + BasicBlock *StoreBB, BasicBlock *EndBB) {
>>> + StoreInst *StoreToHoist = dyn_cast<StoreInst>(I);
>>> + if (!StoreToHoist)
>>> + return 0;
>>> +
>>> + Value *StorePtr = StoreToHoist->getPointerOperand();
>>> + StoreInst *PreviousStore = 0;
>>> +
>>> + // Look for a store to the same pointer in BrBB.
>>> + unsigned MaxNumInstToLookAt = 10;
>>> + for (BasicBlock::reverse_iterator RI = BrBB->rbegin(),
>>> + RE = BrBB->rend(); RI != RE && (--MaxNumInstToLookAt); ++RI) {
>>> + Instruction *CurI = &*RI;
>>> +
>>> + // Could be calling an instruction that effects memory like free().
>>> + if (CurI->mayHaveSideEffects() && !isa<StoreInst>(CurI))
>>> + return 0;
>>> +
>>> + // Found the previous store make sure it stores to the same location.
>>> + if ((PreviousStore = dyn_cast<StoreInst>(CurI))) {
>>> + if (PreviousStore->getPointerOperand() == StorePtr)
>>> + // Found the previous store, return its value operand.
>>> + return PreviousStore->getValueOperand();
>>> + else
>>> + return 0; // Unknown store.
>>> + }
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> /// \brief Speculate a conditional basic block flattening the CFG.
>>> ///
>>> /// Note that this is a very risky transform currently. Speculating
>>> @@ -1395,6 +1457,8 @@ static bool SpeculativelyExecuteBB(Branc
>>> SmallDenseMap<Instruction *, unsigned, 4> SinkCandidateUseCounts;
>>> unsigned SpeculationCost = 0;
>>> + Value *SpeculatedStoreValue = 0;
>>> + StoreInst *SpeculatedStore = 0;
>>> for (BasicBlock::iterator BBI = ThenBB->begin(),
>>> BBE = llvm::prior(ThenBB->end());
>>> BBI != BBE; ++BBI) {
>>> @@ -1410,13 +1474,21 @@ static bool SpeculativelyExecuteBB(Branc
>>> return false;
>>> // Don't hoist the instruction if it's unsafe or expensive.
>>> - if (!isSafeToSpeculativelyExecute(I))
>>> + if (!isSafeToSpeculativelyExecute(I) &&
>>> + !(HoistCondStores &&
>>> + (SpeculatedStoreValue = isSafeToSpeculateStore(I, BB, ThenBB,
>>> + EndBB))))
>>> return false;
>>> - if (ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>>> + if (!SpeculatedStoreValue &&
>>> + ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>>> return false;
>>> + // Store the store speculation candidate.
>>> + if (SpeculatedStoreValue)
>>> + SpeculatedStore = cast<StoreInst>(I);
>>> +
>>> // Do not hoist the instruction if any of its operands are defined but not
>>> - // used in this BB. The transformation will prevent the operand from
>>> + // used in BB. The transformation will prevent the operand from
>>> // being sunk into the use block.
>>> for (User::op_iterator i = I->op_begin(), e = I->op_end();
>>> i != e; ++i) {
>>> @@ -1473,12 +1545,24 @@ static bool SpeculativelyExecuteBB(Branc
>>> // If there are no PHIs to process, bail early. This helps ensure idempotence
>>> // as well.
>>> - if (!HaveRewritablePHIs)
>>> + if (!HaveRewritablePHIs && !(HoistCondStores && SpeculatedStoreValue))
>>> return false;
>>> // If we get here, we can hoist the instruction and if-convert.
>>> DEBUG(dbgs() << "SPECULATIVELY EXECUTING BB" << *ThenBB << "\n";);
>>> + // Insert a select of the value of the speculated store.
>>> + if (SpeculatedStoreValue) {
>>> + IRBuilder<true, NoFolder> Builder(BI);
>>> + Value *TrueV = SpeculatedStore->getValueOperand();
>>> + Value *FalseV = SpeculatedStoreValue;
>>> + if (Invert)
>>> + std::swap(TrueV, FalseV);
>>> + Value *S = Builder.CreateSelect(BrCond, TrueV, FalseV, TrueV->getName() +
>>> + "." + FalseV->getName());
>>> + SpeculatedStore->setOperand(0, S);
>>> + }
>>> +
>>> // Hoist the instructions.
>>> BB->getInstList().splice(BI, ThenBB->getInstList(), ThenBB->begin(),
>>> llvm::prior(ThenBB->end()));
>>>
>>> Added: llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll?rev=179957&view=auto
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll (added)
>>> +++ llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll Sat Apr 20 16:42:09 2013
>>> @@ -0,0 +1,83 @@
>>> +; RUN: opt -simplifycfg -S < %s | FileCheck %s
>>> +
>>> +define void @ifconvertstore(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>> +entry:
>>> + %arrayidx = getelementptr inbounds i32* %B, i64 0
>>> + %0 = load i32* %arrayidx, align 4
>>> + %add = add nsw i32 %0, %C
>>> + %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>> +
>>> +; First store to the location.
>>> + store i32 %add, i32* %arrayidx2, align 4
>>> + %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>> + %1 = load i32* %arrayidx4, align 4
>>> + %add5 = add nsw i32 %1, %D
>>> + %cmp6 = icmp sgt i32 %add5, %C
>>> + br i1 %cmp6, label %if.then, label %ret.end
>>> +
>>> +; Make sure we speculate stores like the following one. It is cheap compared to
>>> +; a mispredicated branch.
>>> +; CHECK: @ifconvertstore
>>> +; CHECK: %add5.add = select i1 %cmp6, i32 %add5, i32 %add
>>> +; CHECK: store i32 %add5.add, i32* %arrayidx2, align 4
>>> +if.then:
>>> + store i32 %add5, i32* %arrayidx2, align 4
>>> + br label %ret.end
>>> +
>>> +ret.end:
>>> + ret void
>>> +}
>>> +
>>> +define void @noifconvertstore1(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>> +entry:
>>> + %arrayidx = getelementptr inbounds i32* %B, i64 0
>>> + %0 = load i32* %arrayidx, align 4
>>> + %add = add nsw i32 %0, %C
>>> + %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>> +
>>> +; Store to a different location.
>>> + store i32 %add, i32* %arrayidx, align 4
>>> + %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>> + %1 = load i32* %arrayidx4, align 4
>>> + %add5 = add nsw i32 %1, %D
>>> + %cmp6 = icmp sgt i32 %add5, %C
>>> + br i1 %cmp6, label %if.then, label %ret.end
>>> +
>>> +; CHECK: @noifconvertstore1
>>> +; CHECK-NOT: select
>>> +if.then:
>>> + store i32 %add5, i32* %arrayidx2, align 4
>>> + br label %ret.end
>>> +
>>> +ret.end:
>>> + ret void
>>> +}
>>> +
>>> +declare void @unknown_fun()
>>> +
>>> +define void @noifconvertstore2(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>> +entry:
>>> + %arrayidx = getelementptr inbounds i32* %B, i64 0
>>> + %0 = load i32* %arrayidx, align 4
>>> + %add = add nsw i32 %0, %C
>>> + %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>> +
>>> +; First store to the location.
>>> + store i32 %add, i32* %arrayidx2, align 4
>>> + call void @unknown_fun()
>>> + %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>> + %1 = load i32* %arrayidx4, align 4
>>> + %add5 = add nsw i32 %1, %D
>>> + %cmp6 = icmp sgt i32 %add5, %C
>>> + br i1 %cmp6, label %if.then, label %ret.end
>>> +
>>> +; CHECK: @noifconvertstore2
>>> +; CHECK-NOT: select
>>> +if.then:
>>> + store i32 %add5, i32* %arrayidx2, align 4
>>> + br label %ret.end
>>> +
>>> +ret.end:
>>> + ret void
>>> +}
>>> +
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list