[llvm] r179957 - SimplifyCFG: If convert single conditional stores
Arnold Schwaighofer
aschwaighofer at apple.com
Sat Apr 20 18:04:38 PDT 2013
On Apr 20, 2013, at 7:54 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> It tail-duplicates the load, which doesn't necessarily simplify the CFG.
> I'm looking at this from a CFG-simplification perspective.
>
It does not perform select/if conversion of the store - hence my patch.
But the loads are already eliminated, so I can't use them to motivate my patch - that is all I am saying.
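To make the two halves of the argument concrete, here is a standalone C sketch of the pattern (names and shapes are illustrative, not the actual hmmer code): "before" is what remains after GVN has replaced the reload with the value already in a register, with the second store still conditional; "after" is what the patch produces, a select feeding one unconditional store.

```c
#include <assert.h>

/* Illustrative reduction of the pattern under discussion; names are made up.
   before(): after GVN - no reload of mc[k], but the second store is still
   guarded by a branch. after(): the conditional store is if-converted. */
static int before(int *mc, int k, int a, int b) {
  mc[k] = a;               /* unconditional first store */
  if (b > a)
    mc[k] = b;             /* conditional second store */
  return (b > a) ? b : a;  /* GVN's phi: no reload of mc[k] */
}

static int after(int *mc, int k, int a, int b) {
  mc[k] = a;               /* first store unchanged */
  int t = (b > a) ? b : a; /* select of the two stored values */
  mc[k] = t;               /* second store made unconditional */
  return t;
}
```

Both versions leave the same value in memory and return the same result; the "after" form simply trades a branch for a possibly redundant store.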
>
> On 4/20/13 5:48 PM, Arnold Schwaighofer wrote:
>> Hi Shuxin,
>>
>> We already get this transformation (eliminate the extra loads) even without my patch, GVN does this for us:
>> C-code excerpt:
>>
>> mc[k] = mpp[k-1] + tpmm[k-1];
>> if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc;
>> if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
>>
>> Without my patch we already produce this:
>>
>> %79 = add nsw i32 %78, %76 << else value
>> ...
>> %85 = add nsw i32 %84, %82 << if value
>> %86 = icmp sgt i32 %85, %79
>> br i1 %86, label %87, label %88
>>
>> ; <label>:87 ; preds = %.lr.ph
>> store i32 %85, i32* %80, align 4, !tbaa !0
>> br label %88
>>
>> ; <label>:88 ; preds = %87, %.lr.ph
>> %89 = phi i32 [ %85, %87 ], [ %79, %.lr.ph ] << that is your eliminated load
>> %90 = getelementptr inbounds i32* %59, i64 %74
>> %91 = load i32* %90, align 4, !tbaa !0
>> %92 = getelementptr inbounds i32* %32, i64 %74
>> %93 = load i32* %92, align 4, !tbaa !0
>> %94 = add nsw i32 %93, %91
>> %95 = icmp sgt i32 %94, %89
>> br i1 %95, label %96, label %97
>>
>> Best,
>> Arnold
>>
>> On Apr 20, 2013, at 7:24 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>
>>> I'm not going to discuss how to optimize/hack the famous loops in hmmer; I'll just focus on
>>> this transformation.
>>>
>>> IMHO, this xform is better motivated by the subsequent loads, which is the case in hmmer.
>>>
>>
>>> ---------------
>>> if (cond)
>>> store val to x; // x is just a memory location; it does not necessarily have a name.
>>>
>>> ... load from x
>>> ---------------
>>>
>>> =>
>>> if (cond)
>>> t = val
>>> store x = t
>>>
>>> ... t
>>>
>>> ------------------
>>>
>>> By stripping the condition off the store statement, we can get rid of the load expression, which
>>> seems to be a big saving regardless of which combination of the following conditions holds:
>>> a)
>>> a1) the branch is biased (likely taken)
>>> a2) the branch is biased (likely fall-through)
>>> a3) the branch is balanced
>>>
>>> b) the load-store unit of the underlying architecture has store-to-load forwarding capability.
>>>
>>> If b) does not hold, it is always a win, as we eliminate an expensive load.
>>> For combination a1 + b, the load request does not go to the cache; however, we still suffer
>>> from latency. (FIXME: the load-store unit will blindly go through the pipeline.)
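The load-elimination argument above can be rendered as a minimal standalone C sketch (variable names and the exact shape are illustrative):

```c
#include <assert.h>

/* C rendering of the pseudocode above (names are illustrative).
   before_xform(): conditional store followed by a load from the same
   location. after_xform(): store made unconditional, load replaced by
   the register t. */
static int before_xform(int *x, int cond, int val) {
  if (cond)
    *x = val;
  return *x;                /* load from x */
}

static int after_xform(int *x, int cond, int val) {
  int t = cond ? val : *x;  /* old value read once into a register */
  *x = t;                   /* store made unconditional */
  return t;                 /* the later load is gone */
}
```

In a single-threaded setting the two are equivalent for either value of the condition; the win is that the reload disappears from the hot path.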
>>> ....
>>>
>>> On 4/20/13 2:42 PM, Arnold Schwaighofer wrote:
>>>> Author: arnolds
>>>> Date: Sat Apr 20 16:42:09 2013
>>>> New Revision: 179957
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=179957&view=rev
>>>> Log:
>>>> SimplifyCFG: If convert single conditional stores
>>>>
>>>> This transformation turns a conditional store that is preceded by an
>>>> unconditional store to the same location:
>>>>
>>>> a[i] = X
>>>> may-alias with a[i] load
>>>> if (cond)
>>>> a[i] = Y
>>>>
>>>> into an unconditional store.
>>>>
>>>> a[i] = X
>>>> may-alias with a[i] load
>>>> tmp = cond ? Y : X;
>>>> a[i] = tmp
>>>>
>>>> We assume that on average the cost of a mispredicted branch is going to be
>>>> higher than the cost of a second store to the same location, and that the
>>>> secondary benefits of creating a bigger basic block for other optimizations to
>>>> work on outweigh the potential case where the branch would be correctly predicted
>>>> and the cost of executing the second store would be noticeably reflected in
>>>> performance.
>>>>
>>>> hmmer's execution time improves by 30% on an iMac12,2 on ref data sets. With
>>>> this change we are on par with gcc's performance (gcc also performs this
>>>> transformation). There was a 1.2% performance improvement on an ARM Swift chip.
>>>> Other tests in the test-suite+external seem to be mostly unaffected in my
>>>> experiments:
>>>> This optimization triggered on 41 tests such that the executable was
>>>> different before/after the patch. Only 1 of the 41 tests (dealII) was
>>>> reproducibly below 100% (by about 0.4%). Given that hmmer benefits so much I
>>>> believe this to be a fair trade-off.
>>>>
>>>> I am going to watch performance numbers across the buildbots and will revert
>>>> this if anything unexpected comes up.
>>>>
>>>> Added:
>>>> llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>>>> Modified:
>>>> llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>>>>
>>>> Modified: llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=179957&r1=179956&r2=179957&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp (original)
>>>> +++ llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp Sat Apr 20 16:42:09 2013
>>>> @@ -59,6 +59,10 @@ static cl::opt<bool>
>>>> SinkCommon("simplifycfg-sink-common", cl::Hidden, cl::init(true),
>>>> cl::desc("Sink common instructions down to the end block"));
>>>> +static cl::opt<bool>
>>>> +HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),
>>>> + cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>>>> +
>>>> STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
>>>> STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>>>> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
>>>> @@ -1332,6 +1336,64 @@ static bool SinkThenElseCodeToEnd(Branch
>>>> return Changed;
>>>> }
>>>> +/// \brief Determine if we can hoist a sole store instruction out of a
>>>> +/// conditional block.
>>>> +///
>>>> +/// We are looking for code like the following:
>>>> +/// BrBB:
>>>> +/// store i32 %add, i32* %arrayidx2
>>>> +/// ... // No other stores or function calls (we could be calling a memory
>>>> +/// ... // function).
>>>> +/// %cmp = icmp ult %x, %y
>>>> +/// br i1 %cmp, label %EndBB, label %ThenBB
>>>> +/// ThenBB:
>>>> +/// store i32 %add5, i32* %arrayidx2
>>>> +/// br label EndBB
>>>> +/// EndBB:
>>>> +/// ...
>>>> +/// We are going to transform this into:
>>>> +/// BrBB:
>>>> +/// store i32 %add, i32* %arrayidx2
>>>> +/// ... //
>>>> +/// %cmp = icmp ult %x, %y
>>>> +/// %add.add5 = select i1 %cmp, i32 %add, i32 %add5
>>>> +/// store i32 %add.add5, i32* %arrayidx2
>>>> +/// ...
>>>> +///
>>>> +/// \return The pointer to the value of the previous store if the store can be
>>>> +/// hoisted into the predecessor block. 0 otherwise.
>>>> +Value *isSafeToSpeculateStore(Instruction *I, BasicBlock *BrBB,
>>>> + BasicBlock *StoreBB, BasicBlock *EndBB) {
>>>> + StoreInst *StoreToHoist = dyn_cast<StoreInst>(I);
>>>> + if (!StoreToHoist)
>>>> + return 0;
>>>> +
>>>> + Value *StorePtr = StoreToHoist->getPointerOperand();
>>>> + StoreInst *PreviousStore = 0;
>>>> +
>>>> + // Look for a store to the same pointer in BrBB.
>>>> + unsigned MaxNumInstToLookAt = 10;
>>>> + for (BasicBlock::reverse_iterator RI = BrBB->rbegin(),
>>>> + RE = BrBB->rend(); RI != RE && (--MaxNumInstToLookAt); ++RI) {
>>>> + Instruction *CurI = &*RI;
>>>> +
>>>> + // Could be calling a function that affects memory, like free().
>>>> + if (CurI->mayHaveSideEffects() && !isa<StoreInst>(CurI))
>>>> + return 0;
>>>> +
>>>> + // Found the previous store; make sure it stores to the same location.
>>>> + if ((PreviousStore = dyn_cast<StoreInst>(CurI))) {
>>>> + if (PreviousStore->getPointerOperand() == StorePtr)
>>>> + // Found the previous store, return its value operand.
>>>> + return PreviousStore->getValueOperand();
>>>> + else
>>>> + return 0; // Unknown store.
>>>> + }
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
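The backward scan in isSafeToSpeculateStore above can be modelled as a toy outside LLVM (this is purely illustrative - the types and names below are invented, not the LLVM API): walk at most 10 "instructions" backwards from the branch; bail on anything side-effecting that is not a store; the first store found must target the same pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the backward scan (illustrative only, not LLVM's IR types). */
typedef enum { INST_STORE, INST_CALL, INST_OTHER } InstKind;

typedef struct {
  InstKind kind;
  const void *ptr;  /* store destination, when kind == INST_STORE */
  int value;        /* stored value, when kind == INST_STORE */
} Inst;

/* Returns 1 and writes the previous store's value if hoisting looks safe,
   0 otherwise (unknown store, side-effecting call, or no store found). */
static int safe_to_speculate(const Inst *block, size_t n,
                             const void *store_ptr, int *prev_value) {
  size_t looked = 0;
  for (size_t i = n; i-- > 0 && looked < 10; ++looked) {
    const Inst *cur = &block[i];
    if (cur->kind == INST_CALL)     /* could free() or write memory */
      return 0;
    if (cur->kind == INST_STORE) {
      if (cur->ptr == store_ptr) {  /* same location: hoistable */
        *prev_value = cur->value;
        return 1;
      }
      return 0;                     /* store to an unknown location */
    }
  }
  return 0;                         /* no previous store found */
}
```

Note that in the committed code the scan gives up once an intervening store to a *different* pointer is seen, even if a matching store exists further up - the same conservative choice the sketch makes.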
>>>> +
>>>> /// \brief Speculate a conditional basic block flattening the CFG.
>>>> ///
>>>> /// Note that this is a very risky transform currently. Speculating
>>>> @@ -1395,6 +1457,8 @@ static bool SpeculativelyExecuteBB(Branc
>>>> SmallDenseMap<Instruction *, unsigned, 4> SinkCandidateUseCounts;
>>>> unsigned SpeculationCost = 0;
>>>> + Value *SpeculatedStoreValue = 0;
>>>> + StoreInst *SpeculatedStore = 0;
>>>> for (BasicBlock::iterator BBI = ThenBB->begin(),
>>>> BBE = llvm::prior(ThenBB->end());
>>>> BBI != BBE; ++BBI) {
>>>> @@ -1410,13 +1474,21 @@ static bool SpeculativelyExecuteBB(Branc
>>>> return false;
>>>> // Don't hoist the instruction if it's unsafe or expensive.
>>>> - if (!isSafeToSpeculativelyExecute(I))
>>>> + if (!isSafeToSpeculativelyExecute(I) &&
>>>> + !(HoistCondStores &&
>>>> + (SpeculatedStoreValue = isSafeToSpeculateStore(I, BB, ThenBB,
>>>> + EndBB))))
>>>> return false;
>>>> - if (ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>>>> + if (!SpeculatedStoreValue &&
>>>> + ComputeSpeculationCost(I) > PHINodeFoldingThreshold)
>>>> return false;
>>>> + // Store the store speculation candidate.
>>>> + if (SpeculatedStoreValue)
>>>> + SpeculatedStore = cast<StoreInst>(I);
>>>> +
>>>> // Do not hoist the instruction if any of its operands are defined but not
>>>> - // used in this BB. The transformation will prevent the operand from
>>>> + // used in BB. The transformation will prevent the operand from
>>>> // being sunk into the use block.
>>>> for (User::op_iterator i = I->op_begin(), e = I->op_end();
>>>> i != e; ++i) {
>>>> @@ -1473,12 +1545,24 @@ static bool SpeculativelyExecuteBB(Branc
>>>> // If there are no PHIs to process, bail early. This helps ensure idempotence
>>>> // as well.
>>>> - if (!HaveRewritablePHIs)
>>>> + if (!HaveRewritablePHIs && !(HoistCondStores && SpeculatedStoreValue))
>>>> return false;
>>>> // If we get here, we can hoist the instruction and if-convert.
>>>> DEBUG(dbgs() << "SPECULATIVELY EXECUTING BB" << *ThenBB << "\n";);
>>>> + // Insert a select of the value of the speculated store.
>>>> + if (SpeculatedStoreValue) {
>>>> + IRBuilder<true, NoFolder> Builder(BI);
>>>> + Value *TrueV = SpeculatedStore->getValueOperand();
>>>> + Value *FalseV = SpeculatedStoreValue;
>>>> + if (Invert)
>>>> + std::swap(TrueV, FalseV);
>>>> + Value *S = Builder.CreateSelect(BrCond, TrueV, FalseV, TrueV->getName() +
>>>> + "." + FalseV->getName());
>>>> + SpeculatedStore->setOperand(0, S);
>>>> + }
>>>> +
>>>> // Hoist the instructions.
>>>> BB->getInstList().splice(BI, ThenBB->getInstList(), ThenBB->begin(),
>>>> llvm::prior(ThenBB->end()));
>>>>
>>>> Added: llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll?rev=179957&view=auto
>>>> ==============================================================================
>>>> --- llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll (added)
>>>> +++ llvm/trunk/test/Transforms/SimplifyCFG/speculate-store.ll Sat Apr 20 16:42:09 2013
>>>> @@ -0,0 +1,83 @@
>>>> +; RUN: opt -simplifycfg -S < %s | FileCheck %s
>>>> +
>>>> +define void @ifconvertstore(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>>> +entry:
>>>> + %arrayidx = getelementptr inbounds i32* %B, i64 0
>>>> + %0 = load i32* %arrayidx, align 4
>>>> + %add = add nsw i32 %0, %C
>>>> + %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>>> +
>>>> +; First store to the location.
>>>> + store i32 %add, i32* %arrayidx2, align 4
>>>> + %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>>> + %1 = load i32* %arrayidx4, align 4
>>>> + %add5 = add nsw i32 %1, %D
>>>> + %cmp6 = icmp sgt i32 %add5, %C
>>>> + br i1 %cmp6, label %if.then, label %ret.end
>>>> +
>>>> +; Make sure we speculate stores like the following one. It is cheap compared to
>>>> +; a mispredicted branch.
>>>> +; CHECK: @ifconvertstore
>>>> +; CHECK: %add5.add = select i1 %cmp6, i32 %add5, i32 %add
>>>> +; CHECK: store i32 %add5.add, i32* %arrayidx2, align 4
>>>> +if.then:
>>>> + store i32 %add5, i32* %arrayidx2, align 4
>>>> + br label %ret.end
>>>> +
>>>> +ret.end:
>>>> + ret void
>>>> +}
>>>> +
>>>> +define void @noifconvertstore1(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>>> +entry:
>>>> + %arrayidx = getelementptr inbounds i32* %B, i64 0
>>>> + %0 = load i32* %arrayidx, align 4
>>>> + %add = add nsw i32 %0, %C
>>>> + %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>>> +
>>>> +; Store to a different location.
>>>> + store i32 %add, i32* %arrayidx, align 4
>>>> + %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>>> + %1 = load i32* %arrayidx4, align 4
>>>> + %add5 = add nsw i32 %1, %D
>>>> + %cmp6 = icmp sgt i32 %add5, %C
>>>> + br i1 %cmp6, label %if.then, label %ret.end
>>>> +
>>>> +; CHECK: @noifconvertstore1
>>>> +; CHECK-NOT: select
>>>> +if.then:
>>>> + store i32 %add5, i32* %arrayidx2, align 4
>>>> + br label %ret.end
>>>> +
>>>> +ret.end:
>>>> + ret void
>>>> +}
>>>> +
>>>> +declare void @unknown_fun()
>>>> +
>>>> +define void @noifconvertstore2(i32 %m, i32* %A, i32* %B, i32 %C, i32 %D) {
>>>> +entry:
>>>> + %arrayidx = getelementptr inbounds i32* %B, i64 0
>>>> + %0 = load i32* %arrayidx, align 4
>>>> + %add = add nsw i32 %0, %C
>>>> + %arrayidx2 = getelementptr inbounds i32* %A, i64 0
>>>> +
>>>> +; First store to the location.
>>>> + store i32 %add, i32* %arrayidx2, align 4
>>>> + call void @unknown_fun()
>>>> + %arrayidx4 = getelementptr inbounds i32* %B, i64 1
>>>> + %1 = load i32* %arrayidx4, align 4
>>>> + %add5 = add nsw i32 %1, %D
>>>> + %cmp6 = icmp sgt i32 %add5, %C
>>>> + br i1 %cmp6, label %if.then, label %ret.end
>>>> +
>>>> +; CHECK: @noifconvertstore2
>>>> +; CHECK-NOT: select
>>>> +if.then:
>>>> + store i32 %add5, i32* %arrayidx2, align 4
>>>> + br label %ret.end
>>>> +
>>>> +ret.end:
>>>> + ret void
>>>> +}
>>>> +
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits