[llvm-commits] PR12385 PATCH: Missed constant propagation in loops

Stepan Dyatkovskiy stpworld at narod.ru
Mon May 7 12:18:49 PDT 2012


Hi Nick.

I had analysed compile and exec time impact within the test suit for my 
patch. Results are really not very got. I found compile time degradation 
about 30% and more. So I need to leave this idea for the better times. 
Thank you for information about GVN.

-Stepan.

Nick Lewycky wrote:
> I'm sorry I haven't reviewed earlier. I really don't like this patch but
> didn't write anything because I don't have a constructive comment for you.
>
> As I understand this patch, you've reimplemented AliasSetTracker and an
> iterative algorithm to eventually pair up stores with loads. This is
> wildly expensive. It is at least O(n^2), and I suspect it's actually
> O(n^3).
>
> Please look at GVN, which in turn uses MemoryDependenceAnalysis, a pass
> designed to find what instruction last touched a piece of memory, but
> does so with great cleverness when analyzing across multiple blocks, and
> caches all the queries. There's been a couple blog posts about it too:
>
> http://blog.llvm.org/2009/12/introduction-to-load-elimination-in-gvn.html
> http://blog.llvm.org/2009/12/advanced-topics-in-redundant-load.html
>
> Conceptually, the reason we don't optimize PR12385 is that we only
> phi-translate the instruction we query with and not every instruction
> that we compare the query against. We don't do this because it's really
> expensive.
>
> Alternatively, the case that the existing GVN pass misses is always a
> loop. I was thinking that we could handle this case with a loop pass
> that specifically proves the base case (ie., phi translate all phis
> as-if on entry to the loop and prove the lack of aliasing between
> relevant parts) and an inductive step (ie., given that they didn't alias
> last time, show that the won't alias this time). I've started writing
> such a pass but got vigorously sidetracked.
>
> Finally, your patch doesn't consider writes caused by calling external
> functions.
>
> Nick
>
> Stepan Dyatkovskiy wrote:
>> Hello Evan.
>>>
>>> What do you mean by this?
>>>
>>>>>> By now whis optimization is integrated into SCCP. But may be it is
>>>>>> better to make separated pass. What do you think?
>>>
>>> Do you mean sccp already does this?
>>
>> Sorry. It was a copypaste typo. Now all works in instcombine. Forget
>> about SCCP.
>>
>>> Other questions:
>>> 1. What's the compile time impact?
>>
>> Let me few hours. I'll present compile time impact and tests for you.
>>
>>> 2. How frequently does this fire when you build and run the llvm test
>>> suite?
>> Hm.. Now I'm not sure what you mean. At least it allows to optimize
>> cases like this:
>> int i;
>>
>> void foo (void);
>> void bar (void)
>> {
>> int j;
>> i = 0;
>> for (j = 0; j< 10000; j++)
>> if (i)
>> foo ();
>> }
>>
>>>
>>> I'm somewhat skeptical that instcombine is the right home for this.
>>> However, I don't have suggestions as for where it belong.
>> Currently I found absolutely the same optimization in instcombine, but
>> current version is restricted by a single BB. It doesn't combine
>> instructions from different BBs.
>>>
>>> Dan, can you review this? Comments?
>>>
>>> Evan
>>>
>>> On Apr 28, 2012, at 12:38 PM, Stepan Dyatkovskiy<STPWORLD at narod.ru>
>>> wrote:
>>>
>>>> ping.
>>>>
>>>> -Stepan
>>>>
>>>> 26.04.2012, 11:37, "Stepan Dyatkovskiy"<stpworld at narod.ru>:
>>>>> ping
>>>>> Stepan Dyatkovskiy wrote:
>>>>>> Hi all. I solved PR12385 issue: Missed constant propagation in loops.
>>>>>> LLVM and clang can't optimize next case:
>>>>>> int i;
>>>>>> void bar();
>>>>>> int main(){
>>>>>> i = 0;
>>>>>> int j;
>>>>>> for(j = 0; j< 10000; ++j) {
>>>>>> if (i)
>>>>>> bar();
>>>>>> i = 0;
>>>>>> }
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> Ideally it should be folded to "int main() { return 0; }"
>>>>>>
>>>>>> I improved Instruction Combining pass, so now it is possible to
>>>>>> combine
>>>>>> store + load instructions from different basic blocks.
>>>>>>
>>>>>> The concept is next.
>>>>>> We need to scan BB graph. And if we found "X = load Addr" instruction
>>>>>> predecessed with the same "store C, Addr" instructions in all its
>>>>>> predecessors, we can replace the "load" with the stored value ("C").
>>>>>>
>>>>>> [store C, Addr] [store C, Addr] [store C, Addr]
>>>>>> \ | /
>>>>>> ------- | -------
>>>>>> \ | /
>>>>>> [X = load Addr] ==> X := C
>>>>>>
>>>>>> There are some restrictions:
>>>>>> 1. Store/Load address should be non-volatile and it should be
>>>>>> constant.
>>>>>> 2. The parent of "load" may be predecessed with some BB's that are
>>>>>> still
>>>>>> was not scanned (loops and BB's that marked as non-executable). In
>>>>>> that
>>>>>> case we should mark this load's as waiting until all predecessors
>>>>>> will
>>>>>> resolved; and after all executable edges and requested loops will
>>>>>> scanned we can determine possibility of replacement.
>>>>>> 3. If some predecessor chain doesn't contains any "store" or it
>>>>>> contains
>>>>>> "store" with the same address and different value, the "load" inst
>>>>>> could
>>>>>> NOT be converted to constant then:
>>>>>>
>>>>>> [store A, Addr] [store C, Addr]
>>>>>> \ /
>>>>>> ------- -------
>>>>>> \ /
>>>>>> [X = load Addr] ==> could NOT be replaced with a constant.
>>>>>>
>>>>>> Data structure:
>>>>>> For tracking we will use TrackedStoreInsts map, that has format:
>>>>>> map<BB, StoreInstsInfo>
>>>>>> where
>>>>>> StoreInstsInfo is struct {
>>>>>> set<BB> PendingBB;
>>>>>> map<Address, StoreInst> Stores;
>>>>>> }
>>>>>>
>>>>>> Algorithm:
>>>>>>
>>>>>> Go over all executable edges, collect stores info and propogate it
>>>>>> through the graph:
>>>>>>
>>>>>> 1. Go over executable edges
>>>>>> and for each BB do the next:
>>>>>> 2. If BB contains predecessors that was not scanned before, we can't
>>>>>> determine possibility of replacement "load" with "constant" in
>>>>>> current
>>>>>> BB, since we need to scan all predecessors before. So, add this
>>>>>> predecessor to the TrackedStoreInsts[BB].PendingBB set.
>>>>>> 3. Scan the BB instructions.
>>>>>> 3.1. If we found "store" instruction and it uses constant
>>>>>> non-volatile
>>>>>> address, and it stores constant value, save it in
>>>>>> TrackedStoreInsts[BB].Stores.
>>>>>> 3.2. If we found "X = load Addr".
>>>>>> 3.2.1. If TrackedStoreInsts[BB].Stores doesn't contains "store C,
>>>>>> Addr"
>>>>>> ignore it.
>>>>>> 3.2.2. If TrackedStoreInsts[BB].PendingBBs is not empty(), scan each
>>>>>> PendingBB, and if it was already tracked - remove it
>>>>>> from PendingBBs.
>>>>>> 3.2.3. If TrackedStoreInsts[BB].Stores contains "store Addr, C"
>>>>>> 3.2.3.1. If TrackedStoreInsts[BB].PendingBBs is empty, then replace
>>>>>> "load" with "C".
>>>>>> 3.2.3.2. If rackedStoreInsts[BB].PendingBBs is not empty, mark
>>>>>> load as
>>>>>> still unresolved.
>>>>>> 4. If we reached TerminatorInst, propogate stores info for each its
>>>>>> successor then:
>>>>>> 4.1. Add all TrackedStoreInsts[BB].PendingBBs to the
>>>>>> TrackedStoreInsts[Successor].PendingBBs.
>>>>>> 4.2. Perform set intersection of TrackedStoreInsts[BB].Stores and
>>>>>> TrackedStoreInsts[Successor].Stores and save it in the last one.
>>>>>> 5. After all executable edges was scanned, resolve all remained
>>>>>> "load"s:
>>>>>> get load's BB, and if we found corresponent "store" in
>>>>>> TrackedStoreInsts[BB].Stores, then it is possible to replace load
>>>>>> with
>>>>>> constant, otherwise zap it.
>>>>>>
>>>>>> P.S.: By now whis optimization is integrated into SCCP. But may be
>>>>>> it is
>>>>>> better to make separated pass. What do you think?
>>>>>>
>>>>>> -Stepan
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>




More information about the llvm-commits mailing list