[llvm-commits] PR12385 PATCH: Missed constant propagation in loops

Nick Lewycky nicholas at mxc.ca
Thu May 3 02:37:22 PDT 2012


I'm sorry I haven't reviewed earlier. I really don't like this patch but 
didn't write anything because I don't have a constructive comment for you.

As I understand this patch, you've reimplemented AliasSetTracker and an 
iterative algorithm to eventually pair up stores with loads. This is 
wildly expensive. It is at least O(n^2), and I suspect it's actually O(n^3).

Please look at GVN, which in turn uses MemoryDependenceAnalysis, a pass 
designed to find what instruction last touched a piece of memory, but 
does so with great cleverness when analyzing across multiple blocks, and 
caches all the queries. There's been a couple blog posts about it too:

   http://blog.llvm.org/2009/12/introduction-to-load-elimination-in-gvn.html
   http://blog.llvm.org/2009/12/advanced-topics-in-redundant-load.html

Conceptually, the reason we don't optimize PR12385 is that we only 
phi-translate the instruction we query with and not every instruction 
that we compare the query against. We don't do this because it's really 
expensive.

Alternatively, the case that the existing GVN pass misses is always a 
loop. I was thinking that we could handle this case with a loop pass 
that specifically proves the base case (ie., phi translate all phis 
as-if on entry to the loop and prove the lack of aliasing between 
relevant parts) and an inductive step (ie., given that they didn't alias 
last time, show that the won't alias this time). I've started writing 
such a pass but got vigorously sidetracked.

Finally, your patch doesn't consider writes caused by calling external 
functions.

Nick

Stepan Dyatkovskiy wrote:
> Hello Evan.
>>
>> What do you mean by this?
>>
>>>>>   By now whis optimization is integrated into SCCP. But may be it is
>>>>>    better to make separated pass. What do you think?
>>
>> Do you mean sccp already does this?
>
> Sorry. It was a copypaste typo. Now all works in instcombine. Forget about SCCP.
>
>> Other questions:
>> 1. What's the compile time impact?
>
> Let me few hours. I'll present compile time impact and tests for you.
>
>> 2. How frequently does this fire when you build and run the llvm test suite?
> Hm.. Now I'm not sure what you mean. At least it allows to optimize cases like this:
> int i;
>
> void foo (void);
> void bar (void)
> {
>    int j;
>    i = 0;
>    for (j = 0; j<  10000; j++)
>      if (i)
>        foo ();
> }
>
>>
>> I'm somewhat skeptical that instcombine is the right home for this. However, I don't have suggestions as for where it belong.
> Currently I found absolutely the same optimization in instcombine, but current version is restricted by a single BB. It doesn't combine instructions from different BBs.
>>
>> Dan, can you review this? Comments?
>>
>> Evan
>>
>> On Apr 28, 2012, at 12:38 PM, Stepan Dyatkovskiy<STPWORLD at narod.ru>  wrote:
>>
>>>   ping.
>>>
>>>   -Stepan
>>>
>>>   26.04.2012, 11:37, "Stepan Dyatkovskiy"<stpworld at narod.ru>:
>>>>   ping
>>>>   Stepan Dyatkovskiy wrote:
>>>>>    Hi all. I solved PR12385 issue: Missed constant propagation in loops.
>>>>>    LLVM and clang can't optimize next case:
>>>>>    int i;
>>>>>    void bar();
>>>>>    int main(){
>>>>>    i = 0;
>>>>>    int j;
>>>>>    for(j = 0; j<  10000; ++j) {
>>>>>    if (i)
>>>>>    bar();
>>>>>    i = 0;
>>>>>    }
>>>>>    return 0;
>>>>>    }
>>>>>
>>>>>    Ideally it should be folded to "int main() { return 0; }"
>>>>>
>>>>>    I improved Instruction Combining pass, so now it is possible to combine
>>>>>    store + load instructions from different basic blocks.
>>>>>
>>>>>    The concept is next.
>>>>>    We need to scan BB graph. And if we found "X = load Addr" instruction
>>>>>    predecessed with the same "store C, Addr" instructions in all its
>>>>>    predecessors, we can replace the "load" with the stored value ("C").
>>>>>
>>>>>    [store C, Addr] [store C, Addr] [store C, Addr]
>>>>>    \ | /
>>>>>    ------- | -------
>>>>>    \ | /
>>>>>    [X = load Addr] ==>  X := C
>>>>>
>>>>>    There are some restrictions:
>>>>>    1. Store/Load address should be non-volatile and it should be constant.
>>>>>    2. The parent of "load" may be predecessed with some BB's that are still
>>>>>    was not scanned (loops and BB's that marked as non-executable). In that
>>>>>    case we should mark this load's as waiting until all predecessors will
>>>>>    resolved; and after all executable edges and requested loops will
>>>>>    scanned we can determine possibility of replacement.
>>>>>    3. If some predecessor chain doesn't contains any "store" or it contains
>>>>>    "store" with the same address and different value, the "load" inst could
>>>>>    NOT be converted to constant then:
>>>>>
>>>>>    [store A, Addr] [store C, Addr]
>>>>>    \ /
>>>>>    ------- -------
>>>>>    \ /
>>>>>    [X = load Addr] ==>  could NOT be replaced with a constant.
>>>>>
>>>>>    Data structure:
>>>>>    For tracking we will use TrackedStoreInsts map, that has format:
>>>>>    map<BB, StoreInstsInfo>
>>>>>    where
>>>>>    StoreInstsInfo is struct {
>>>>>    set<BB>  PendingBB;
>>>>>    map<Address, StoreInst>  Stores;
>>>>>    }
>>>>>
>>>>>    Algorithm:
>>>>>
>>>>>    Go over all executable edges, collect stores info and propogate it
>>>>>    through the graph:
>>>>>
>>>>>    1. Go over executable edges
>>>>>    and for each BB do the next:
>>>>>    2. If BB contains predecessors that was not scanned before, we can't
>>>>>    determine possibility of replacement "load" with "constant" in current
>>>>>    BB, since we need to scan all predecessors before. So, add this
>>>>>    predecessor to the TrackedStoreInsts[BB].PendingBB set.
>>>>>    3. Scan the BB instructions.
>>>>>    3.1. If we found "store" instruction and it uses constant non-volatile
>>>>>    address, and it stores constant value, save it in
>>>>>    TrackedStoreInsts[BB].Stores.
>>>>>    3.2. If we found "X = load Addr".
>>>>>    3.2.1. If TrackedStoreInsts[BB].Stores doesn't contains "store C, Addr"
>>>>>    ignore it.
>>>>>    3.2.2. If TrackedStoreInsts[BB].PendingBBs is not empty(), scan each
>>>>>    PendingBB, and if it was already tracked - remove it
>>>>>    from PendingBBs.
>>>>>    3.2.3. If TrackedStoreInsts[BB].Stores contains "store Addr, C"
>>>>>    3.2.3.1. If TrackedStoreInsts[BB].PendingBBs is empty, then replace
>>>>>    "load" with "C".
>>>>>    3.2.3.2. If rackedStoreInsts[BB].PendingBBs is not empty, mark load as
>>>>>    still unresolved.
>>>>>    4. If we reached TerminatorInst, propogate stores info for each its
>>>>>    successor then:
>>>>>    4.1. Add all TrackedStoreInsts[BB].PendingBBs to the
>>>>>    TrackedStoreInsts[Successor].PendingBBs.
>>>>>    4.2. Perform set intersection of TrackedStoreInsts[BB].Stores and
>>>>>    TrackedStoreInsts[Successor].Stores and save it in the last one.
>>>>>    5. After all executable edges was scanned, resolve all remained "load"s:
>>>>>    get load's BB, and if we found corresponent "store" in
>>>>>    TrackedStoreInsts[BB].Stores, then it is possible to replace load with
>>>>>    constant, otherwise zap it.
>>>>>
>>>>>    P.S.: By now whis optimization is integrated into SCCP. But may be it is
>>>>>    better to make separated pass. What do you think?
>>>>>
>>>>>    -Stepan
>>>   _______________________________________________
>>>   llvm-commits mailing list
>>>   llvm-commits at cs.uiuc.edu
>>>   http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




More information about the llvm-commits mailing list