[PATCH] Improvements to SSA construction

Cameron Zwarich zwarich at apple.com
Mon Apr 6 14:02:13 PDT 2015


Yeah, that’s a bigger problem. My original complaints 2) and 3) are invalid because those are template parameters that could presumably be swapped out for SmallVector / SmallPtrSet anyway.

Cameron

> On Apr 6, 2015, at 2:00 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> 
> Unless I'm missing something, you can't use SetVector anyway, because
> it won't remove the non-determinism the sort is there to fix.
> 
> That non-determinism comes from the fact that DefBlocks is a
> SmallPtrSet, and even if the priority queue's order is stable, the
> only guarantee you get is that things at the same dom level end up
> around the same place in the priority queue (but not in a specific
> order).
> 
> 
> If DomTreeNodePair were a DomTreeNodeTriple (<BasicBlock, DomLevel,
> BBNumbers[BB]>), sorted by the second and then the third element of
> the tuple, DFBlocks could be a std::vector and you could use SetVector,
> and not sort anything, because the order would be deterministic.
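The triple-based ordering Daniel describes could be sketched like this (a minimal sketch; the type names and the `int` stand-ins for `BasicBlock*` are illustrative, not LLVM's actual declarations):

```cpp
#include <queue>
#include <tuple>
#include <vector>

// Illustrative stand-ins, not LLVM's types.
using BasicBlockID = int;   // stands in for BasicBlock*
using DomLevel = unsigned;

// <block, dominator-tree level, BB number>: adding the BB number as a
// final tie-breaker makes the pop order fully deterministic even when
// many blocks share a dominator-tree level.
using DomTreeNodeTriple = std::tuple<BasicBlockID, DomLevel, unsigned>;

struct ByLevelThenNumber {
  bool operator()(const DomTreeNodeTriple &A,
                  const DomTreeNodeTriple &B) const {
    // std::priority_queue pops the "largest" element first, so this
    // comparator yields highest level first, lowest BB number on ties.
    if (std::get<1>(A) != std::get<1>(B))
      return std::get<1>(A) < std::get<1>(B);
    return std::get<2>(A) > std::get<2>(B);
  }
};

using IDFPriorityQueue =
    std::priority_queue<DomTreeNodeTriple, std::vector<DomTreeNodeTriple>,
                        ByLevelThenNumber>;
```

With a total order on the elements, the pop order no longer depends on the iteration order coming out of a SmallPtrSet.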
> 
> Whether any of this is worth it, no idea.
> 
> 
> 
> On Mon, Apr 6, 2015 at 1:56 PM, Cameron Zwarich <zwarich at apple.com> wrote:
>> I just thought it would make aesthetic sense to have less code in the inner
>> loop, but that’s arbitrary. If people think the ‘a’ patch makes more sense
>> and it’s equivalent perfwise, then I’m fine with that. Code is made for
>> reading, not writing. :P
>> 
>> I’m not sure SetVector makes sense here for a few reasons:
>> 
>> 1) The set and the vector store different types; the vector also contains
>> the BB number.
>> 2) The vector in SetVector is a std::vector, not a SmallVector.
>> 3) The set in SetVector is a SmallSet of a fixed size of 16 (smaller than
>> the 32 used here), and the ‘large’ case of SmallSet is a std::set, which is
>> probably slower than SmallPtrSet’s ‘large’ case.
>> 
>> Cameron
>> 
>> On Apr 6, 2015, at 1:40 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> 
>> I'd like to understand why Cameron would prefer the 'b' patch to the 'a'
>> patch. AFAICT, the 'b' patch doesn't save any actual memory (sadly).
>> 
>> Also, if we go with 'a', would it make sense to use a SetVector rather than
>> a separate vector and set?
>> 
>> On Mon, Apr 6, 2015 at 1:36 PM Daniel Berlin <dberlin at dberlin.org> wrote:
>>> 
>>> I'm running numbers on both approaches now, to see if there is any
>>> real difference in speed.
>>> 
>>> (otherwise, i think the one with two visited worklists is easier to
>>> understand, unless someone else wants to disagree :P)
>>> 
>>> 
>>> On Mon, Apr 6, 2015 at 11:54 AM, Quentin Colombet <qcolombet at apple.com>
>>> wrote:
>>>> Hi Cameron,
>>>> 
>>>> Sounds good.
>>>> 
>>>> Let us wait for the complete testing from Daniel before reviewing the patch.
>>>> 
>>>> Thanks,
>>>> -Quentin
>>>>> On Apr 6, 2015, at 11:27 AM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>>> 
>>>>> It was pointed out to me (without any specifics) that the iterated
>>>>> dominance frontier algorithm in PromoteMemoryToRegister.cpp has O(n^2) worst
>>>>> case behavior.
>>>>> 
>>>>> I inspected the code and think I found the cause. The code uses a
>>>>> priority queue and a worklist, which share the same visited set, but the
>>>>> visited set is only updated when inserting into the priority queue. The
>>>>> original Sreedhar-Gao paper effectively has a second visited set (the InPhi
>>>>> flag) which is used for the priority queue, and the set called Visited is
>>>>> used for the recursive traversal that is done here with a worklist.
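The fix Cameron describes can be illustrated with a toy worklist walk (a simplified sketch with `int` node ids, not the actual PromoteMemoryToRegister.cpp code): the key point is that the successor walk marks blocks in its own visited set at insertion time, instead of reusing the set that guards the priority queue.

```cpp
#include <set>
#include <vector>

using Node = int;                             // stands in for BasicBlock*
using Graph = std::vector<std::vector<Node>>; // adjacency lists

// With a single shared set that is only updated on priority-queue
// insertion, a walk like this can re-visit blocks and degrade to O(n^2).
// Giving the walk its own set (the paper keeps a separate InPhi flag
// for the queue) guarantees each block is pushed at most once.
std::vector<Node> walkSuccessors(const Graph &G, Node Start) {
  std::set<Node> VisitedWalk;  // dedicated to this traversal
  std::vector<Node> Order, Worklist{Start};
  VisitedWalk.insert(Start);
  while (!Worklist.empty()) {
    Node N = Worklist.back();
    Worklist.pop_back();
    Order.push_back(N);
    for (Node Succ : G[N])
      if (VisitedWalk.insert(Succ).second)  // mark on insertion
        Worklist.push_back(Succ);
  }
  return Order;
}
```

Even on a cyclic graph, every node is visited exactly once.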
>>>>> 
>>>>> I’ve attached two patches: one just adds a second visited set,
>>>>> and the other leverages the fact that one of the visited sets is
>>>>> actually the IDF. I would prefer the latter if it has performance
>>>>> equal to the first.
>>>>> 
>>>>> They both pass `make check`, but I’m not sure I’ll have time to give
>>>>> these patches the testing they deserve in the next few days. Daniel
>>>>> Berlin has offered to test them more thoroughly for me.
>>>>> 
>>>>> Note that there is still one difference with the paper. The paper uses
>>>>> a custom linked data structure instead of a priority queue, which takes
>>>>> advantage of the property that the level of all nodes being inserted is at
>>>>> most the current level. The code in LLVM uses a priority queue based on a
>>>>> binary heap. This means that the worst case is O(n log n), but I’d be
>>>>> surprised if the difference matters in practice.
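The paper's structure could be approximated with a bucket queue indexed by level (a hypothetical sketch of the idea, not the paper's exact linked data structure): because every node inserted while draining level L has level at most L, a single highest-to-lowest scan over the buckets visits each node once, i.e. O(n) rather than a binary heap's O(n log n).

```cpp
#include <vector>

// Hypothetical bucket queue keyed by dominator-tree level.
struct LevelBucketQueue {
  std::vector<std::vector<int>> Buckets;  // Buckets[level] -> node ids
  explicit LevelBucketQueue(unsigned MaxLevel) : Buckets(MaxLevel + 1) {}

  void push(unsigned Level, int N) { Buckets[Level].push_back(N); }

  // Drain from the highest level down. Visit may push more nodes, but
  // only at levels <= the level currently being drained, so the scan
  // never has to move back up.
  template <typename Fn> void drainHighestFirst(Fn Visit) {
    for (unsigned L = Buckets.size(); L-- > 0;)
      while (!Buckets[L].empty()) {
        int N = Buckets[L].back();
        Buckets[L].pop_back();
        Visit(L, N);
      }
  }
};
```

Each push and pop is O(1); the only extra cost is the one-time scan over the (bounded) number of levels.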
>>>>> 
>>>>> Cameron
>>>>> 
>>>>> 
>>>>> <idf-faster-a.patch><idf-faster-b.patch>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> 
>>> 
>> 
>> 
>> 




