[PATCH] Improvements to SSA construction

Quentin Colombet qcolombet at apple.com
Fri Apr 10 09:35:52 PDT 2015


Hi Daniel,

Cameron already committed patch A in r234425.

Cheers,
-Quentin
> On Apr 10, 2015, at 9:21 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
> 
> Just wanted to ask:
> Do you guys want anything else in the way of testing, or is it just
> waiting on one of you (Quentin, Chandler) to look at it?
> 
> 
> On Mon, Apr 6, 2015 at 9:42 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>> LGTM algorithmically.
>> 
>> On Mon, Apr 6, 2015 at 6:02 PM, Cameron Zwarich <zwarich at apple.com> wrote:
>>> Thanks. If someone reviews the content of my 'a' patch then I'll land it.
>>> 
>>> Cameron
>>> 
>>>> On Apr 6, 2015, at 4:59 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>> 
>>>> So, to drive worst case behavior (besides all the other testing), I
>>>> wrote the attached to generate ladder graphs of size N, which are the
>>>> worst case for DF/IDF algorithms (N^2).
>>>> 
>>>> If you want perfect ladder-looking graphs, you have to use
>>>> -simplifycfg on the result.
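>>>> 
>>>> (For reference, a minimal C++ sketch of the shape I mean -- not the
>>>> attached laddergraph.py itself, whose output may differ: two rails
>>>> u1..uN and v1..vN with a rung from each u_i to v_i, which gives
>>>> dominance frontiers of total size on the order of N^2.)
>>>> 
>>>> // Emit LLVM IR for a ladder CFG: u_i -> {u_i+1, v_i}, v_i -> v_i+1.
>>>> #include <cstdio>
>>>> #include <cstdlib>
>>>> 
>>>> int main(int argc, char **argv) {
>>>>   int N = argc > 1 ? atoi(argv[1]) : 50000;
>>>>   printf("define void @ladder(i1 %%c) {\n");
>>>>   for (int i = 1; i < N; ++i)
>>>>     printf("u%d:\n  br i1 %%c, label %%u%d, label %%v%d\n",
>>>>            i, i + 1, i);
>>>>   printf("u%d:\n  br label %%v%d\n", N, N);
>>>>   for (int i = 1; i < N; ++i)
>>>>     printf("v%d:\n  br label %%v%d\n", i, i + 1);
>>>>   printf("v%d:\n  ret void\n}\n", N);
>>>>   return 0;
>>>> }
>>>> 
>>>> (Feed the output to opt to reproduce the -domfrontier blowup below.)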
>>>> Note: Ladder graphs seem to generate at least n^2 behavior in
>>>> GraphWriter, so using a 50000 rung ladder graph doesn't seem sane with
>>>> -view-cfg.  It doesn't use n^2 memory at least, but I stopped it
>>>> after 15 minutes.
>>>> 
>>>> Anyway, as one might expect, even something like running -domfrontier
>>>> on a 50000 rung ladder graph does not finish in any reasonable amount
>>>> of time.
>>>> 
>>>> The same is true for the algorithm before your patches: I stopped it
>>>> after 5 minutes.
>>>> After your patches, it takes <50ms.
>>>> 
>>>> This is tested with MemorySSA, which only builds SSA for one variable
>>>> and does no pruning, so it is very easy to test against.
>>>> It's safe to say "it's always an improvement" there.
>>>> 
>>>> For PromoteMemoryToRegister, besides generating code that defeats the
>>>> pruning, I tested across a ton of different types of code for both of
>>>> these, and can't find a single case where it's slower.
>>>> 
>>>> 
>>>>> On Mon, Apr 6, 2015 at 4:39 PM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>>> How does perf look relative to the current code in the tree?
>>>>> 
>>>>> Cameron
>>>>> 
>>>>>> On Apr 6, 2015, at 3:57 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>>>> 
>>>>>> So, I can't find a testcase up to 100k BBs where *anything* matters
>>>>>> in terms of a vs b.
>>>>>> I also tried it on 50k functions of 1000 BBs each.
>>>>>> 
>>>>>> In trying to use SetVector, I found another set of much simpler
>>>>>> reasons it won't work:
>>>>>> 
>>>>>> 1. We clear the worklist every time we pop from the PQ. We can't do
>>>>>> this with a SetVector.
>>>>>> 2. We pop values off the worklist, which would erase them from the
>>>>>> SetVector, but as implemented, we don't want them revisited anyway.
>>>>>> 
>>>>>> I.e., right now, a value inserted into VisitedWorklist stays there for
>>>>>> the entire function, but a value inserted into Worklist is cleared
>>>>>> every time we pop something off the PQ.
>>>>>> 
>>>>>> 
>>>>>> #1 is pointless (in the sense that the worklist is always empty at
>>>>>> the point it's called; I assume it's there to do shrink-to-fit or
>>>>>> something).
>>>>>> 
>>>>>> Unless #2 is a bug, we can't use SetVector.
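>>>>>> 
>>>>>> To illustrate (a standalone toy with stand-in types, not the actual
>>>>>> mem2reg code), the two containers need different lifetimes, which a
>>>>>> combined set+vector container can't express:
>>>>>> 
>>>>>> #include <cassert>
>>>>>> #include <unordered_set>
>>>>>> #include <vector>
>>>>>> 
>>>>>> int main() {
>>>>>>   std::unordered_set<int> VisitedWorklist; // lives for the whole run
>>>>>>   std::vector<int> Worklist;               // cleared on every PQ pop
>>>>>> 
>>>>>>   // First "PQ pop": walk a couple of nodes.
>>>>>>   for (int N = 1; N <= 2; ++N)
>>>>>>     if (VisitedWorklist.insert(N).second)
>>>>>>       Worklist.push_back(N);
>>>>>>   while (!Worklist.empty())
>>>>>>     Worklist.pop_back(); // #2: popping a SetVector would also erase
>>>>>>                          // the value from its set
>>>>>>   Worklist.clear();      // #1: clearing a SetVector would also clear
>>>>>>                          // its set (and it's a no-op here anyway)
>>>>>> 
>>>>>>   // Second "PQ pop": node 2 shows up again but must not be revisited.
>>>>>>   assert(!VisitedWorklist.insert(2).second && "2 was already visited");
>>>>>>   return 0;
>>>>>> }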
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Mon, Apr 6, 2015 at 3:17 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>>>>> I'll time it.
>>>>>>> 
>>>>>>> I'll use a GCC testcase that generates fully connected graphs.
>>>>>>> 
>>>>>>> (I've already tried 10000+ BB cases with ifs and merges, and can't
>>>>>>> get a vs b to have any real time difference.)
>>>>>>> 
>>>>>>> 
>>>>>>>> On Mon, Apr 6, 2015 at 2:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
>>>>>>>> Sorry, I clearly meant SmallSetVector (which uses SmallVector and
>>>>>>>> SmallPtrSet).
>>>>>>>> 
>>>>>>>> Also, I was not supposing you would touch the blocks thing at all, nor
>>>>>>>> thinking this would remove any non-determinism. It was just that if you're
>>>>>>>> pushing onto a vector only when insertion into a set vector succeeds, it
>>>>>>>> might make sense. There are still plenty of cases where it doesn't, either
>>>>>>>> due to type mismatches, or because you want to keep growing the set while
>>>>>>>> shrinking and re-using space in the vector.
>>>>>>>> 
>>>>>>>>> On Mon, Apr 6, 2015 at 2:02 PM Cameron Zwarich <zwarich at apple.com> wrote:
>>>>>>>>> 
>>>>>>>>> Yeah, that’s a bigger problem. My original complaints 2) and 3) are
>>>>>>>>> invalid because those are template parameters that could presumably
>>>>>>>>> be swapped out for SmallVector / SmallPtrSet anyway.
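>>>>>>>>> 
>>>>>>>>> For concreteness, something like this (untested sketch; the name is
>>>>>>>>> made up and the 32 just mirrors the size used in the existing code):
>>>>>>>>> 
>>>>>>>>> #include "llvm/ADT/SetVector.h"
>>>>>>>>> #include "llvm/IR/BasicBlock.h"
>>>>>>>>> 
>>>>>>>>> // SmallSetVector keeps SetVector's interface but backs it with
>>>>>>>>> // small (inline) containers instead of the std::vector / SmallSet
>>>>>>>>> // defaults; equivalently, SetVector's Vector/Set template
>>>>>>>>> // parameters could be spelled out explicitly.
>>>>>>>>> using BlockWorklist = llvm::SmallSetVector<llvm::BasicBlock *, 32>;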
>>>>>>>>> 
>>>>>>>>> Cameron
>>>>>>>>> 
>>>>>>>>>> On Apr 6, 2015, at 2:00 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> Unless I'm missing something, you can't use SetVector anyway,
>>>>>>>>>> because it won't remove the non-determinism the sort is there to
>>>>>>>>>> fix.
>>>>>>>>>> 
>>>>>>>>>> That non-determinism comes from the fact that DefBlocks is a
>>>>>>>>>> SmallPtrSet, and even if the priority queue's order is stable, the
>>>>>>>>>> only guarantee you get is that things at the same dom level end up
>>>>>>>>>> around the same place in the priority queue (but not in any
>>>>>>>>>> specific order).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> If DomTreeNodePair were a DomTreeNodeTriple (<BasicBlock, DomLevel,
>>>>>>>>>> BBNumbers[BB]>), sorted by the second and then the third element of
>>>>>>>>>> the tuple, DFBlocks could be a std::vector, you could use SetVector,
>>>>>>>>>> and you wouldn't need to sort anything, because the order would
>>>>>>>>>> already be deterministic.
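>>>>>>>>>> 
>>>>>>>>>> Something like this (illustrative only, made-up names, stand-in
>>>>>>>>>> block type -- not a patch):
>>>>>>>>>> 
>>>>>>>>>> #include <cstdio>
>>>>>>>>>> #include <queue>
>>>>>>>>>> #include <tuple>
>>>>>>>>>> #include <utility>
>>>>>>>>>> #include <vector>
>>>>>>>>>> 
>>>>>>>>>> struct BasicBlockStub {}; // stand-in for llvm::BasicBlock
>>>>>>>>>> 
>>>>>>>>>> // <block, dominator tree level, stable per-function BB number>
>>>>>>>>>> using DomTreeNodeTriple =
>>>>>>>>>>     std::tuple<BasicBlockStub *, unsigned, unsigned>;
>>>>>>>>>> 
>>>>>>>>>> struct TripleLess {
>>>>>>>>>>   bool operator()(const DomTreeNodeTriple &A,
>>>>>>>>>>                   const DomTreeNodeTriple &B) const {
>>>>>>>>>>     // Order by level, then BB number; the pointer value never
>>>>>>>>>>     // decides, so the pop order is deterministic.
>>>>>>>>>>     return std::make_pair(std::get<1>(A), std::get<2>(A)) <
>>>>>>>>>>            std::make_pair(std::get<1>(B), std::get<2>(B));
>>>>>>>>>>   }
>>>>>>>>>> };
>>>>>>>>>> 
>>>>>>>>>> int main() {
>>>>>>>>>>   BasicBlockStub X, Y;
>>>>>>>>>>   std::priority_queue<DomTreeNodeTriple,
>>>>>>>>>>                       std::vector<DomTreeNodeTriple>,
>>>>>>>>>>                       TripleLess> PQ;
>>>>>>>>>>   PQ.push(std::make_tuple(&X, 3u, 7u));
>>>>>>>>>>   PQ.push(std::make_tuple(&Y, 3u, 2u));
>>>>>>>>>>   // Same level: the BB number breaks the tie, so 7 pops first.
>>>>>>>>>>   std::printf("top BB number: %u\n", std::get<2>(PQ.top()));
>>>>>>>>>>   return 0;
>>>>>>>>>> }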
>>>>>>>>>> 
>>>>>>>>>> Whether any of this is worth it, no idea.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Mon, Apr 6, 2015 at 1:56 PM, Cameron Zwarich <zwarich at apple.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> I just thought it would make aesthetic sense to have less code
>>>>>>>>>>> in the inner loop, but that’s arbitrary. If people think the ‘a’
>>>>>>>>>>> patch makes more sense and it’s equivalent perfwise, then I’m
>>>>>>>>>>> fine with that. Code is made for reading, not writing. :P
>>>>>>>>>>> 
>>>>>>>>>>> I’m not sure SetVector makes sense here for a few reasons:
>>>>>>>>>>> 
>>>>>>>>>>> 1) The set and the vector store different types; the vector also
>>>>>>>>>>> contains the BB number.
>>>>>>>>>>> 2) The vector in SetVector is a std::vector, not a SmallVector.
>>>>>>>>>>> 3) The set in SetVector is a SmallSet of a fixed size of 16
>>>>>>>>>>> (smaller than the 32 used here), and the ‘large’ case of SmallSet
>>>>>>>>>>> is a std::set, which is probably slower than SmallPtrSet’s
>>>>>>>>>>> ‘large’ case.
>>>>>>>>>>> 
>>>>>>>>>>> Cameron
>>>>>>>>>>> 
>>>>>>>>>>> On Apr 6, 2015, at 1:40 PM, Chandler Carruth <chandlerc at google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I'd like to understand why Cameron would prefer the 'b' patch to
>>>>>>>>>>> the 'a' patch. AFAICT, the 'b' patch doesn't save any actual
>>>>>>>>>>> memory (sadly).
>>>>>>>>>>> 
>>>>>>>>>>> Also, if we go with 'a', would it make sense to use a SetVector
>>>>>>>>>>> rather than a separate vector and set?
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Apr 6, 2015 at 1:36 PM Daniel Berlin <dberlin at dberlin.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm running numbers on both approaches now, to see if there is
>>>>>>>>>>>> any real difference in speed.
>>>>>>>>>>>> 
>>>>>>>>>>>> (Otherwise, I think the one with two visited worklists is easier
>>>>>>>>>>>> to understand, unless someone else wants to disagree. :P)
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Apr 6, 2015 at 11:54 AM, Quentin Colombet
>>>>>>>>>>>> <qcolombet at apple.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi Cameron,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sounds good.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Let's wait for the complete testing from Daniel before
>>>>>>>>>>>>> reviewing the patch.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -Quentin
>>>>>>>>>>>>>> On Apr 6, 2015, at 11:27 AM, Cameron Zwarich <zwarich at apple.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It was pointed out to me (without any specifics) that the
>>>>>>>>>>>>>> iterated dominance frontier algorithm in
>>>>>>>>>>>>>> PromoteMemoryToRegister.cpp has O(n^2) worst case behavior.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I inspected the code and think I found the cause. The code
>>>>>>>>>>>>>> uses a priority queue and a worklist, which share the same
>>>>>>>>>>>>>> visited set, but the visited set is only updated when
>>>>>>>>>>>>>> inserting into the priority queue. The original Sreedhar-Gao
>>>>>>>>>>>>>> paper effectively has a second visited set (the InPhi flag)
>>>>>>>>>>>>>> which is used for the priority queue, and the set called
>>>>>>>>>>>>>> Visited is used for the recursive traversal that is done here
>>>>>>>>>>>>>> with a worklist.
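>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Roughly, the shape of the fix (a simplified standalone sketch
>>>>>>>>>>>>>> with a toy graph and stand-in types, not the actual patch) is
>>>>>>>>>>>>>> to give the worklist traversal its own visited set instead of
>>>>>>>>>>>>>> reusing the priority queue's:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #include <cstdio>
>>>>>>>>>>>>>> #include <queue>
>>>>>>>>>>>>>> #include <set>
>>>>>>>>>>>>>> #include <utility>
>>>>>>>>>>>>>> #include <vector>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>>   // Toy CFG: node -> successors.
>>>>>>>>>>>>>>   const std::vector<std::vector<int>> Succs =
>>>>>>>>>>>>>>       {{1, 2}, {3}, {3}, {}};
>>>>>>>>>>>>>>   std::priority_queue<std::pair<int, int>> PQ; // (level, node)
>>>>>>>>>>>>>>   std::set<int> VisitedPQ;       // gates PQ insertion
>>>>>>>>>>>>>>   std::set<int> VisitedWorklist; // gates the traversal (new)
>>>>>>>>>>>>>>   std::vector<int> Worklist;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>   PQ.push({2, 0});
>>>>>>>>>>>>>>   while (!PQ.empty()) {
>>>>>>>>>>>>>>     auto [Level, Root] = PQ.top();
>>>>>>>>>>>>>>     PQ.pop();
>>>>>>>>>>>>>>     Worklist.assign(1, Root);
>>>>>>>>>>>>>>     while (!Worklist.empty()) {
>>>>>>>>>>>>>>       int N = Worklist.back();
>>>>>>>>>>>>>>       Worklist.pop_back();
>>>>>>>>>>>>>>       for (int S : Succs[N]) {
>>>>>>>>>>>>>>         // Before the fix this was the only visited check, and
>>>>>>>>>>>>>>         // it was only updated here, on PQ insertion.
>>>>>>>>>>>>>>         if (VisitedPQ.insert(S).second)
>>>>>>>>>>>>>>           PQ.push({Level - 1, S});
>>>>>>>>>>>>>>         // The second set remembers what the traversal has
>>>>>>>>>>>>>>         // already walked, so later pops don't re-walk it.
>>>>>>>>>>>>>>         if (VisitedWorklist.insert(S).second)
>>>>>>>>>>>>>>           Worklist.push_back(S);
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>   std::printf("walked %zu nodes\n", VisitedWorklist.size());
>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>> }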
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I’ve attached two patches, one which just adds a second
>>>>>>>>>>>>>> visited set, and another which leverages the fact that one of
>>>>>>>>>>>>>> the visited sets is actually the IDF. I would prefer the
>>>>>>>>>>>>>> latter if it has equal performance with the first.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> They both pass `make check`, but I’m not sure I’ll have time
>>>>>>>>>>>>>> to give these patches the testing they deserve in the next few
>>>>>>>>>>>>>> days. Daniel Berlin has offered to test them more thoroughly
>>>>>>>>>>>>>> for me.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Note that there is still one difference from the paper. The
>>>>>>>>>>>>>> paper uses a custom linked data structure instead of a
>>>>>>>>>>>>>> priority queue, which takes advantage of the property that the
>>>>>>>>>>>>>> level of all nodes being inserted is at most the current
>>>>>>>>>>>>>> level. The code in LLVM uses a priority queue based on a
>>>>>>>>>>>>>> binary heap. This means that the worst case is O(n log n), but
>>>>>>>>>>>>>> I’d be surprised if the difference matters in practice.
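>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For comparison, the paper's structure amounts to roughly this
>>>>>>>>>>>>>> (a sketch, not the paper's exact data structure): per-level
>>>>>>>>>>>>>> buckets scanned from the deepest level down, relying on the
>>>>>>>>>>>>>> fact that anything inserted while processing level L has level
>>>>>>>>>>>>>> <= L, so insert and pop are O(1) instead of O(log n).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #include <cstdio>
>>>>>>>>>>>>>> #include <vector>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>>   const unsigned MaxLevel = 4;
>>>>>>>>>>>>>>   // Buckets[level] holds the pending nodes at that level.
>>>>>>>>>>>>>>   std::vector<std::vector<int>> Buckets(MaxLevel + 1);
>>>>>>>>>>>>>>   Buckets[4].push_back(10); // seed nodes at their levels
>>>>>>>>>>>>>>   Buckets[2].push_back(20);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>   for (unsigned Level = MaxLevel + 1; Level-- > 0;) {
>>>>>>>>>>>>>>     while (!Buckets[Level].empty()) {
>>>>>>>>>>>>>>       int N = Buckets[Level].back();
>>>>>>>>>>>>>>       Buckets[Level].pop_back();
>>>>>>>>>>>>>>       std::printf("processing %d at level %u\n", N, Level);
>>>>>>>>>>>>>>       // Anything discovered here goes to a level <= Level, so
>>>>>>>>>>>>>>       // the downward scan never misses it.
>>>>>>>>>>>>>>       if (N == 10)
>>>>>>>>>>>>>>         Buckets[3].push_back(30);
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>> }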
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cameron
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> <idf-faster-a.patch><idf-faster-b.patch>
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>> <laddergraph.py>




