[PATCH] Improvements to SSA construction

Mon Apr 6 16:05:24 PDT 2015

Yeah, i know.  I went to go and test to see if it made the code easier
to understand, and hit "it can't be done" :)

On Mon, Apr 6, 2015 at 4:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
> No, this seems like a good reason to use two. See my prior email which
> mentioned it.
>
> In my first email I just *asked* if there was a good reason. ;]
>
> On Mon, Apr 6, 2015 at 3:57 PM Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> So, i can't find a testcase up to 100k bb's where *anything* matters
>> in terms of a vs b.
>> I also tried it on 50k functions of 1000 bb's as well.
>>
>> In trying to use setvector, i found another set of much simpler
>> reasons it won't work:
>>
>> 1. We clear worklist every time we pop from the PQ. We can't do this
>> with setvector
>> 2. We pop values off the worklist, which would erase them from
>> setvector, but as implemented, we don't want them revisited anyway.
>>
>> IE Right now,  a value inserted into visitedworklist stays there the
>> entire function, but a value inserted into worklist is cleared every
>> time we pop something off the PQ.
>>
>>
>> #1 is pointless (in the sense that the worklist is always empty at the
>> point it's called, i assume it's there to do shrink to fit or
>> something)
>>
>> Unless #2 is a bug, we can't use setvector.
>>
>>
>>
>>
>> On Mon, Apr 6, 2015 at 3:17 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>> > I'll time it.
>> >
>> > I'll use a GCC testcase that generates fully connected graphs.
>> >
>> > (I've already tried 10000+ bb cases with if's and and merges, and
>> > can't get a vs b to have any real time difference)
>> >
>> >
>> > On Mon, Apr 6, 2015 at 2:04 PM, Chandler Carruth <chandlerc at google.com>
>> > wrote:
>> >> Sorry, I clearly meant SmallSetVector (which uses SmallVector and
>> >> SmallPtrSet).
>> >>
>> >> Also, I was not supposing you would touch the blocks thing at all, nor
>> >> thinking this would remove any non-determinism. It was just that if
>> >> you're
>> >> pushing onto a vector only when insertion into a set vector succeeds,
>> >> it
>> >> might make sense. There are still plenty of cases where it doesn't,
>> >> either
>> >> due to type mis-matches, or because you want to keep growing the set
>> >> while
>> >> shrinking and re-using space in the vector.
>> >>
>> >> On Mon, Apr 6, 2015 at 2:02 PM Cameron Zwarich <zwarich at apple.com>
>> >> wrote:
>> >>>
>> >>> Yeah, that’s a bigger problem. My original complaints 2) and 3) are
>> >>> invalid because those are template parameters that could presumedly be
>> >>> swapped out for SmallVector / SmallPtrSet anyways.
>> >>>
>> >>> Cameron
>> >>>
>> >>> > On Apr 6, 2015, at 2:00 PM, Daniel Berlin <dberlin at dberlin.org>
>> >>> > wrote:
>> >>> >
>> >>> > Unless i'm missing something, you can't use setvector anyway,
>> >>> > because
>> >>> > it won't remove the non-determinism the sort is there to fix.
>> >>> >
>> >>> > That non-determinism comes from the fact that defblocks is a
>> >>> > smallptrset, and even if the priority queue's order is stable, the
>> >>> > only guarantee you get is that things at the same dom level end up
>> >>> > around the same place in the priority queue (but not in a specific
>> >>> > order).
>> >>> >
>> >>> >
>> >>> > If domtreenodepair was domtreenodetriple (<BasicBlock, DomLevel,
>> >>> > BBNumbers[BB]>), and sorted by second then third part of tuple,
>> >>> > dfblocks could be a std::vector and you could use setvector, and not
>> >>> > sort anything, because it would be deterministic.
>> >>> >
>> >>> > Whether any of this is worth it, no idea.
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Mon, Apr 6, 2015 at 1:56 PM, Cameron Zwarich <zwarich at apple.com>
>> >>> > wrote:
>> >>> >> I just thought it would make aesthetic sense to have less code in
>> >>> >> the
>> >>> >> inner
>> >>> >> loop, but that’s arbitrary. If people think the ‘a’ patch makes
>> >>> >> more
>> >>> >> sense
>> >>> >> and it’s equivalent perfwise, then I’m fine with that. Code is made
>> >>> >> for
>> >>> >> reading, not writing. :P
>> >>> >>
>> >>> >> I’m not sure SetVector makes sense here for a few reasons:
>> >>> >>
>> >>> >> 1) The set and the vector store different types; the vector also
>> >>> >> contains
>> >>> >> the BB number.
>> >>> >> 2) The vector in SetVector is a std::vector, not a SmallVector.
>> >>> >> 3) The set in SetVector is a SmallSet of a fixed size of 16
>> >>> >> (smaller
>> >>> >> than
>> >>> >> the 32 used here), and the ‘large’ case of SmallSet is a std::set,
>> >>> >> which is
>> >>> >> probably slower than SmallPtrSet’s ‘large’ case.
>> >>> >>
>> >>> >> Cameron
>> >>> >>
>> >>> >> On Apr 6, 2015, at 1:40 PM, Chandler Carruth <chandlerc at google.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >> I'd like to understand why Cameron would prefer the 'b' patch to
>> >>> >> the
>> >>> >> 'a'
>> >>> >> patch. AFAICT, the 'b' patch doesn't save any actual memory
>> >>> >> (sadly).
>> >>> >>
>> >>> >> Also, if we go with 'a', would it make sense to use a SetVector
>> >>> >> rather
>> >>> >> than
>> >>> >> a separate vector and set?
>> >>> >>
>> >>> >> On Mon, Apr 6, 2015 at 1:36 PM Daniel Berlin <dberlin at dberlin.org>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> I'm running numbers on both approaches now, to see if there is any
>> >>> >>> real difference in speed.
>> >>> >>>
>> >>> >>> (otherwise, i think the one with two visited worklists is easier
>> >>> >>> to
>> >>> >>> understand, unless someone else wants to disagree :P)
>> >>> >>>
>> >>> >>>
>> >>> >>> On Mon, Apr 6, 2015 at 11:54 AM, Quentin Colombet
>> >>> >>> <qcolombet at apple.com>
>> >>> >>> wrote:
>> >>> >>>> Hi Cameron,
>> >>> >>>>
>> >>> >>>> Sounds good.
>> >>> >>>>
>> >>> >>>> Let us wait the complete testing from Daniel before reviewing the
>> >>> >>>> patch.
>> >>> >>>>
>> >>> >>>> Thanks,
>> >>> >>>> -Quentin
>> >>> >>>>> On Apr 6, 2015, at 11:27 AM, Cameron Zwarich <zwarich at apple.com>
>> >>> >>>>> wrote:
>> >>> >>>>>
>> >>> >>>>> It was pointed out to me (without any specifics) that the
>> >>> >>>>> iterated
>> >>> >>>>> dominance frontier algorithm in PromoteMemoryToRegister.cpp has
>> >>> >>>>> O(n^2) worst
>> >>> >>>>> case behavior.
>> >>> >>>>>
>> >>> >>>>> I inspected the code and think I found the cause. The code uses
>> >>> >>>>> a
>> >>> >>>>> priority queue and a worklist, which share the same visited set,
>> >>> >>>>> but
>> >>> >>>>> the
>> >>> >>>>> visited set is only updated when inserting into the priority
>> >>> >>>>> queue.
>> >>> >>>>> The
>> >>> >>>>> original Sreedhar-Gao paper effectively has a second visited set
>> >>> >>>>> (the InPhi
>> >>> >>>>> flag) which is used for the priority queue, and the set called
>> >>> >>>>> Visited is
>> >>> >>>>> used for the recursive traversal that is done here with a
>> >>> >>>>> worklist.
>> >>> >>>>>
>> >>> >>>>> I’ve attached two patches, one which just adds a second visited
>> >>> >>>>> sit,
>> >>> >>>>> and another which leverages the fact that one of the visited
>> >>> >>>>> sets is
>> >>> >>>>> actually the IDF. I would prefer the latter if it has equal
>> >>> >>>>> performance with
>> >>> >>>>> the first.
>> >>> >>>>>
>> >>> >>>>> They both pass `make check`, but I’m not sure I’ll have time to
>> >>> >>>>> give
>> >>> >>>>> these patches the testing they’ll deserve in the next few days.
>> >>> >>>>> Daniel
>> >>> >>>>> Berlin has offered to test them more thoroughly for me.
>> >>> >>>>>
>> >>> >>>>> Note that there is still one difference with the paper. The
>> >>> >>>>> paper
>> >>> >>>>> uses
>> >>> >>>>> a custom linked data structure instead of a priority queue,
>> >>> >>>>> which
>> >>> >>>>> takes
>> >>> >>>>> advantage of the property that the level of all nodes being
>> >>> >>>>> inserted
>> >>> >>>>> is at
>> >>> >>>>> most the current level. The code in LLVM uses a priority queue
>> >>> >>>>> based
>> >>> >>>>> on a
>> >>> >>>>> binary heap. This means that the worst case is O(n log n), but
>> >>> >>>>> I’d
>> >>> >>>>> be
>> >>> >>>>> surprised if the difference matters in practice.
>> >>> >>>>>
>> >>> >>>>> Cameron
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> <idf-faster-a.patch><idf-faster-b.patch>_______________________________________________
>> >>> >>>>> llvm-commits mailing list
>> >>> >>>>> llvm-commits at cs.uiuc.edu
>> >>> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >>> >>>>
>> >>> >>>
>> >>> >>> _______________________________________________
>> >>> >>> llvm-commits mailing list
>> >>> >>> llvm-commits at cs.uiuc.edu
>> >>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >>> >>
>> >>> >> _______________________________________________
>> >>> >> llvm-commits mailing list
>> >>> >> llvm-commits at cs.uiuc.edu
>> >>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >>> >>
>> >>> >>
>> >>>
>> >>