[PATCH] Improvements to SSA construction

Mon Apr 6 15:57:03 PDT 2015

So, i can't find a testcase up to 100k bb's where *anything* matters
in terms of a vs b.
I also tried it on 50k functions of 1000 bb's as well.

In trying to use setvector, i found another set of much simpler
reasons it won't work:

1. We clear worklist every time we pop from the PQ. We can't do this
with setvector
2. We pop values off the worklist, which would erase them from
setvector, but as implemented, we don't want them revisited anyway.

IE Right now,  a value inserted into visitedworklist stays there the
entire function, but a value inserted into worklist is cleared every
time we pop something off the PQ.

#1 is pointless (in the sense that the worklist is always empty at the
point it's called, i assume it's there to do shrink to fit or
something)

Unless #2 is a bug, we can't use setvector.

On Mon, Apr 6, 2015 at 3:17 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> I'll time it.
>
> I'll use a GCC testcase that generates fully connected graphs.
>
> (I've already tried 10000+ bb cases with if's and and merges, and
> can't get a vs b to have any real time difference)
>
>
> On Mon, Apr 6, 2015 at 2:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> Sorry, I clearly meant SmallSetVector (which uses SmallVector and
>> SmallPtrSet).
>>
>> Also, I was not supposing you would touch the blocks thing at all, nor
>> thinking this would remove any non-determinism. It was just that if you're
>> pushing onto a vector only when insertion into a set vector succeeds, it
>> might make sense. There are still plenty of cases where it doesn't, either
>> due to type mis-matches, or because you want to keep growing the set while
>> shrinking and re-using space in the vector.
>>
>> On Mon, Apr 6, 2015 at 2:02 PM Cameron Zwarich <zwarich at apple.com> wrote:
>>>
>>> Yeah, that’s a bigger problem. My original complaints 2) and 3) are
>>> invalid because those are template parameters that could presumedly be
>>> swapped out for SmallVector / SmallPtrSet anyways.
>>>
>>> Cameron
>>>
>>> > On Apr 6, 2015, at 2:00 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>> >
>>> > Unless i'm missing something, you can't use setvector anyway, because
>>> > it won't remove the non-determinism the sort is there to fix.
>>> >
>>> > That non-determinism comes from the fact that defblocks is a
>>> > smallptrset, and even if the priority queue's order is stable, the
>>> > only guarantee you get is that things at the same dom level end up
>>> > around the same place in the priority queue (but not in a specific
>>> > order).
>>> >
>>> >
>>> > If domtreenodepair was domtreenodetriple (<BasicBlock, DomLevel,
>>> > BBNumbers[BB]>), and sorted by second then third part of tuple,
>>> > dfblocks could be a std::vector and you could use setvector, and not
>>> > sort anything, because it would be deterministic.
>>> >
>>> > Whether any of this is worth it, no idea.
>>> >
>>> >
>>> >
>>> > On Mon, Apr 6, 2015 at 1:56 PM, Cameron Zwarich <zwarich at apple.com>
>>> > wrote:
>>> >> I just thought it would make aesthetic sense to have less code in the
>>> >> inner
>>> >> loop, but that’s arbitrary. If people think the ‘a’ patch makes more
>>> >> sense
>>> >> and it’s equivalent perfwise, then I’m fine with that. Code is made for
>>> >> reading, not writing. :P
>>> >>
>>> >> I’m not sure SetVector makes sense here for a few reasons:
>>> >>
>>> >> 1) The set and the vector store different types; the vector also
>>> >> contains
>>> >> the BB number.
>>> >> 2) The vector in SetVector is a std::vector, not a SmallVector.
>>> >> 3) The set in SetVector is a SmallSet of a fixed size of 16 (smaller
>>> >> than
>>> >> the 32 used here), and the ‘large’ case of SmallSet is a std::set,
>>> >> which is
>>> >> probably slower than SmallPtrSet’s ‘large’ case.
>>> >>
>>> >> Cameron
>>> >>
>>> >> On Apr 6, 2015, at 1:40 PM, Chandler Carruth <chandlerc at google.com>
>>> >> wrote:
>>> >>
>>> >> I'd like to understand why Cameron would prefer the 'b' patch to the
>>> >> 'a'
>>> >> patch. AFAICT, the 'b' patch doesn't save any actual memory (sadly).
>>> >>
>>> >> Also, if we go with 'a', would it make sense to use a SetVector rather
>>> >> than
>>> >> a separate vector and set?
>>> >>
>>> >> On Mon, Apr 6, 2015 at 1:36 PM Daniel Berlin <dberlin at dberlin.org>
>>> >> wrote:
>>> >>>
>>> >>> I'm running numbers on both approaches now, to see if there is any
>>> >>> real difference in speed.
>>> >>>
>>> >>> (otherwise, i think the one with two visited worklists is easier to
>>> >>> understand, unless someone else wants to disagree :P)
>>> >>>
>>> >>>
>>> >>> On Mon, Apr 6, 2015 at 11:54 AM, Quentin Colombet
>>> >>> <qcolombet at apple.com>
>>> >>> wrote:
>>> >>>> Hi Cameron,
>>> >>>>
>>> >>>> Sounds good.
>>> >>>>
>>> >>>> Let us wait the complete testing from Daniel before reviewing the
>>> >>>> patch.
>>> >>>>
>>> >>>> Thanks,
>>> >>>> -Quentin
>>> >>>>> On Apr 6, 2015, at 11:27 AM, Cameron Zwarich <zwarich at apple.com>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>> It was pointed out to me (without any specifics) that the iterated
>>> >>>>> dominance frontier algorithm in PromoteMemoryToRegister.cpp has
>>> >>>>> O(n^2) worst
>>> >>>>> case behavior.
>>> >>>>>
>>> >>>>> I inspected the code and think I found the cause. The code uses a
>>> >>>>> priority queue and a worklist, which share the same visited set, but
>>> >>>>> the
>>> >>>>> visited set is only updated when inserting into the priority queue.
>>> >>>>> The
>>> >>>>> original Sreedhar-Gao paper effectively has a second visited set
>>> >>>>> (the InPhi
>>> >>>>> flag) which is used for the priority queue, and the set called
>>> >>>>> Visited is
>>> >>>>> used for the recursive traversal that is done here with a worklist.
>>> >>>>>
>>> >>>>> I’ve attached two patches, one which just adds a second visited sit,
>>> >>>>> and another which leverages the fact that one of the visited sets is
>>> >>>>> actually the IDF. I would prefer the latter if it has equal
>>> >>>>> performance with
>>> >>>>> the first.
>>> >>>>>
>>> >>>>> They both pass `make check`, but I’m not sure I’ll have time to give
>>> >>>>> these patches the testing they’ll deserve in the next few days.
>>> >>>>> Daniel
>>> >>>>> Berlin has offered to test them more thoroughly for me.
>>> >>>>>
>>> >>>>> Note that there is still one difference with the paper. The paper
>>> >>>>> uses
>>> >>>>> a custom linked data structure instead of a priority queue, which
>>> >>>>> takes
>>> >>>>> advantage of the property that the level of all nodes being inserted
>>> >>>>> is at
>>> >>>>> most the current level. The code in LLVM uses a priority queue based
>>> >>>>> on a
>>> >>>>> binary heap. This means that the worst case is O(n log n), but I’d
>>> >>>>> be
>>> >>>>> surprised if the difference matters in practice.
>>> >>>>>
>>> >>>>> Cameron
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> <idf-faster-a.patch><idf-faster-b.patch>_______________________________________________
>>> >>>>> llvm-commits mailing list
>>> >>>>> llvm-commits at cs.uiuc.edu
>>> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> >>>>
>>> >>>
>>> >>> _______________________________________________
>>> >>> llvm-commits mailing list
>>> >>> llvm-commits at cs.uiuc.edu
>>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> >>
>>> >> _______________________________________________
>>> >> llvm-commits mailing list
>>> >> llvm-commits at cs.uiuc.edu
>>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> >>
>>> >>
>>>
>>