[PATCH] Improvements to SSA construction

Mon Apr 6 17:28:39 PDT 2015

(FYI, we have a  'utils' directory that is great for these kinds of python
scripts provided you have nice comments explaining what it is useful for
and how to use it)

On Mon, Apr 6, 2015 at 5:26 PM Daniel Berlin <dberlin at dberlin.org> wrote:

> (Small update to laddergraph.py in case anyone else cares to use it,
> it now errors if you try to give it an odd number, generates bb labels
> that are named right, etc. It has no effect on the numbers, of course)
>
> On Mon, Apr 6, 2015 at 4:59 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> > So, to drive worst case behavior (besides all the other testing), i
> > wrote the attached to generate ladder graphs of size N, which are the
> > worst case for DF/IDF algorithms (N^2)
> >
> > If you want perfect ladder looking graphs, you have to use
> > -simplifycfg on the result.
> > Note: Ladder graphs seems to generate at least n^2 behavior in
> > GraphWriter, so using a 50000 rung ladder graph doesn't seem sane with
> >  -view-cfg.  It doesn't use n^2 memory at least, but i stopped it
> > after 15 minutes.
> >
> > Anyway, as one might expect, even something like using -domfrontier on
> > a 50000 rung ladder graph is not possible.
> >
> > The same is true for the algorithm before your patches.
> > Before your patches, i stopped it at 5 minutes.
> > After your patches, it takes <50ms.
> >
> > This is tested with memoryssa, which only builds SSA for one variable,
> > and does no pruning, which is very easy to test against.
> > It's safe to say "it's always an improvement" there
> >
> > For promotememorytoregisters, besides generating code that defeats the
> > pruning,  I tested across a ton of different types of code for both of
> > these, and can't find a single case where it's slower.
> >
> >
> > On Mon, Apr 6, 2015 at 4:39 PM, Cameron Zwarich <zwarich at apple.com>
> wrote:
> >> How does perf look relative to the current code in the tree?
> >>
> >> Cameron
> >>
> >>> On Apr 6, 2015, at 3:57 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> >>>
> >>> So, i can't find a testcase up to 100k bb's where *anything* matters
> >>> in terms of a vs b.
> >>> I also tried it on 50k functions of 1000 bb's as well.
> >>>
> >>> In trying to use setvector, i found another set of much simpler
> >>> reasons it won't work:
> >>>
> >>> 1. We clear worklist every time we pop from the PQ. We can't do this
> >>> with setvector
> >>> 2. We pop values off the worklist, which would erase them from
> >>> setvector, but as implemented, we don't want them revisited anyway.
> >>>
> >>> IE Right now,  a value inserted into visitedworklist stays there the
> >>> entire function, but a value inserted into worklist is cleared every
> >>> time we pop something off the PQ.
> >>>
> >>>
> >>> #1 is pointless (in the sense that the worklist is always empty at the
> >>> point it's called, i assume it's there to do shrink to fit or
> >>> something)
> >>>
> >>> Unless #2 is a bug, we can't use setvector.
> >>>
> >>>
> >>>
> >>>
> >>> On Mon, Apr 6, 2015 at 3:17 PM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
> >>>> I'll time it.
> >>>>
> >>>> I'll use a GCC testcase that generates fully connected graphs.
> >>>>
> >>>> (I've already tried 10000+ bb cases with if's and and merges, and
> >>>> can't get a vs b to have any real time difference)
> >>>>
> >>>>
> >>>> On Mon, Apr 6, 2015 at 2:04 PM, Chandler Carruth <
> chandlerc at google.com> wrote:
> >>>>> Sorry, I clearly meant SmallSetVector (which uses SmallVector and
> >>>>> SmallPtrSet).
> >>>>>
> >>>>> Also, I was not supposing you would touch the blocks thing at all,
> nor
> >>>>> thinking this would remove any non-determinism. It was just that if
> you're
> >>>>> pushing onto a vector only when insertion into a set vector
> succeeds, it
> >>>>> might make sense. There are still plenty of cases where it doesn't,
> either
> >>>>> due to type mis-matches, or because you want to keep growing the set
> while
> >>>>> shrinking and re-using space in the vector.
> >>>>>
> >>>>> On Mon, Apr 6, 2015 at 2:02 PM Cameron Zwarich <zwarich at apple.com>
> wrote:
> >>>>>>
> >>>>>> Yeah, that’s a bigger problem. My original complaints 2) and 3) are
> >>>>>> invalid because those are template parameters that could presumedly
> be
> >>>>>> swapped out for SmallVector / SmallPtrSet anyways.
> >>>>>>
> >>>>>> Cameron
> >>>>>>
> >>>>>>> On Apr 6, 2015, at 2:00 PM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
> >>>>>>>
> >>>>>>> Unless i'm missing something, you can't use setvector anyway,
> because
> >>>>>>> it won't remove the non-determinism the sort is there to fix.
> >>>>>>>
> >>>>>>> That non-determinism comes from the fact that defblocks is a
> >>>>>>> smallptrset, and even if the priority queue's order is stable, the
> >>>>>>> only guarantee you get is that things at the same dom level end up
> >>>>>>> around the same place in the priority queue (but not in a specific
> >>>>>>> order).
> >>>>>>>
> >>>>>>>
> >>>>>>> If domtreenodepair was domtreenodetriple (<BasicBlock, DomLevel,
> >>>>>>> BBNumbers[BB]>), and sorted by second then third part of tuple,
> >>>>>>> dfblocks could be a std::vector and you could use setvector, and
> not
> >>>>>>> sort anything, because it would be deterministic.
> >>>>>>>
> >>>>>>> Whether any of this is worth it, no idea.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Apr 6, 2015 at 1:56 PM, Cameron Zwarich <zwarich at apple.com
> >
> >>>>>>> wrote:
> >>>>>>>> I just thought it would make aesthetic sense to have less code in
> the
> >>>>>>>> inner
> >>>>>>>> loop, but that’s arbitrary. If people think the ‘a’ patch makes
> more
> >>>>>>>> sense
> >>>>>>>> and it’s equivalent perfwise, then I’m fine with that. Code is
> made for
> >>>>>>>> reading, not writing. :P
> >>>>>>>>
> >>>>>>>> I’m not sure SetVector makes sense here for a few reasons:
> >>>>>>>>
> >>>>>>>> 1) The set and the vector store different types; the vector also
> >>>>>>>> contains
> >>>>>>>> the BB number.
> >>>>>>>> 2) The vector in SetVector is a std::vector, not a SmallVector.
> >>>>>>>> 3) The set in SetVector is a SmallSet of a fixed size of 16
> (smaller
> >>>>>>>> than
> >>>>>>>> the 32 used here), and the ‘large’ case of SmallSet is a std::set,
> >>>>>>>> which is
> >>>>>>>> probably slower than SmallPtrSet’s ‘large’ case.
> >>>>>>>>
> >>>>>>>> Cameron
> >>>>>>>>
> >>>>>>>> On Apr 6, 2015, at 1:40 PM, Chandler Carruth <
> chandlerc at google.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> I'd like to understand why Cameron would prefer the 'b' patch to
> the
> >>>>>>>> 'a'
> >>>>>>>> patch. AFAICT, the 'b' patch doesn't save any actual memory
> (sadly).
> >>>>>>>>
> >>>>>>>> Also, if we go with 'a', would it make sense to use a SetVector
> rather
> >>>>>>>> than
> >>>>>>>> a separate vector and set?
> >>>>>>>>
> >>>>>>>> On Mon, Apr 6, 2015 at 1:36 PM Daniel Berlin <dberlin at dberlin.org
> >
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> I'm running numbers on both approaches now, to see if there is
> any
> >>>>>>>>> real difference in speed.
> >>>>>>>>>
> >>>>>>>>> (otherwise, i think the one with two visited worklists is easier
> to
> >>>>>>>>> understand, unless someone else wants to disagree :P)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Apr 6, 2015 at 11:54 AM, Quentin Colombet
> >>>>>>>>> <qcolombet at apple.com>
> >>>>>>>>> wrote:
> >>>>>>>>>> Hi Cameron,
> >>>>>>>>>>
> >>>>>>>>>> Sounds good.
> >>>>>>>>>>
> >>>>>>>>>> Let us wait the complete testing from Daniel before reviewing
> the
> >>>>>>>>>> patch.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> -Quentin
> >>>>>>>>>>> On Apr 6, 2015, at 11:27 AM, Cameron Zwarich <
> zwarich at apple.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> It was pointed out to me (without any specifics) that the
> iterated
> >>>>>>>>>>> dominance frontier algorithm in PromoteMemoryToRegister.cpp has
> >>>>>>>>>>> O(n^2) worst
> >>>>>>>>>>> case behavior.
> >>>>>>>>>>>
> >>>>>>>>>>> I inspected the code and think I found the cause. The code
> uses a
> >>>>>>>>>>> priority queue and a worklist, which share the same visited
> set, but
> >>>>>>>>>>> the
> >>>>>>>>>>> visited set is only updated when inserting into the priority
> queue.
> >>>>>>>>>>> The
> >>>>>>>>>>> original Sreedhar-Gao paper effectively has a second visited
> set
> >>>>>>>>>>> (the InPhi
> >>>>>>>>>>> flag) which is used for the priority queue, and the set called
> >>>>>>>>>>> Visited is
> >>>>>>>>>>> used for the recursive traversal that is done here with a
> worklist.
> >>>>>>>>>>>
> >>>>>>>>>>> I’ve attached two patches, one which just adds a second
> visited sit,
> >>>>>>>>>>> and another which leverages the fact that one of the visited
> sets is
> >>>>>>>>>>> actually the IDF. I would prefer the latter if it has equal
> >>>>>>>>>>> performance with
> >>>>>>>>>>> the first.
> >>>>>>>>>>>
> >>>>>>>>>>> They both pass `make check`, but I’m not sure I’ll have time
> to give
> >>>>>>>>>>> these patches the testing they’ll deserve in the next few days.
> >>>>>>>>>>> Daniel
> >>>>>>>>>>> Berlin has offered to test them more thoroughly for me.
> >>>>>>>>>>>
> >>>>>>>>>>> Note that there is still one difference with the paper. The
> paper
> >>>>>>>>>>> uses
> >>>>>>>>>>> a custom linked data structure instead of a priority queue,
> which
> >>>>>>>>>>> takes
> >>>>>>>>>>> advantage of the property that the level of all nodes being
> inserted
> >>>>>>>>>>> is at
> >>>>>>>>>>> most the current level. The code in LLVM uses a priority queue
> based
> >>>>>>>>>>> on a
> >>>>>>>>>>> binary heap. This means that the worst case is O(n log n), but
> I’d
> >>>>>>>>>>> be
> >>>>>>>>>>> surprised if the difference matters in practice.
> >>>>>>>>>>>
> >>>>>>>>>>> Cameron
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> <idf-faster-a.patch><idf-faster-b.patch>_______________
> ________________________________
> >>>>>>>>>>> llvm-commits mailing list
> >>>>>>>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> llvm-commits mailing list
> >>>>>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> llvm-commits mailing list
> >>>>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150407/98bddf0d/attachment.html>