<div dir="ltr">(FYI, we have a  'utils' directory that is great for these kinds of python scripts provided you have nice comments explaining what it is useful for and how to use it)<br></div><br><div class="gmail_quote">On Mon, Apr 6, 2015 at 5:26 PM Daniel Berlin <<a href="mailto:dberlin@dberlin.org">dberlin@dberlin.org</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">(Small update to laddergraph.py in case anyone else cares to use it,<br>

it now errors if you try to give it an odd number, generates bb labels<br>

that are named right, etc. It has no effect on the numbers, of course)<br>

<br>

On Mon, Apr 6, 2015 at 4:59 PM, Daniel Berlin <<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>> wrote:<br>

> So, to drive worst case behavior (besides all the other testing), i<br>

> wrote the attached to generate ladder graphs of size N, which are the<br>

> worst case for DF/IDF algorithms (N^2)<br>

><br>

> If you want perfect ladder looking graphs, you have to use<br>

> -simplifycfg on the result.<br>

> Note: Ladder graphs seems to generate at least n^2 behavior in<br>

> GraphWriter, so using a 50000 rung ladder graph doesn't seem sane with<br>

>  -view-cfg.  It doesn't use n^2 memory at least, but i stopped it<br>

> after 15 minutes.<br>

><br>

> Anyway, as one might expect, even something like using -domfrontier on<br>

> a 50000 rung ladder graph is not possible.<br>

><br>

> The same is true for the algorithm before your patches.<br>

> Before your patches, i stopped it at 5 minutes.<br>

> After your patches, it takes <50ms.<br>

><br>

> This is tested with memoryssa, which only builds SSA for one variable,<br>

> and does no pruning, which is very easy to test against.<br>

> It's safe to say "it's always an improvement" there<br>

><br>

> For promotememorytoregisters, besides generating code that defeats the<br>

> pruning,  I tested across a ton of different types of code for both of<br>

> these, and can't find a single case where it's slower.<br>

><br>

><br>

> On Mon, Apr 6, 2015 at 4:39 PM, Cameron Zwarich <<a href="mailto:zwarich@apple.com" target="_blank">zwarich@apple.com</a>> wrote:<br>

>> How does perf look relative to the current code in the tree?<br>

>><br>

>> Cameron<br>

>><br>

>>> On Apr 6, 2015, at 3:57 PM, Daniel Berlin <<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>> wrote:<br>

>>><br>

>>> So, i can't find a testcase up to 100k bb's where *anything* matters<br>

>>> in terms of a vs b.<br>

>>> I also tried it on 50k functions of 1000 bb's as well.<br>

>>><br>

>>> In trying to use setvector, i found another set of much simpler<br>

>>> reasons it won't work:<br>

>>><br>

>>> 1. We clear worklist every time we pop from the PQ. We can't do this<br>

>>> with setvector<br>

>>> 2. We pop values off the worklist, which would erase them from<br>

>>> setvector, but as implemented, we don't want them revisited anyway.<br>

>>><br>

>>> IE Right now,  a value inserted into visitedworklist stays there the<br>

>>> entire function, but a value inserted into worklist is cleared every<br>

>>> time we pop something off the PQ.<br>

>>><br>

>>><br>

>>> #1 is pointless (in the sense that the worklist is always empty at the<br>

>>> point it's called, i assume it's there to do shrink to fit or<br>

>>> something)<br>

>>><br>

>>> Unless #2 is a bug, we can't use setvector.<br>

>>><br>

>>><br>

>>><br>

>>><br>

>>> On Mon, Apr 6, 2015 at 3:17 PM, Daniel Berlin <<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>> wrote:<br>

>>>> I'll time it.<br>

>>>><br>

>>>> I'll use a GCC testcase that generates fully connected graphs.<br>

>>>><br>

>>>> (I've already tried 10000+ bb cases with if's and and merges, and<br>

>>>> can't get a vs b to have any real time difference)<br>

>>>><br>

>>>><br>

>>>> On Mon, Apr 6, 2015 at 2:04 PM, Chandler Carruth <<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>> wrote:<br>

>>>>> Sorry, I clearly meant SmallSetVector (which uses SmallVector and<br>

>>>>> SmallPtrSet).<br>

>>>>><br>

>>>>> Also, I was not supposing you would touch the blocks thing at all, nor<br>

>>>>> thinking this would remove any non-determinism. It was just that if you're<br>

>>>>> pushing onto a vector only when insertion into a set vector succeeds, it<br>

>>>>> might make sense. There are still plenty of cases where it doesn't, either<br>

>>>>> due to type mis-matches, or because you want to keep growing the set while<br>

>>>>> shrinking and re-using space in the vector.<br>

>>>>><br>

>>>>> On Mon, Apr 6, 2015 at 2:02 PM Cameron Zwarich <<a href="mailto:zwarich@apple.com" target="_blank">zwarich@apple.com</a>> wrote:<br>

>>>>>><br>

>>>>>> Yeah, that’s a bigger problem. My original complaints 2) and 3) are<br>

>>>>>> invalid because those are template parameters that could presumedly be<br>

>>>>>> swapped out for SmallVector / SmallPtrSet anyways.<br>

>>>>>><br>

>>>>>> Cameron<br>

>>>>>><br>

>>>>>>> On Apr 6, 2015, at 2:00 PM, Daniel Berlin <<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>> wrote:<br>

>>>>>>><br>

>>>>>>> Unless i'm missing something, you can't use setvector anyway, because<br>

>>>>>>> it won't remove the non-determinism the sort is there to fix.<br>

>>>>>>><br>

>>>>>>> That non-determinism comes from the fact that defblocks is a<br>

>>>>>>> smallptrset, and even if the priority queue's order is stable, the<br>

>>>>>>> only guarantee you get is that things at the same dom level end up<br>

>>>>>>> around the same place in the priority queue (but not in a specific<br>

>>>>>>> order).<br>

>>>>>>><br>

>>>>>>><br>

>>>>>>> If domtreenodepair was domtreenodetriple (<BasicBlock, DomLevel,<br>

>>>>>>> BBNumbers[BB]>), and sorted by second then third part of tuple,<br>

>>>>>>> dfblocks could be a std::vector and you could use setvector, and not<br>

>>>>>>> sort anything, because it would be deterministic.<br>

>>>>>>><br>

>>>>>>> Whether any of this is worth it, no idea.<br>

>>>>>>><br>

>>>>>>><br>

>>>>>>><br>

>>>>>>> On Mon, Apr 6, 2015 at 1:56 PM, Cameron Zwarich <<a href="mailto:zwarich@apple.com" target="_blank">zwarich@apple.com</a>><br>

>>>>>>> wrote:<br>

>>>>>>>> I just thought it would make aesthetic sense to have less code in the<br>

>>>>>>>> inner<br>

>>>>>>>> loop, but that’s arbitrary. If people think the ‘a’ patch makes more<br>

>>>>>>>> sense<br>

>>>>>>>> and it’s equivalent perfwise, then I’m fine with that. Code is made for<br>

>>>>>>>> reading, not writing. :P<br>

>>>>>>>><br>

>>>>>>>> I’m not sure SetVector makes sense here for a few reasons:<br>

>>>>>>>><br>

>>>>>>>> 1) The set and the vector store different types; the vector also<br>

>>>>>>>> contains<br>

>>>>>>>> the BB number.<br>

>>>>>>>> 2) The vector in SetVector is a std::vector, not a SmallVector.<br>

>>>>>>>> 3) The set in SetVector is a SmallSet of a fixed size of 16 (smaller<br>

>>>>>>>> than<br>

>>>>>>>> the 32 used here), and the ‘large’ case of SmallSet is a std::set,<br>

>>>>>>>> which is<br>

>>>>>>>> probably slower than SmallPtrSet’s ‘large’ case.<br>

>>>>>>>><br>

>>>>>>>> Cameron<br>

>>>>>>>><br>

>>>>>>>> On Apr 6, 2015, at 1:40 PM, Chandler Carruth <<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>><br>

>>>>>>>> wrote:<br>

>>>>>>>><br>

>>>>>>>> I'd like to understand why Cameron would prefer the 'b' patch to the<br>

>>>>>>>> 'a'<br>

>>>>>>>> patch. AFAICT, the 'b' patch doesn't save any actual memory (sadly).<br>

>>>>>>>><br>

>>>>>>>> Also, if we go with 'a', would it make sense to use a SetVector rather<br>

>>>>>>>> than<br>

>>>>>>>> a separate vector and set?<br>

>>>>>>>><br>

>>>>>>>> On Mon, Apr 6, 2015 at 1:36 PM Daniel Berlin <<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>><br>

>>>>>>>> wrote:<br>

>>>>>>>>><br>

>>>>>>>>> I'm running numbers on both approaches now, to see if there is any<br>

>>>>>>>>> real difference in speed.<br>

>>>>>>>>><br>

>>>>>>>>> (otherwise, i think the one with two visited worklists is easier to<br>

>>>>>>>>> understand, unless someone else wants to disagree :P)<br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>> On Mon, Apr 6, 2015 at 11:54 AM, Quentin Colombet<br>

>>>>>>>>> <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>><br>

>>>>>>>>> wrote:<br>

>>>>>>>>>> Hi Cameron,<br>

>>>>>>>>>><br>

>>>>>>>>>> Sounds good.<br>

>>>>>>>>>><br>

>>>>>>>>>> Let us wait the complete testing from Daniel before reviewing the<br>

>>>>>>>>>> patch.<br>

>>>>>>>>>><br>

>>>>>>>>>> Thanks,<br>

>>>>>>>>>> -Quentin<br>

>>>>>>>>>>> On Apr 6, 2015, at 11:27 AM, Cameron Zwarich <<a href="mailto:zwarich@apple.com" target="_blank">zwarich@apple.com</a>><br>

>>>>>>>>>>> wrote:<br>

>>>>>>>>>>><br>

>>>>>>>>>>> It was pointed out to me (without any specifics) that the iterated<br>

>>>>>>>>>>> dominance frontier algorithm in PromoteMemoryToRegister.cpp has<br>

>>>>>>>>>>> O(n^2) worst<br>

>>>>>>>>>>> case behavior.<br>

>>>>>>>>>>><br>

>>>>>>>>>>> I inspected the code and think I found the cause. The code uses a<br>

>>>>>>>>>>> priority queue and a worklist, which share the same visited set, but<br>

>>>>>>>>>>> the<br>

>>>>>>>>>>> visited set is only updated when inserting into the priority queue.<br>

>>>>>>>>>>> The<br>

>>>>>>>>>>> original Sreedhar-Gao paper effectively has a second visited set<br>

>>>>>>>>>>> (the InPhi<br>

>>>>>>>>>>> flag) which is used for the priority queue, and the set called<br>

>>>>>>>>>>> Visited is<br>

>>>>>>>>>>> used for the recursive traversal that is done here with a worklist.<br>

>>>>>>>>>>><br>

>>>>>>>>>>> I’ve attached two patches, one which just adds a second visited sit,<br>

>>>>>>>>>>> and another which leverages the fact that one of the visited sets is<br>

>>>>>>>>>>> actually the IDF. I would prefer the latter if it has equal<br>

>>>>>>>>>>> performance with<br>

>>>>>>>>>>> the first.<br>

>>>>>>>>>>><br>

>>>>>>>>>>> They both pass `make check`, but I’m not sure I’ll have time to give<br>

>>>>>>>>>>> these patches the testing they’ll deserve in the next few days.<br>

>>>>>>>>>>> Daniel<br>

>>>>>>>>>>> Berlin has offered to test them more thoroughly for me.<br>

>>>>>>>>>>><br>

>>>>>>>>>>> Note that there is still one difference with the paper. The paper<br>

>>>>>>>>>>> uses<br>

>>>>>>>>>>> a custom linked data structure instead of a priority queue, which<br>

>>>>>>>>>>> takes<br>

>>>>>>>>>>> advantage of the property that the level of all nodes being inserted<br>

>>>>>>>>>>> is at<br>

>>>>>>>>>>> most the current level. The code in LLVM uses a priority queue based<br>

>>>>>>>>>>> on a<br>

>>>>>>>>>>> binary heap. This means that the worst case is O(n log n), but I’d<br>

>>>>>>>>>>> be<br>

>>>>>>>>>>> surprised if the difference matters in practice.<br>

>>>>>>>>>>><br>

>>>>>>>>>>> Cameron<br>

>>>>>>>>>>><br>

>>>>>>>>>>><br>

>>>>>>>>>>><br>

>>>>>>>>>>> <idf-faster-a.patch><idf-<u></u>faster-b.patch>_______________<u></u>______________________________<u></u>__<br>

>>>>>>>>>>> llvm-commits mailing list<br>

>>>>>>>>>>> <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

>>>>>>>>>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/<u></u>mailman/listinfo/llvm-commits</a><br>

>>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>> ______________________________<u></u>_________________<br>

>>>>>>>>> llvm-commits mailing list<br>

>>>>>>>>> <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

>>>>>>>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/<u></u>mailman/listinfo/llvm-commits</a><br>

>>>>>>>><br>

>>>>>>>> ______________________________<u></u>_________________<br>

>>>>>>>> llvm-commits mailing list<br>

>>>>>>>> <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

>>>>>>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/<u></u>mailman/listinfo/llvm-commits</a><br>

>>>>>>>><br>

>>>>>>>><br>

>>>>>><br>

>>>>><br>

>><br>

</blockquote></div>