[PATCH] D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929)

Chijun Sima via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 15 07:34:48 PDT 2018


NutshellySima added a comment.

In https://reviews.llvm.org/D53245#1265363, @kuhar wrote:

> In https://reviews.llvm.org/D53245#1265080, @NutshellySima wrote:
>
> > In https://reviews.llvm.org/D53245#1264864, @dmgreen wrote:
> >
> > > Interesting. Nice improvements. What about small trees? It would seem that any tree with fewer than 75 nodes would always be recalculated. Do the timings you ran include anything to show that this is better? Or were they only looking at larger trees at the time?
> >
> >
> > I haven't conducted dedicated tests on graphs with fewer than 75 vertices, but among all the domtrees recalculated in the two `clang-*.bc` files I mentioned above, approximately 63% of them have an `n` smaller than 75.
>
>
> I'd be much more comfortable if this change didn't always recalculate for graphs smaller than 75. I think there is a lot of value in making sure the incremental updater is behaving correctly on small examples (e.g., existing unit tests, most regression tests, etc.). If for some reason it stops working, it will be much easier to detect and fix it on these small examples.


Thanks for the feedback. It makes sense to stop always recalculating on small graphs, but I need to check whether that still performs well.

> 
> 
>> In fact, the time taken by directly applying updates to the domtree also depends on whether the graph is sparse or dense, so it is hard to come up with an accurate heuristic for when to recalculate the domtree: edge insertions have a complexity of `O(m*min{n,k}+nk)`, edge deletions have a complexity of `O(m*k)`, and recalculation has a complexity of `O(n^2)` (where `n` is the number of vertices of the graph, `m` is the number of edges after the updates, and `k` is the number of updates). I think if we want to fine-tune this heuristic, we need to collect a lot of statistics and mathematically fit the function `(n,m,k)->bool` (whether to recalculate or not).
> 
> Since we cannot measure running time directly in any realistic scenario, I think that the best way to approach this would be to count the number of graph nodes visited during DFS walks and see how many of them we get with different heuristics.
>  I think that the bugs already established that we need to have some fallback heuristic. I'd suggest being careful here and trying at least a couple of different policies on different projects (maybe clang, sqlite, the lnt test suite). There are a couple of properties I propose for the heuristic:
> 
> 1. Don't always recalculate small trees (say, n < 100)
> 2. Recalculate when k > n
> 3. Recalculate in the case of the performance bugs we saw
> 4. Visit fewer nodes than the current updater for Clang + SQLite + Oggenc + ...

Regarding the `k>n` you mentioned, I think you mean `k>n/α` (where α is a constant), because for the reproducer in PR37929 the minimum value of α that makes the time consumed by clang acceptable is about 3. But selecting a higher value such as 75 also speeds up other inputs further (about 2x~4.5x on the `clang*.bc` files).
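
Concretely, the policy I have been measuring is roughly the following sketch (illustrative only, not the exact code in this revision; `preferRecalculate` and the value of `Alpha` are placeholder names):

```cpp
// Sketch of the "recalculate when the batch of updates is large relative to
// the tree" policy. Alpha == 1 gives plain k > n; the PR37929 reproducer
// needs Alpha >= ~3, and a larger divisor such as 75 gave the additional
// 2x~4.5x win on the clang*.bc inputs mentioned above.
static bool preferRecalculate(unsigned NumTreeNodes, unsigned NumUpdates) {
  constexpr unsigned Alpha = 75; // Placeholder constant, not a tuned value.
  return NumUpdates > NumTreeNodes / Alpha;
}
```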

> 
> 
>> @kuhar , do you think we need to use another heuristic involving `m`? Though I think maybe using only `n` is sufficient? :)
> 
> Intuitively, I think that the heuristic can be some linear function: a * n + b * k > threshold, where a and b are coefficients and the threshold is a constant. I don't think it's possible to calculate m efficiently, is it?

`m` is the number of edges after the updates. I think `m` can only be calculated by traversing two levels of linked lists (each block and then its successor list) to count the edges, which is `O(m)` by itself. I am just wondering whether it is worthwhile to find some heuristic like `n^2/m` (I'm not sure whether it would work better than the current one) that avoids triggering recalculation in cases where the graph is really sparse.
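
For reference, counting `m` on a function's CFG would look roughly like the sketch below (hypothetical code, not part of this diff; `shouldRecalculate` and the constant are illustrative), with a density-aware check built on top of it:

```cpp
#include <cstdint>

#include "llvm/IR/CFG.h"
#include "llvm/IR/Function.h"

// Hypothetical sketch: counting m means walking every block and its
// successor list, which is already O(m) on its own.
static bool shouldRecalculate(const llvm::Function &F, uint64_t NumUpdates) {
  uint64_t N = F.size(); // n: number of basic blocks.
  uint64_t M = 0;        // m: number of CFG edges after the updates.
  for (const llvm::BasicBlock &BB : F)
    M += llvm::succ_size(&BB);

  // A density-aware variant of k > n/alpha: the sparser the graph (the
  // smaller m is), the more incremental updates we tolerate before falling
  // back to the O(n^2) recalculation. The constant is illustrative, not tuned.
  return NumUpdates * M * 75 > N * N;
}
```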


https://reviews.llvm.org/D53245




