<div dir="ltr">First, thanks. This is a very very long time coming :)<div>Second, for those watching, note that pretty much all of the improvements and missing cases, including load forwarding/coercion, etc, actually are done.<div>It's more a matter of cleaning them up, breaking it down, and submitting it, than "implementing them".</div><div>The main experimentation at this point is "can we do it cleaner" not "can we do it" :)</div><div><br></div><div>It's also important to note that this new GVN also treats loads/stores in a unified way with scalars, unlike current GVN (which has no load or store value numbering).</div><div><br></div><div>So it will happily discover complex load/store relations (though there is some improvements we can still make here)</div><div>For example:<br></div><div><br></div><div><div>int vnum_test8(int *data)</div><div>{</div><div> int i;</div><div> int stop = data[3];</div><div> int m = data[4];</div><div> int n = m;</div><div> int p;</div><div> for (i=0; i<stop; i++) {</div><div> int k = data[2];</div><div> data[k] = 2;</div><div> data[0] = m - n;</div><div> k = data[1];</div><div> m = m + k;</div><div> n = n + k;</div><div> p = data[0];</div><div> }</div><div> return p;</div><div>}</div></div><div><br></div><div>LLVM's current GVN will eliminate a single load here[1]</div><div>NewGVN will calculate that m and n are equal, that m-n is 0, that p is 0<br></div><div><br></div><div>It's not quite perfect yet, i haven't fixed store handling, so the following is missed:<br><div>int a;</div><div>int *p;</div><div>// LLVM is too smart and if we don't do this, realizes *p is a store to undef</div><div>void foo(){</div><div> p = &a;</div><div>}</div><div>int main(int argc, char **argv) {</div><div> int result;</div><div> foo();</div><div> *p = 2;</div><div> if (argc)</div><div> *p = 2;</div><div> result = *p;</div><div> return result;</div><div>}</div></div><div><br></div><div>Here, current LLVM GVN will do nothing, because it can't understand anything really about the stores.</div><div>GCC's GVN will determine result is 2.</div><div>NewGVN is not quite that smart yet (it require a little work to what we do to stores, and value numbering memory ssa versions)</div><div><br></div><div>This issue compounds if you have conditional stores of the same value.</div><div><br></div><div>So, for example, if you add:<br><br></div><div>if (i < 30)</div><div> data[0] = 0;</div><div><br></div><div>to the first case.</div><div><br></div><div>GCC can still determine p is 0.</div></div><div><br></div><div>Currently, NewGVN cannot.</div><div><br></div><div>--Dan</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 15, 2016 at 10:49 PM, Davide Italiano <span dir="ltr"><<a href="mailto:davide@freebsd.org" target="_blank">davide@freebsd.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
we would like to propose a new Global Value Numbering pass in LLVM.<br>
The ideas/code are from Daniel Berlin (with a minor overhaul/splitting<br>
into submittable patches from me). The code has been around for a<br>
while (2012 or before), and we think it's getting ready to be<br>
committed upstream.<br>
<br>
### Motivation<br>
<br>
To put things into context: my personal motivation for having a new<br>
GVN/PRE algorithm is LTO.<br>
It's not a secret that LLVM is getting slower and slower release after<br>
release, as Rafael discovered/pointed out in March [1] (and probably<br>
many others found out). I personally took a shot at profiling LTO on<br>
many internal/opensource applications (including clang itself) and<br>
noticed that GVN always show in the top-3 passes (and it's generally<br>
the pass where we spend most of the time in the middle-end). There are<br>
cases (extreme) where 90% of the compile time goes in GVN.<br>
<br>
Example:<br>
===---------------------------<wbr>------------------------------<wbr>----------------===<br>
... Pass execution timing report ...<br>
===---------------------------<wbr>------------------------------<wbr>----------------===<br>
Total Execution Time: 684.7622 seconds (683.7141 wall clock)<br>
<br>
---User Time--- --System Time-- --User+System-- ---Wall<br>
Time--- --- Name ---<br>
130.2558 ( 20.0%) 6.3128 ( 18.7%) 136.5686 ( 19.9%) 137.6145 (<br>
20.1%) X86 DAG->DAG Instruction Selection<br>
55.4695 ( 8.5%) 0.5501 ( 1.6%) 56.0196 ( 8.2%) 56.1049 (<br>
8.2%) Function Integration/Inlining<br>
42.3492 ( 6.5%) 0.0364 ( 0.1%) 42.3856 ( 6.2%) 42.8676 (<br>
6.3%) Global Value Numbering<br>
<br>
### Problems in the current GVN<br>
<br>
There are some issues in the current GVN infrastructure. A<br>
non-exhaustive list (I'm pretty sure people have their list of<br>
complains, these are the ones I care about the most).<br>
* GVN is very slow for large test cases<br>
* GVN doesn't do a real analysis, instead it eliminates as it goes, so<br>
it's hard to reason about it.<br>
* GVN doesn't perform any phi predication, i.e. doesn't know about phi<br>
nodes, so later passes have to do some extra work to clean up<br>
* There are bugs, e.g. [2] which would require a rewrite of PRE to be fixed.<br>
<br>
### NewGVN<br>
<br>
The new algorithm implements the ideas described in [3] with some<br>
engineering optimizations by Dan (for example the set of touched<br>
instructions is represented using a Bitvector instead of a set because<br>
it's not uncommon for large functions where a predicate change that<br>
thousands of instructions need to be changed, and we both measured 30%<br>
of the whole pass time spent just modifying the set).<br>
The code pretty much maps what the paper describe so I won't try to<br>
repeat it here =)<br>
<br>
Some advantages of NewGVN:<br>
* GVN performs a real analysis (which is now part of the pass itself<br>
but potentially could be split and used as an utility by other<br>
passes). For example, Quentin/Dan pointed out that outlining at the IR<br>
level is hard without a proper value numbering analysis.<br>
* On large testcases, It's faster than current GVN, even though didn't<br>
go through a lot of profiling/optimization lately (I found it up to<br>
50% faster in compile time on some internal benchmarks). There are<br>
some places were we can improve. For example, we spend a lot of time<br>
inside Simplify* functions. Another considerable chunk of the time is<br>
spent inside MemorySSA, but this is going to change once we preserve<br>
MemSSA more aggressively.<br>
* The code is easier to understand (at least to me)<br>
[...]<br>
<br>
Some current limitations of NewGVN:<br>
* NewGVN doesn't do everything that the current GVN does. There are<br>
plans to implement missing features and more.<br>
* On small testcases NewGVN may be slower as it does some work upfront<br>
that the current GVN doesn't do. NewGVN is probably less lazy than it<br>
should be, but it didn't matter for us so we didn't consider it a<br>
priority. If somebody cares and finds a case where NewGVN<br>
substantially regresses, speak up.<br>
<br>
The initial code can be found here <a href="https://reviews.llvm.org/D26224" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D26224</a><br>
The current patch includes only the "core" GVN algorithm, i.e. the<br>
expression framework/the logic to perform congruence finding. Other<br>
pieces (e.g. PRE will build on top of that).<br>
<br>
My rough plan is:<br>
* Try to get this reviewed/tested and in tree.<br>
* Fix miscompiles/bugs (and performance problems as/if they arise).<br>
* Build other pieces of the algorithm on top of what we have. My<br>
immediate concern will be implementing support for llvm.assume and<br>
load coercion. Dan said he will try to find the time to work on the<br>
predicate handling (actually, there's already a branch<br>
<a href="https://github.com/dberlin/llvm-gvn-rewrite/tree/newgvn-predicates" rel="noreferrer" target="_blank">https://github.com/dberlin/<wbr>llvm-gvn-rewrite/tree/newgvn-<wbr>predicates</a>).<br>
<br>
Please let us know what you think. Any feedback/review comment/testing<br>
is very appreciated!<br>
<br>
Thanks!<br>
<br>
[1] <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-March/096488.html" rel="noreferrer" target="_blank">http://lists.llvm.org/<wbr>pipermail/llvm-dev/2016-March/<wbr>096488.html</a><br>
[2] <a href="https://llvm.org/bugs/show_bug.cgi?id=30692#c11" rel="noreferrer" target="_blank">https://llvm.org/bugs/show_<wbr>bug.cgi?id=30692#c11</a><br>
[3] <a href="http://dl.acm.org/citation.cfm?id=512536" rel="noreferrer" target="_blank">http://dl.acm.org/citation.<wbr>cfm?id=512536</a><br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Davide<br>
<br>
"There are no solved problems; there are only problems that are more<br>
or less solved" -- Henri Poincare<br>
</font></span></blockquote></div><br></div>