<div dir="ltr">First, thanks. This is a very very long time coming :)<div>Second, for those watching, note that pretty much all of the improvements and missing cases, including load forwarding/coercion, etc, actually are done.<div>It's more a matter of cleaning them up, breaking it down, and submitting it, than "implementing them".</div><div>The main experimentation at this point is "can we do it cleaner" not "can we do it" :)</div><div><br></div><div>It's also important to note that this new GVN also treats loads/stores in a unified way with scalars, unlike current GVN (which has no load or store value numbering).</div><div><br></div><div>So it will happily discover complex load/store relations (though there is some improvements we can still make here)</div><div>For example:<br></div><div><br></div><div><div>int vnum_test8(int *data)</div><div>{</div><div>  int i;</div><div>  int stop = data[3];</div><div>  int m = data[4];</div><div>  int n = m;</div><div>  int p;</div><div>  for (i=0; i<stop; i++) {</div><div>    int k = data[2];</div><div>    data[k] = 2;</div><div>    data[0] = m - n;</div><div>    k = data[1];</div><div>    m = m + k;</div><div>    n = n + k;</div><div>    p = data[0];</div><div>  }</div><div>  return p;</div><div>}</div></div><div><br></div><div>LLVM's current GVN will eliminate a single load here[1]</div><div>NewGVN will calculate that m and n are equal, that m-n is 0, that p is 0<br></div><div><br></div><div>It's not quite perfect yet, i haven't fixed store handling, so the following is missed:<br><div>int a;</div><div>int *p;</div><div>// LLVM is too smart and if we don't do this, realizes *p is a store to undef</div><div>void foo(){</div><div>    p = &a;</div><div>}</div><div>int main(int argc, char **argv) {</div><div>  int result;</div><div>  foo();</div><div>  *p = 2;</div><div>  if (argc)</div><div>    *p = 2;</div><div>  result = *p;</div><div>  return result;</div><div>}</div></div><div><br></div><div>Here, current LLVM GVN will do nothing, because it can't understand anything really about the stores.</div><div>GCC's GVN will determine result is 2.</div><div>NewGVN is not quite that smart yet (it require a little work to what we do to stores, and value numbering memory ssa versions)</div><div><br></div><div>This issue compounds if you have conditional stores of the same value.</div><div><br></div><div>So, for example, if you add:<br><br></div><div>if (i < 30)</div><div>  data[0] = 0;</div><div><br></div><div>to the first case.</div><div><br></div><div>GCC can still determine p is 0.</div></div><div><br></div><div>Currently, NewGVN cannot.</div><div><br></div><div>--Dan</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 15, 2016 at 10:49 PM, Davide Italiano <span dir="ltr"><<a href="mailto:davide@freebsd.org" target="_blank">davide@freebsd.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

we would like to propose  a new Global Value Numbering pass in LLVM.<br>

The ideas/code are from Daniel Berlin (with a minor overhaul/splitting<br>

into submittable patches from me). The code has been around for a<br>

while (2012 or before), and we think it's getting ready to be<br>

committed upstream.<br>

<br>

### Motivation<br>

<br>

To put things into context: my personal motivation for having a new<br>

GVN/PRE algorithm is LTO.<br>

It's not a secret that LLVM is getting slower and slower release after<br>

release, as Rafael discovered/pointed out in March [1] (and probably<br>

many others found out). I personally took a shot at profiling LTO on<br>

many internal/opensource applications (including clang itself) and<br>

noticed that GVN always show in the top-3 passes (and it's generally<br>

the pass where we spend most of the time in the middle-end). There are<br>

cases (extreme) where 90% of the compile time goes in GVN.<br>

<br>

Example:<br>

===---------------------------<wbr>------------------------------<wbr>----------------===<br>

                      ... Pass execution timing report ...<br>

===---------------------------<wbr>------------------------------<wbr>----------------===<br>

  Total Execution Time: 684.7622 seconds (683.7141 wall clock)<br>

<br>

   ---User Time---   --System Time--   --User+System--   ---Wall<br>

Time---  --- Name ---<br>

  130.2558 ( 20.0%)   6.3128 ( 18.7%)  136.5686 ( 19.9%)  137.6145 (<br>

20.1%)  X86 DAG->DAG Instruction Selection<br>

  55.4695 (  8.5%)   0.5501 (  1.6%)  56.0196 (  8.2%)  56.1049 (<br>

8.2%)  Function Integration/Inlining<br>

  42.3492 (  6.5%)   0.0364 (  0.1%)  42.3856 (  6.2%)  42.8676 (<br>

6.3%)  Global Value Numbering<br>

<br>

### Problems in the current GVN<br>

<br>

There are some issues in the current GVN infrastructure. A<br>

non-exhaustive list (I'm pretty sure people have their list of<br>

complains, these are the ones I care about the most).<br>

* GVN is very slow for large test cases<br>

* GVN doesn't do a real analysis, instead it eliminates as it goes, so<br>

it's hard to reason about it.<br>

* GVN doesn't perform any phi predication, i.e. doesn't know about phi<br>

nodes, so later passes have to do some extra work to clean up<br>

* There are bugs, e.g. [2] which would require a rewrite of PRE to be fixed.<br>

<br>

### NewGVN<br>

<br>

The new algorithm implements the ideas described in [3] with some<br>

engineering optimizations by Dan (for example the set of touched<br>

instructions is represented using a Bitvector instead of a set because<br>

it's not uncommon for large functions where a predicate change that<br>

thousands of instructions need to be changed, and we both measured 30%<br>

of the whole pass time spent just modifying the set).<br>

The code pretty much maps what the paper describe so I won't try to<br>

repeat it here =)<br>

<br>

Some advantages of NewGVN:<br>

* GVN performs a real analysis (which is now part of the pass itself<br>

but potentially could be split and used as an utility by other<br>

passes). For example, Quentin/Dan pointed out that outlining at the IR<br>

level is hard without a proper value numbering analysis.<br>

* On large testcases, It's faster than current GVN, even though didn't<br>

go through a lot of profiling/optimization lately (I found it up to<br>

50% faster in compile time on some internal benchmarks). There are<br>

some places were we can improve. For example, we spend a lot of time<br>

inside Simplify* functions. Another considerable chunk of the time is<br>

spent inside MemorySSA, but this is going to change once we preserve<br>

MemSSA more aggressively.<br>

* The code is easier to understand (at least to me)<br>

[...]<br>

<br>

Some current limitations of NewGVN:<br>

* NewGVN doesn't do everything that the current GVN does. There are<br>

plans to implement missing features and more.<br>

* On small testcases NewGVN may be slower as it does some work upfront<br>

that the current GVN doesn't do. NewGVN is probably less lazy than it<br>

should be, but it didn't matter for us so we didn't consider it a<br>

priority. If somebody cares and finds a case where NewGVN<br>

substantially regresses, speak up.<br>

<br>

The initial code can be found here <a href="https://reviews.llvm.org/D26224" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D26224</a><br>

The current patch includes only the "core" GVN algorithm, i.e. the<br>

expression framework/the logic to perform congruence finding. Other<br>

pieces (e.g. PRE will build on top of that).<br>

<br>

My rough plan is:<br>

* Try to get this reviewed/tested and in tree.<br>

* Fix miscompiles/bugs (and performance problems as/if they arise).<br>

* Build other pieces of the algorithm on top of what we have. My<br>

immediate concern will be implementing support for llvm.assume and<br>

load coercion. Dan said he will try to find the time to work on the<br>

predicate handling (actually, there's already a branch<br>

<a href="https://github.com/dberlin/llvm-gvn-rewrite/tree/newgvn-predicates" rel="noreferrer" target="_blank">https://github.com/dberlin/<wbr>llvm-gvn-rewrite/tree/newgvn-<wbr>predicates</a>).<br>

<br>

Please let us know what you think. Any feedback/review comment/testing<br>

is very appreciated!<br>

<br>

Thanks!<br>

<br>

[1] <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-March/096488.html" rel="noreferrer" target="_blank">http://lists.llvm.org/<wbr>pipermail/llvm-dev/2016-March/<wbr>096488.html</a><br>

[2] <a href="https://llvm.org/bugs/show_bug.cgi?id=30692#c11" rel="noreferrer" target="_blank">https://llvm.org/bugs/show_<wbr>bug.cgi?id=30692#c11</a><br>

[3] <a href="http://dl.acm.org/citation.cfm?id=512536" rel="noreferrer" target="_blank">http://dl.acm.org/citation.<wbr>cfm?id=512536</a><br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Davide<br>

<br>

"There are no solved problems; there are only problems that are more<br>

or less solved" -- Henri Poincare<br>

</font></span></blockquote></div><br></div>