<div dir="ltr">Hi Hal,<div><br></div><div>I still can't see any compile-time regression on llvm_test_suite.</div><div><br></div><div><div>|CC_Time<span style="white-space:pre">|</span>1st run<span style="white-space:pre">|</span>2nd run<span style="white-space:pre">|</span>3rd run<span style="white-space:pre">|</span>avg<span style="white-space:pre">|percentage|</span></div>

<div>|without my patch<span style="white-space:pre">|</span>371.0605<span style="white-space:pre">|</span>369.1106<span style="white-space:pre">|</span>368.7374<span style="white-space:pre">|</span>369.6361667<span style="white-space:pre">||</span></div>

<div>|with my patch<span style="white-space:pre">|</span>366.9967<span style="white-space:pre">|</span>367.38<span style="white-space:pre">|</span>368.0141<span style="white-space:pre">|</span>367.4636<span style="white-space:pre">|</span>-0.59%|</div>

</div><div><br></div><div><div>|CC_Real_Time<span style="white-space:pre">|</span>1st run<span style="white-space:pre">|</span>2nd run<span style="white-space:pre">|</span>3rd|run<span class="" style="white-space:pre">      </span>avg|percentage|<br>

</div><div>|without my patch<span style="white-space:pre">|</span>372.3104<span style="white-space:pre">|</span>370.2636<span style="white-space:pre">|</span>370.3317<span style="white-space:pre">|</span>370.9685667<span style="white-space:pre">||</span></div>

<div>|with my patch<span style="white-space:pre">|</span>368.8189<span style="white-space:pre">|</span>368.4998<span style="white-space:pre">|</span>369.4801<span style="white-space:pre">|</span>368.9329333<span style="white-space:pre">|</span>-0.55%|</div>

</div><div><br></div><div>Thanks,</div><div>-Jiangning</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-08-01 12:02 GMT+08:00 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">----- Original Message -----<br>

> From: "Jiangning Liu" <<a href="mailto:liujiangning1@gmail.com">liujiangning1@gmail.com</a>><br>

</div><div class="">> To: "Hal Finkel" <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>><br>

> Cc: "<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a> for LLVM" <<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>>, "Nick Lewycky" <<a href="mailto:nicholas@mxc.ca">nicholas@mxc.ca</a>><br>


> Sent: Thursday, July 31, 2014 10:24:15 PM<br>

> Subject: Re: [PATCH] Fix lazy value info computation to improve jump threading for a common case<br>

><br>

><br>

</div><div class="">> Hi Hal,<br>

><br>

><br>

> Thanks for your comments!<br>

><br>

><br>

> I don't have a precised compile-time test infrastructure, so I can<br>

> only use my local box to try. And if you have a precised<br>

> compile-time test infrastructure, could you please help me to have a<br>

> try?<br>

<br>

</div>If you run the LLVM test suite, it will also collect compile-time numbers. That's normally what I use, run a few times in each configuration.<br>

<div class=""><br>

><br>

><br>

> My local box is Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, and I tried<br>

> to build spec2000 int with a single core, so now I got the following<br>

> result,<br>

><br>

><br>

><br>

><br>

> Without my patch:<br>

><br>

><br>

> real 1m28.809s<br>

> user 1m19.817s<br>

> sys 0m6.158s<br>

><br>

><br>

> real 1m27.538s<br>

> user 1m19.726s<br>

> sys 0m6.083s<br>

><br>

><br>

> real 1m27.551s<br>

> user 1m19.710s<br>

> sys 0m6.012s<br>

><br>

><br>

> With my patch:<br>

><br>

><br>

> real 1m27.935s<br>

> user 1m20.250s<br>

> sys 0m6.046s<br>

><br>

><br>

> real 1m27.867s<br>

> user 1m20.035s<br>

> sys 0m6.268s<br>

><br>

><br>

> real 1m27.511s<br>

> user 1m19.936s<br>

> sys 0m6.266s<br>

><br>

><br>

> I don't really see compile-time change at all.<br>

<br>

</div>If there is a an effect here, it seems definitely no larger than 1%.<br>

<br>

In this case, since we're checking a particular pass or two, you can compile with -mllvm -stats and look specifically at the time taken by Jump Threading and Value Propagation.<br>

<span class="HOEnZb"><font color="#888888"><br>

 -Hal<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

><br>

><br>

> Thanks,<br>

> -Jiangning<br>

><br>

><br>

><br>

><br>

><br>

> 2014-08-01 0:02 GMT+08:00 Hal Finkel < <a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a> > :<br>

><br>

><br>

> Did you measure the compile-time impact of this change? I suspect<br>

> this will be fine because, while it can increase the total amount of<br>

> work done over all visits to a block, it does not increase the total<br>

> number of visits (so the asymptotic complexity bound is still the<br>

> same). Nevertheless, we should double-check. If there is nothing<br>

> significant, then this LGTM.<br>

><br>

> Thanks,<br>

> Hal<br>

><br>

><br>

><br>

> ----- Original Message -----<br>

> > From: "Jiangning Liu" < <a href="mailto:liujiangning1@gmail.com">liujiangning1@gmail.com</a> ><br>

> > To: "Nick Lewycky" < <a href="mailto:nicholas@mxc.ca">nicholas@mxc.ca</a> ><br>

> > Cc: " <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a> for LLVM" < <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

> > ><br>

> > Sent: Wednesday, July 30, 2014 4:57:14 AM<br>

> > Subject: Re: [PATCH] Fix lazy value info computation to improve<br>

> > jump threading for a common case<br>

> ><br>

> ><br>

> ><br>

> > Hi Nick,<br>

> ><br>

> ><br>

> ><br>

> > 2014-07-30 16:15 GMT+08:00 Nick Lewycky < <a href="mailto:nicholas@mxc.ca">nicholas@mxc.ca</a> > :<br>

> ><br>

> ><br>

> ><br>

> > Jiangning Liu wrote:<br>

> ><br>

> ><br>

> > Hi,<br>

> ><br>

> > Attached patch is to fix an issue in lazy value info computation,<br>

> > and<br>

> > finally it can improve jump threading for a common case.<br>

> ><br>

> > Lazy value info computation algorithm tries to use lattice to<br>

> > propagate<br>

> > the given value through control flow. The problem of the algorithm<br>

> > is<br>

> > the basic block can only be visited once. Once the lattice state is<br>

> > changed from "undefined" to other states, the lattice state will be<br>

> > cached, and won't be able to be changed any longer.<br>

> ><br>

> > For the following simple control flow graph like below,<br>

> ><br>

> > BB1->BB2, BB1->BB3, BB2->BB3, BB2->BB4<br>

> ><br>

> > When computing a given value on edge BB2->BB3, if B2 doesn't have<br>

> > that<br>

> > value info at all, the algorithm will try to ask for information<br>

> > from<br>

> > BB2's predecessors, and then return to BB2 again to merge the info<br>

> > from<br>

> > predecessors, so BB2 will be visited twice. Unfortunately, at first<br>

> > visit of BB2, the lattice state will be changed from "undefined" to<br>

> > "overdefined", and then it will not be able to be changed any<br>

> > longer.<br>

> ><br>

> > That sounds correct. Once overdefined, it's stuck and can't be<br>

> > changed.<br>

> ><br>

> > How did it get marked overdefined in the first place?<br>

> ><br>

> ><br>

> > it is marked as overdefined immediately after the if statement I<br>

> > modified.<br>

> ><br>

> ><br>

> > Did we speculatively mark it overdefined in case we discovered that<br>

> > the computation is in terms of ourselves through a phi node or by<br>

> > directly depending on ourselves (only possibly if our CFG subgraph<br>

> > is not reachable from entry)? Why could we mark it undefined<br>

> > instead?<br>

> ><br>

> ><br>

> > The initialization is marked as undefined, I think. Lattice is<br>

> > trying<br>

> > to lower the value in this order undefined->overdefined->...<br>

> ><br>

> ><br>

> > The algorithm is so-called "lazy", so at the beginning the value<br>

> > doesn't exist for the block, and it is just initialized as<br>

> > undefined<br>

> > at the first time the algorithm visits it.<br>

> ><br>

> ><br>

> > Why couldn't we add a new "non-local-lookup-in-progress- here!"<br>

> > state<br>

><br>

><br>

> > instead?<br>

> ><br>

> ><br>

> > I think IsOverdefined already implies non-local-lookup, it would be<br>

> > "undefined" otherwise.<br>

> ><br>

> ><br>

> ><br>

> ><br>

> ><br>

> ><br>

> > This patch is to simply check "overdefined", and make sure the<br>

> > algorithm<br>

> > can still move on to propagate the values from predecessor edges to<br>

> > the<br>

> > block itself.<br>

> ><br>

> > That sounds like it could be really bad for compile-time<br>

> > performance.<br>

> ><br>

><br>

><br>

><br>

> ><br>

> ><br>

> ><br>

> > The performance experiment for spec shows pretty good result on<br>

> > aarch64<br>

> > for this fix, and the 253_perlbmk can even have >5% performance<br>

> > improvement.<br>

> ><br>

> > + // we need to compute it's value again.<br>

> ><br>

> > Typo, "it's" -> "its".<br>

> ><br>

> > + // check BB2 again, and at this momement BB2 has Overdefined<br>

> > value<br>

> > for %v in<br>

> ><br>

> > Typo, "momement" -> "moment".<br>

> ><br>

> > + // BB2. So we should have to follow data flow propagation<br>

> > algirithm<br>

> > to get the<br>

> ><br>

> > Typo, "algirithm" -> "algorithm".<br>

> ><br>

> ><br>

> ><br>

> > So for so many typos. Attached is the updated version.<br>

> ><br>

> ><br>

> ><br>

> > BTW, there are some unreviewed patches for lazy value info here:<br>

> > <a href="http://lists.cs.uiuc.edu/" target="_blank">http://lists.cs.uiuc.edu/</a> pipermail/llvm-commits/Week-<br>

> > of-Mon-20140127/203418.html . The problem description sounds like<br>

> > it<br>

> > could be related, it's not obviously not the same problem to me.<br>

> > Did<br>

> > his patch come with tests? Does your patch fix those tests too?<br>

> ><br>

> ><br>

> > I tried Olivier's test case in the 4th patch, and my patch can't<br>

> > really solve his issue, I think.<br>

> ><br>

> ><br>

> > I think Olivier was pursuing an even complete fix for precision<br>

> > purpose. As Nuno pointed out LVI is not designed as recursive on<br>

> > purpose, so there must be some precision lost. For example, my test<br>

> > case can be even extended with more incoming edges for BB2 to<br>

> > influence the value, but the algorithm will stop as soon as it<br>

> > finds<br>

> > the value is neither undefined nor overdefined, so we will not have<br>

> > chance to narrow down the value further.<br>

> ><br>

> ><br>

> > Thanks,<br>

> > -Jiangning<br>

> ><br>

> ><br>

> ><br>

> ><br>

> ><br>

> > Nick<br>

> ><br>

> ><br>

> ><br>

> > _______________________________________________<br>

> > llvm-commits mailing list<br>

> > <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

> > <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

> ><br>

><br>

> --<br>

> Hal Finkel<br>

> Assistant Computational Scientist<br>

> Leadership Computing Facility<br>

> Argonne National Laboratory<br>

><br>

><br>

<br>

--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</div></div></blockquote></div><br></div>