[PATCH] Fix lazy value info computation to improve jump threading for a common case
Hal Finkel
hfinkel at anl.gov
Fri Aug 1 05:44:48 PDT 2014
----- Original Message -----
> From: "Jiangning Liu" <liujiangning1 at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-commits at cs.uiuc.edu for LLVM" <llvm-commits at cs.uiuc.edu>, "Nick Lewycky" <nicholas at mxc.ca>
> Sent: Friday, August 1, 2014 3:29:41 AM
> Subject: Re: [PATCH] Fix lazy value info computation to improve jump threading for a common case
>
>
> Hi Hal,
>
>
> I still can't see any compile-time regression on llvm_test_suite.
Okay, go ahead and commit.
Thanks again,
Hal
>
>
>
> |CC_Time | 1st run | 2nd run | 3rd run | avg |percentage|
> |without my patch | 371.0605 | 369.1106 | 368.7374 | 369.6361667 ||
> |with my patch | 366.9967 | 367.38 | 368.0141 | 367.4636 | -0.59%|
>
>
>
> |CC_Real_Time | 1st run | 2nd run | 3rd|run avg|percentage|
>
> |without my patch | 372.3104 | 370.2636 | 370.3317 | 370.9685667 ||
> |with my patch | 368.8189 | 368.4998 | 369.4801 | 368.9329333 |
> |-0.55%|
>
>
> Thanks,
> -Jiangning
>
>
>
>
>
> 2014-08-01 12:02 GMT+08:00 Hal Finkel < hfinkel at anl.gov > :
>
>
>
> ----- Original Message -----
> > From: "Jiangning Liu" < liujiangning1 at gmail.com >
>
> > To: "Hal Finkel" < hfinkel at anl.gov >
> > Cc: " llvm-commits at cs.uiuc.edu for LLVM" < llvm-commits at cs.uiuc.edu
> > >, "Nick Lewycky" < nicholas at mxc.ca >
> > Sent: Thursday, July 31, 2014 10:24:15 PM
> > Subject: Re: [PATCH] Fix lazy value info computation to improve
> > jump threading for a common case
> >
> >
>
> > Hi Hal,
> >
> >
> > Thanks for your comments!
> >
> >
> > I don't have a precised compile-time test infrastructure, so I can
> > only use my local box to try. And if you have a precised
> > compile-time test infrastructure, could you please help me to have
> > a
> > try?
>
> If you run the LLVM test suite, it will also collect compile-time
> numbers. That's normally what I use, run a few times in each
> configuration.
>
>
> >
> >
> > My local box is Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, and I
> > tried
> > to build spec2000 int with a single core, so now I got the
> > following
> > result,
> >
> >
> >
> >
> > Without my patch:
> >
> >
> > real 1m28.809s
> > user 1m19.817s
> > sys 0m6.158s
> >
> >
> > real 1m27.538s
> > user 1m19.726s
> > sys 0m6.083s
> >
> >
> > real 1m27.551s
> > user 1m19.710s
> > sys 0m6.012s
> >
> >
> > With my patch:
> >
> >
> > real 1m27.935s
> > user 1m20.250s
> > sys 0m6.046s
> >
> >
> > real 1m27.867s
> > user 1m20.035s
> > sys 0m6.268s
> >
> >
> > real 1m27.511s
> > user 1m19.936s
> > sys 0m6.266s
> >
> >
> > I don't really see compile-time change at all.
>
> If there is a an effect here, it seems definitely no larger than 1%.
>
> In this case, since we're checking a particular pass or two, you can
> compile with -mllvm -stats and look specifically at the time taken
> by Jump Threading and Value Propagation.
>
> -Hal
>
>
>
> >
> >
> > Thanks,
> > -Jiangning
> >
> >
> >
> >
> >
> > 2014-08-01 0:02 GMT+08:00 Hal Finkel < hfinkel at anl.gov > :
> >
> >
> > Did you measure the compile-time impact of this change? I suspect
> > this will be fine because, while it can increase the total amount
> > of
> > work done over all visits to a block, it does not increase the
> > total
> > number of visits (so the asymptotic complexity bound is still the
> > same). Nevertheless, we should double-check. If there is nothing
> > significant, then this LGTM.
> >
> > Thanks,
> > Hal
> >
> >
> >
> > ----- Original Message -----
> > > From: "Jiangning Liu" < liujiangning1 at gmail.com >
> > > To: "Nick Lewycky" < nicholas at mxc.ca >
> > > Cc: " llvm-commits at cs.uiuc.edu for LLVM" <
> > > llvm-commits at cs.uiuc.edu
> > > >
> > > Sent: Wednesday, July 30, 2014 4:57:14 AM
> > > Subject: Re: [PATCH] Fix lazy value info computation to improve
> > > jump threading for a common case
> > >
> > >
> > >
> > > Hi Nick,
> > >
> > >
> > >
> > > 2014-07-30 16:15 GMT+08:00 Nick Lewycky < nicholas at mxc.ca > :
> > >
> > >
> > >
> > > Jiangning Liu wrote:
> > >
> > >
> > > Hi,
> > >
> > > Attached patch is to fix an issue in lazy value info computation,
> > > and
> > > finally it can improve jump threading for a common case.
> > >
> > > Lazy value info computation algorithm tries to use lattice to
> > > propagate
> > > the given value through control flow. The problem of the
> > > algorithm
> > > is
> > > the basic block can only be visited once. Once the lattice state
> > > is
> > > changed from "undefined" to other states, the lattice state will
> > > be
> > > cached, and won't be able to be changed any longer.
> > >
> > > For the following simple control flow graph like below,
> > >
> > > BB1->BB2, BB1->BB3, BB2->BB3, BB2->BB4
> > >
> > > When computing a given value on edge BB2->BB3, if B2 doesn't have
> > > that
> > > value info at all, the algorithm will try to ask for information
> > > from
> > > BB2's predecessors, and then return to BB2 again to merge the
> > > info
> > > from
> > > predecessors, so BB2 will be visited twice. Unfortunately, at
> > > first
> > > visit of BB2, the lattice state will be changed from "undefined"
> > > to
> > > "overdefined", and then it will not be able to be changed any
> > > longer.
> > >
> > > That sounds correct. Once overdefined, it's stuck and can't be
> > > changed.
> > >
> > > How did it get marked overdefined in the first place?
> > >
> > >
> > > it is marked as overdefined immediately after the if statement I
> > > modified.
> > >
> > >
> > > Did we speculatively mark it overdefined in case we discovered
> > > that
> > > the computation is in terms of ourselves through a phi node or by
> > > directly depending on ourselves (only possibly if our CFG
> > > subgraph
> > > is not reachable from entry)? Why could we mark it undefined
> > > instead?
> > >
> > >
> > > The initialization is marked as undefined, I think. Lattice is
> > > trying
> > > to lower the value in this order undefined->overdefined->...
> > >
> > >
> > > The algorithm is so-called "lazy", so at the beginning the value
> > > doesn't exist for the block, and it is just initialized as
> > > undefined
> > > at the first time the algorithm visits it.
> > >
> > >
> > > Why couldn't we add a new "non-local-lookup-in-progress- here!"
> > > state
> >
> >
> > > instead?
> > >
> > >
> > > I think IsOverdefined already implies non-local-lookup, it would
> > > be
> > > "undefined" otherwise.
> > >
> > >
> > >
> > >
> > >
> > >
> > > This patch is to simply check "overdefined", and make sure the
> > > algorithm
> > > can still move on to propagate the values from predecessor edges
> > > to
> > > the
> > > block itself.
> > >
> > > That sounds like it could be really bad for compile-time
> > > performance.
> > >
> >
> >
> >
> > >
> > >
> > >
> > > The performance experiment for spec shows pretty good result on
> > > aarch64
> > > for this fix, and the 253_perlbmk can even have >5% performance
> > > improvement.
> > >
> > > + // we need to compute it's value again.
> > >
> > > Typo, "it's" -> "its".
> > >
> > > + // check BB2 again, and at this momement BB2 has Overdefined
> > > value
> > > for %v in
> > >
> > > Typo, "momement" -> "moment".
> > >
> > > + // BB2. So we should have to follow data flow propagation
> > > algirithm
> > > to get the
> > >
> > > Typo, "algirithm" -> "algorithm".
> > >
> > >
> > >
> > > So for so many typos. Attached is the updated version.
> > >
> > >
> > >
> > > BTW, there are some unreviewed patches for lazy value info here:
> > > http://lists.cs.uiuc.edu/ pipermail/llvm-commits/Week-
> > > of-Mon-20140127/203418.html . The problem description sounds like
> > > it
> > > could be related, it's not obviously not the same problem to me.
> > > Did
> > > his patch come with tests? Does your patch fix those tests too?
> > >
> > >
> > > I tried Olivier's test case in the 4th patch, and my patch can't
> > > really solve his issue, I think.
> > >
> > >
> > > I think Olivier was pursuing an even complete fix for precision
> > > purpose. As Nuno pointed out LVI is not designed as recursive on
> > > purpose, so there must be some precision lost. For example, my
> > > test
> > > case can be even extended with more incoming edges for BB2 to
> > > influence the value, but the algorithm will stop as soon as it
> > > finds
> > > the value is neither undefined nor overdefined, so we will not
> > > have
> > > chance to narrow down the value further.
> > >
> > >
> > > Thanks,
> > > -Jiangning
> > >
> > >
> > >
> > >
> > >
> > > Nick
> > >
> > >
> > >
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-commits
mailing list