[PATCH] Fix lazy value info computation to improve jump threading for a common case

Fri Aug 1 06:39:46 PDT 2014

Alright; thanks for the explanation.
The way LVI works is slightly confusing, so I wasn't able to verify that 
myself without reading large chunks of the code again (which I did not).

Nuno

----- Original Message -----
> Nuno,
>
> I could be wrong ;) -- but, I'm pretty sure that this patch does not 
> change the number of visits made to any block. It only changes whether, on 
> subsequent visits to a block (that already happen), we redo the range 
> computation instead of taking the "early exit" path. But this just adds an 
> additional constant-bounded workload per visit.
>
> Also, I've had Jiangning look for compile-time impact (on the test suite, 
> spec, etc.) and none has been observed. We'll keep an eye on the LNT bots.
>
> -Hal
>
> ----- Original Message -----
>> From: "Nuno Lopes" <nunoplopes at sapo.pt>
>> To: "Jiangning Liu" <liujiangning1 at gmail.com>, llvm-commits at cs.uiuc.edu
>> Sent: Friday, August 1, 2014 7:05:52 AM
>> Subject: Re: [PATCH] Fix lazy value info computation to improve jump
>> threading for a common case
>>
>> I'm concerned about loops.
>> Are you sure the algorithm will terminate when you ask for the range of
>> a recursive phi value?
>> Something like this:
>>
>> entry:
>> br label %bb1
>>
>> bb1:
>> %a = phi i32 [ %a, %bb1 ], [ %b, %entry ]
>> br i1 %cond, label %bb1, label %bb2
>>
>> bb2:
>>   <get range of %a>
>>
>>
>> Apart from that, for compile-time reasons, you'll probably have to limit
>> the maximum path length.  Otherwise you risk crawling thousands of BBs.
>>
>> Nuno
>>
>> ----- Original Message -----
>> > Hi,
>> >
>> > The attached patch fixes an issue in the lazy value info computation,
>> > which in turn improves jump threading for a common case.
>> >
>> > The lazy value info computation uses a lattice to propagate the given
>> > value through the control flow. The problem is that each basic block
>> > can effectively be visited only once: once the lattice state changes
>> > from "undefined" to any other state, it is cached and can no longer
>> > change.
>> >
>> > Consider the following simple control flow graph:
>> >
>> > BB1->BB2, BB1->BB3, BB2->BB3, BB2->BB4
>> >
>> > When computing a given value on edge BB2->BB3, if BB2 has no
>> > information about that value at all, the algorithm asks BB2's
>> > predecessors for information and then returns to BB2 to merge what
>> > they provide, so BB2 is visited twice. Unfortunately, on the first
>> > visit to BB2 the lattice state is changed from "undefined" to
>> > "overdefined", and after that it can no longer change.
>> >
>> > This patch simply checks for "overdefined" and makes sure the
>> > algorithm can still go on to propagate the values from the predecessor
>> > edges to the block itself.
>> >
>> > Performance experiments with SPEC show good results on aarch64 for
>> > this fix; 253_perlbmk even improves by more than 5%.
>> >
>> > Thanks,
>> > -Jiangning
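
The caching behaviour described in the patch mail above, and the visit-count argument made in reply, can be illustrated with a small toy model. This is only a sketch in Python and not LLVM's implementation: the names `solve_block`, `merge`, `CFG_PREDS` and `ENTRY_FACTS`, the `retry_overdefined` flag, and the constant 7 assumed to be known in BB1 are all made up for illustration.

```python
# Toy model of a lazily solved, cached lattice value per basic block.
# Not LLVM code: every name and the constant 7 are illustrative assumptions.

UNDEFINED = "undefined"
OVERDEFINED = "overdefined"

# Predecessor lists for the CFG in the mail:
# BB1->BB2, BB1->BB3, BB2->BB3, BB2->BB4
CFG_PREDS = {"BB1": [], "BB2": ["BB1"], "BB3": ["BB1", "BB2"], "BB4": ["BB2"]}

# Assumption: the queried value is known to be the constant 7 in BB1.
ENTRY_FACTS = {"BB1": 7}


def merge(a, b):
    """Lattice meet: undefined is the identity, differing facts go overdefined."""
    if a == UNDEFINED:
        return b
    if b == UNDEFINED:
        return a
    return a if a == b else OVERDEFINED


def solve_block(query, preds, facts, retry_overdefined):
    """Return (cache, visits) after solving `query` with an explicit stack."""
    cache, visits = {}, {}
    stack = [query]
    while stack:
        bb = stack[-1]
        visits[bb] = visits.get(bb, 0) + 1
        state = cache.get(bb, UNDEFINED)

        # Early exit on a cached state. With retry_overdefined set, a cached
        # "overdefined" no longer counts as final and the merge is redone.
        if state != UNDEFINED and not (retry_overdefined and state == OVERDEFINED):
            stack.pop()
            continue

        if bb in facts:
            cache[bb] = facts[bb]
            stack.pop()
            continue

        if state == UNDEFINED:
            # First visit: pessimistically mark overdefined so cycles terminate,
            # then ask the predecessors that have no cached state yet.
            cache[bb] = OVERDEFINED
            missing = [p for p in preds[bb] if p not in cache]
            if missing:
                stack.extend(missing)
                continue  # bb stays on the stack and is revisited later

        # Merge whatever the predecessors have cached by now.
        result = UNDEFINED
        for p in preds[bb]:
            result = merge(result, cache[p])
        cache[bb] = OVERDEFINED if result == UNDEFINED else result
        stack.pop()
    return cache, visits


# Same query with the early exit kept (old) and relaxed (new).
old_cache, old_visits = solve_block("BB2", CFG_PREDS, ENTRY_FACTS, retry_overdefined=False)
new_cache, new_visits = solve_block("BB2", CFG_PREDS, ENTRY_FACTS, retry_overdefined=True)
print(old_cache["BB2"], new_cache["BB2"])
print(old_visits == new_visits)
```

In this toy model BB2 stays "overdefined" when the early exit is taken and resolves to 7 when the merge is redone, while the per-block visit counts are the same in both runs. A block that is its own predecessor (as in the phi example above) still terminates here, because the pessimistic "overdefined" written on the first visit is what the self edge reads back. That says nothing definitive about the real patch; it only shows why the two claims in the thread are compatible.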