[LLVMdev] Contribute a new precise pointer analysis to LLVM

lian li lianli at gmail.com
Sat Oct 19 04:13:37 PDT 2013


On Sat, Oct 19, 2013 at 1:41 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
>
> On Fri, Oct 18, 2013 at 7:27 AM, lian li <lianli at gmail.com> wrote:
>>
>> Hi Daniel,
>>
>> I want to clarify that our analysis is not based on CFL-reachability.
>> We apply CFL-reachability to matching context information where the
>> exist from a function to a call-site must match
>> the entry from the corresponding call-site.
>
>
> Yes, sorry, I pulled the wrong quote, it was late.
>
>>
>> The problem is a simple
>> balanced parentheses problem in CFL-reachability, and it can be
>> computed
>> efficiently.
>
>
> Yes, it can, however, it was not clear from the paper that you were :)
> You cite a fairly old paper next to that that essentially gives a >O(1)
> query time bound for this, but your
> problem exactly fits into the bidirectional graph problem, unless i'm
> missing something.
>
We use the traditional dynamic programming approach to computing value
flows with matching call sites, similar to the implementation of an
IFDS framwork.
The worst case complexity sounds high, but it is not expensive in
practice with effective memoisation techniques.
Actually in our approach, context matching is not the most expensive part.
The new CFL-reachability algorithm may be able to further improve
performance, but we haven't tried that yet.

>>
>>
>> The paper you mentioned is a very nice paper that proposed an
>> efficient algorithm to solve Dyck-CFL-reachability problem in
>> bidirectional graphs.
>
>
> Right, and your  matching call sites can be formulated as one.
>
>> However, precise pointer analysis cannot be
>> formulated as a Dyck-CFL-reachability problem.
>
>
> FWIW: This remains to be seen, and certainly also depends on your
> definitions of "precise".
>  I agree you cannot currently formulate precise andersen style field
> sensitive pointer analysis as this.
> As for whether it is impossible to get within a very very small percentage,
> ...

The analysis presented in the paper approximates on pointer
dereference, which is the tricky and complex part in pointer analysis.
It reminds me of the one-level approach proposed by Manuvir Das in
2000 ("Unification-based Pointer Analysis with Directional
Assignments"), which applies similar approximation to pointer
dereferences.
How to formulate Andersen Style analysis as Dyck-CFL-reachability
remains open, although you can argue that it is not important for
pointer analysis to solve pointer dereferences precisely.

Regards,
Lian

>
>> The alias analysis
>> presented in that paper is an over approximation. Hence the time bound
>> from our paper still applies.
>>
>> Regards,
>> Lian
>>
>>
>>
>>
>>
>> On Fri, Oct 18, 2013 at 4:39 PM, Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>> > I notice you guys formulate your CFL reachability problem as a
>> > balanced parentheses problem.
>> > What algorithm do you use to solve it?
>> >
>> > Are you aware of recent work that comes up with linear time and n log
>> > n time algorithms to solve this class of problems:
>> > http://www.cse.cuhk.edu.hk/lyu/_media/paper/pldi2013.pdf
>> >
>> > In particular, the time bound from the paper:
>> >
>> > "However, if we need the precise pointer information for all
>> > variables, CFL-based
>> > pointer analysis is generally more expensive as it has O(L^3 N^3)
>> > complexity [29], where L is the size of the grammar and N is the
>> > size of the graph."
>> >
>> > should now been reduced to  O(n + m log m) time, assuming you are
>> > using a standard balanced parentheses formulation.
>> >
>> >
>> >
>> >
>> > On Thu, Oct 17, 2013 at 9:51 PM, lian li <lianli at gmail.com> wrote:
>> >> Hi Hal,
>> >>
>> >> Thanks for your interest.
>> >>
>> >> We tested with the following existing compiler optimizations in LLVM
>> >> with SPECINT2006 benchmarks:
>> >> -dse (dead store elimination),
>> >> -gvn (global value numbering),
>> >> -licm (loop invariant code motion),
>> >> -bb-vectorize (basic block vectorization),
>> >> -memcpyopt (memcpy optimization),
>> >> -sink (code sinking),
>> >> -loop-idom (recognize loop idioms),
>> >> -argpromotion (argument promotion).
>> >>
>> >> As far as I know, this list includes all optmizations in LLVM that
>> >> requires alias information. With our pointer analysis, for the three
>> >> optimizations "-dse, -gvn, -licm", we have observed significant
>> >> improvements in terms of the number of optimizations being conducted.
>> >> Take licm as an example, with our analysis, the number of instructions
>> >> being hoisted out of loops more than doubled for some benchmarks. We
>> >> have not seen big changes in terms of runtime performance though.
>> >>
>> >> We measure compile time by first linking all modules of the same
>> >> benchmark together, then running the analysis against the linked
>> >> module. It finishes in less than 1 minute over applications such as
>> >> httpd and sendmail, both with more than 100,000 LOC. For some
>> >> benchmarks, it is time-consuming: the analysis time for GCC is close
>> >> to 45 minutes.  But if you run the analysis over individual files (as
>> >> in your normal compilation process), there is no observable overhead
>> >> in terms of compile time.
>> >>
>> >> Regards,
>> >> Lian
>> >>
>> >>
>> >>
>> >> On Fri, Oct 18, 2013 at 12:15 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> >>> Hi Lian,
>> >>>
>> >>> I am certainly interested in seeing this; do you have performance
>> >>> numbers (compile time)? Also, can you share more information about the
>> >>> promising optimization results you mentioned?
>> >>>
>> >>> Thanks,
>> >>> Hal
>> >>>
>> >>> ----- Original Message -----
>> >>>> Hi All,
>> >>>>
>> >>>> This is Lian Li  from Oracle Labs in Brisbane Australia.
>> >>>>
>> >>>> We have developed a precise and highly efficient pointer analysis
>> >>>> framework on top of LLVM,  The approach is flow, context, and field
>> >>>> sensitive, details are described in the two papers below:
>> >>>>
>> >>>> "Boosting the performance of flow-sensitive points-to analysis using
>> >>>> value flow" (in ESEC-FSE 2011), and
>> >>>> "Precise and scalable context-sensitive pointer analysis via value
>> >>>> flow graph" (in ISMM 2013).
>> >>>>
>> >>>> The analysis was initially developed for the purpose of bug checking,
>> >>>> and is now extended to support compiler optimizations. We have tested
>> >>>> it with existing compiler optimizations in LLVM, and have seen
>> >>>> promising results.
>> >>>>
>> >>>> We are now considering to make this analysis available to the LLVM
>> >>>> community, and contribute resources for future maintenance needs,
>> >>>> provided that there is enough interest. We think that a new precise
>> >>>> pointer analysis in LLVM can enable more new optimization and
>> >>>> analysis
>> >>>> techniques to be developed in LLVM.
>> >>>>
>> >>>> Any people interested in seeing a new precise pointer analysis in
>> >>>> LLVM?
>> >>>>
>> >>>> Regards,
>> >>>> Lian Li
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> _______________________________________________
>> >>>> LLVM Developers mailing list
>> >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>>
>> >>>
>> >>> --
>> >>> Hal Finkel
>> >>> Assistant Computational Scientist
>> >>> Leadership Computing Facility
>> >>> Argonne National Laboratory
>> >>
>> >>
>> >>
>> >> --
>> >> // Copyright @ 2011 Authorized by ** LIAN LI **
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>>
>> --
>> // Copyright @ 2011 Authorized by ** LIAN LI **
>
>



-- 
// Copyright @ 2011 Authorized by ** LIAN LI **



More information about the llvm-dev mailing list