[LLVMdev] [GSoC] Applying for GSoC 2015

George Burgess IV george.burgess.iv at gmail.com
Sun Mar 15 11:58:52 PDT 2015


CFLAA already has some basic interprocedural analysis built in (see:
tryInterproceduralAnalysis; it basically boils down to CFLAA grabbing the
StratifiedSets for each function call and unifying sets based off of that
instead of unifying everything ever). The only real changes I had in mind
for it were:

- Adding context sensitivity (which kind of requires adding context
sensitivity to CFLAA first)
- Making it less terrible for {mutually,indirectly,} recursive functions
(While we're building the sets for some function A(), the sets for A()
aren't accessible, so any call to A() from a function called by A() looks
opaque).

If you want to take a stab at making IPA better/adding context sensitivity,
you're more than welcome to do so. I'm happy to work on other things in the
meantime :)

George

On Sun, Mar 15, 2015 at 8:50 AM, Mingxing Zhang <james0zan at gmail.com> wrote:

> Hello Daniel,
>
> Thank you for your comments and sorry for my mistakes, I'll revise them.
> And I'll for sure read the paper you mentioned and survey the recent
> researches before deciding the implementation technique.
>
> To George:
> May I know the exact plan of your attempt for making cfl-aa
> interprocedural?
> I do think that this is the most valuable part of my proposal, but that
> makes no sense to do it twice.
>
> Maybe I can work on the porting of the flow-sensitive method proposed by
> Prof. Ben Hardekopf at CGO11
> <http://www.cs.ucsb.edu/~benh/research/papers/hardekopf11flow.pdf>.
> It is declared in his homepage that the published source code "is written
> for a pre-release version of LLVM 2.5 and does not work in current versions
> of LLVM"
>
> Thanks!
>
>
>
> On 15 March 2015 at 08:31, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>> A few notes:
>> 1. "But these standard LLVM AA passes either take a large amount of time
>> (Anderson Analysis at cubic time and large memory requirements)"
>>
>> Neither of these is correct. Andersen's is not cubic in practice, or
>> large memory in practice, when implemented properly.  GCC uses it by
>> default as the points-to implementation, and it's never even close to the
>> top of the profile.
>>
>> It takes about 1 second to do a million lines of code.
>> And one can do better (gcc's impl is no longer state of the art).
>>
>> 2. The approach to field sensitivity you mention is not going to work
>> very well, given how much casting occurs (everything is casted). I would
>> suggest using the approach in http://dl.acm.org/citation.cfm?id=996835
>>
>> 3. George, cc'd, is planning on implementing both context sensitive and
>> context-insensitive interprocedural analysis in cfl-aa the next month or
>> two.
>>
>> 4. Using a BDD cloning approach for CFL-AA doesn't make much sense, the
>> whole point of CFL is not having to generate explicit points-to sets if you
>> don't want to. Plus, followup papers and researchers have *never* been able
>> to reproduce the scalability of Whaley's work.
>>
>> Not to mention it's done on Java. Java is a language where doing things
>> like field-sensitivity always increase precision, which is not true for C.
>>
>> If you really want to attempt this, I would suggest using one of the
>> demand driven context-sensitive approaches that will be easy to fit in CFL.
>>
>> On Sat, Mar 14, 2015 at 5:57 AM Mingxing Zhang <james0zan at gmail.com>
>> wrote:
>>
>>> Hello John,
>>>
>>> I've finished the first version of my proposal on enhancing alias
>>> analysis.
>>> The proposal can be downloaded at
>>> http://james0zan.github.io/resource/GSoC15-Proposal-AA.pdf.
>>> I hope I've successfully justified the necessity and benefits of this
>>> project.
>>> If possible, please find some time to review it and give me some more
>>> feedbacks.
>>>
>>> Thank you very much!
>>>
>>> P.S. I'm working on the other proposal, a couple of days is needed.
>>>
>>>
>>>
>>> On 8 March 2015 at 21:42, Mingxing Zhang <james0zan at gmail.com> wrote:
>>>
>>>> Got it.
>>>> I'll try to find the applications for field-sensitivity (and
>>>> interprocedural, etc) AA analysis.
>>>> And I'll do some preliminary evaluation on the tracing/slicing part for
>>>> the bloat detection.
>>>>
>>>> Thanks!
>>>>
>>>> On 8 March 2015 at 21:34, John Criswell <jtcriswel at gmail.com> wrote:
>>>>
>>>>>  On 3/8/15 8:56 AM, Mingxing Zhang wrote:
>>>>>
>>>>>  Hello John,
>>>>>
>>>>> According to the FAQ, I can submit two proposals although at most one
>>>>> of them can be accepted.
>>>>> Thus I will prepare a proposal for each of the two projects.
>>>>>
>>>>>
>>>>> Correct.  Only one proposal will be accepted.
>>>>>
>>>>>
>>>>>  And, after reading the code of cfl-aa and several related papers,
>>>>> I've listed four milestones for the AA project:
>>>>>
>>>>> 1) In order to use the fast algorithm described in PLDI'13 [1], cfl-aa
>>>>> makes a simplification on the CFL defined in POPL'08 [2], which will lead
>>>>> to a reduction on precision (I've confirmed this observation with the
>>>>> author).
>>>>> Thus a quantitative measurement on how much is the reduction is needed.
>>>>>
>>>>> 2) In cfl-aa, different fields of a same struct and the whole array
>>>>> are represented by a single node.
>>>>> This is the reason of the problem 2, 4 listed in
>>>>> http://llvm.org/OpenProjects.html.
>>>>> We should split these large nodes.
>>>>>
>>>>>
>>>>> I think the real question is whether the loss of precision matters,
>>>>> and if so, to which uses of alias analysis.  SAFECode, for example, wants
>>>>> field information to determine type safety (so that it can optimize away
>>>>> type-safe loads and stores), so field sensitivity matters.  Perhaps field
>>>>> sensitivity doesn't matter for other applications (e.g., optimization).
>>>>> There's no point in improving precision if it doesn't help the analyses
>>>>> that people care about most.
>>>>>
>>>>> As part of your project, I think you should state the uses of alias
>>>>> analysis/points-to analysis that you're aiming to improve and understand
>>>>> whether your proposed improvements will help that use.  I would also
>>>>> recommend picking a use that matters to a significant portion of the LLVM
>>>>> community.
>>>>>
>>>>>
>>>>> 3) Handling special global variables, such as errno.
>>>>>
>>>>> 4) It seems that the current version of cfl-aa is an intraprocedural
>>>>> analysis.
>>>>> If the time is enough, I think we may extend it to an interprocedural
>>>>> analysis.
>>>>> The algorithm described in [3] can be applied to scaling it.
>>>>>
>>>>> As for the bloat-detection project, the final result should be a tool
>>>>> that is verified by known bugs and a set of newly detected bugs.
>>>>>
>>>>>
>>>>> For the bloat detection tool, I would like to be convinced that
>>>>> dynamic tracing will be, or can be, sufficiently efficient to be
>>>>> practical.  I hate to ask, but I think you need to run an experiment with
>>>>> Giri to show that dynamic slicing is going to be practical for the
>>>>> executions that you expect to analyze.  Either that, or you need to explain
>>>>> how you can use something more efficient than dynamic slicing (note that
>>>>> dynamic slicing and dynamic tracing are not the same, so be sure you're
>>>>> correctly stating which one you need).
>>>>>
>>>>>
>>>>> Do you have any suggestions on these objectives?
>>>>>
>>>>>
>>>>> In your proposal, be sure to include a set of milestones and how long
>>>>> you think you will need to achieve those milestones.  I may have said that
>>>>> before, but it's worth repeating.
>>>>>
>>>>> Regards,
>>>>>
>>>>> John Criswell
>>>>>
>>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>  [1] Fast Algorithms for Dyck-CFL-Reachability with Applications to
>>>>> Alias Analysis. PLDI'13
>>>>> [2] Demand-Driven Alias Analysis for C. POPL'08
>>>>> [3] Demand-Driven Context-Sensitive Alias Analysis for Java. ISSTA'11
>>>>>
>>>>>
>>>>> On 5 March 2015 at 09:58, Mingxing Zhang <james0zan at gmail.com> wrote:
>>>>>
>>>>>>  Wow, that is cool!
>>>>>>  I'll check about it.
>>>>>>
>>>>>>  Thank you!
>>>>>>
>>>>>> On 4 March 2015 at 21:57, John Criswell <jtcriswel at gmail.com> wrote:
>>>>>>
>>>>>>>  On 3/4/15 2:18 AM, Mingxing Zhang wrote:
>>>>>>>
>>>>>>> Hello John,
>>>>>>>
>>>>>>> Thank you for your advices and congratulations~
>>>>>>>
>>>>>>> I'll read the code of cfl-aa and Giri first and make the decision of
>>>>>>> which project to pursue.
>>>>>>> The choice will be reported to this thread once I made the
>>>>>>> determination (hopefully within this week).
>>>>>>>
>>>>>>>
>>>>>>>  You should check for yourself, but I don't think anything prevents
>>>>>>> you from submitting two proposals.  If you have time to write two strong
>>>>>>> proposals, I see no problem with that.
>>>>>>>
>>>>>>> Just make sure that any proposal you write is strong: it provides a
>>>>>>> concrete explanation of what you want to do, some justification for why it
>>>>>>> would benefit the community (short or long term), and why you're the person
>>>>>>> qualified to do it.  Proposals should also include a set of milestones and
>>>>>>> expected dates for completing those milestones.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> John Criswell
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> On 3 March 2015 at 23:12, John Criswell <jtcriswel at gmail.com> wrote:
>>>>>>>
>>>>>>>>  Dear Mingxing,
>>>>>>>>
>>>>>>>> I think both projects are interesting and useful.
>>>>>>>>
>>>>>>>> Points-to analysis is something that is needed by research users of
>>>>>>>> LLVM, but to the best of my knowledge, no solid implementation currently
>>>>>>>> exists (although the cfl-aa work being done at Google may provide us with
>>>>>>>> something; you should check into it before writing a proposal).  My
>>>>>>>> interest is in a points-to analysis that is robust and is useful to both
>>>>>>>> research and industry users of LLVM.  A points-to analysis proposal must
>>>>>>>> indicate how it will help both of these subsets of the LLVM community, and
>>>>>>>> it must argue why current efforts do not meet the requirements of both
>>>>>>>> subsets of the community.
>>>>>>>>
>>>>>>>> The runtime bloat tool also looks interesting, and your approach
>>>>>>>> (at least to me) is interesting.  One question in my mind, though, is
>>>>>>>> whether dynamic slicing is going to work well.  Swarup Sahoo and I built a
>>>>>>>> dynamic slicer for LLVM named Giri, and we found the tracing required for
>>>>>>>> dynamic slicing to be slow.  For our purposes, the overhead was okay as we
>>>>>>>> only needed to record execution until a crash (which happened quickly).  In
>>>>>>>> your bloat tool, the program will probably run for awhile, creating a long
>>>>>>>> trace record.  You should take a look at the Giri code, use it to trace
>>>>>>>> some programs, and see if the overheads are going to be tolerable.  If they
>>>>>>>> are not, then your first task would be to optimize Giri for your bloat tool.
>>>>>>>>
>>>>>>>> You should also be more specific about which LLVM instructions will
>>>>>>>> be traced.  For example, I wouldn't record the outputs of every LLVM
>>>>>>>> instruction; I might only record the outputs of loads and stores or the end
>>>>>>>> of a def-use chain.
>>>>>>>>
>>>>>>>> I'd be interested in mentoring either project.
>>>>>>>>
>>>>>>>> BTW, it looks like your FSE paper won an award.  Congrats.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> John Criswell
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/3/15 2:30 AM, Mingxing Zhang wrote:
>>>>>>>>
>>>>>>>>  Hi all,
>>>>>>>>
>>>>>>>> As a Ph.D. student majored in Software Reliability, I have used
>>>>>>>> LLVM in many of my projects, such as the Anticipating Invariant (
>>>>>>>> http://james0zan.github.io/AI.html) and some other undergoing ones.
>>>>>>>> Thus, it would be a great pleasure for me if I could take this
>>>>>>>> opportunity to contribute to this awesome project.
>>>>>>>>
>>>>>>>> After reading the idea list (http://llvm.org/OpenProjects.html), I
>>>>>>>> was most interested in the idea of improving the "Pointer and Alias
>>>>>>>> Analysis" passes.
>>>>>>>> Could you please give me some more tips or advices on how to get
>>>>>>>> started on working on the application?
>>>>>>>>
>>>>>>>> Simultaneously, I also have another idea about using LLVM to detect
>>>>>>>> runtime bloat, just like the ThreadSanitizer tool for data races.
>>>>>>>> If there is anyone here who would like to mentor this project,
>>>>>>>> could you please find some time to review the more detailed
>>>>>>>> proposal on gist
>>>>>>>> <https://gist.github.com/james0zan/d03737c60b10d0d11d34> and give
>>>>>>>> me some feedbacks?
>>>>>>>>
>>>>>>>> P.S.
>>>>>>>>   I do prefer the bloat detection tool, but I'm not sure about
>>>>>>>> whether it is suitable for GSoC.
>>>>>>>>   Thus I will apply for the Alias Analysis one if it is not
>>>>>>>> suitable.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> --
>>>>>>>>   Mingxing Zhang
>>>>>>>>
>>>>>>>>  Tel.: +86-10-62797143
>>>>>>>>  Web: http://james0zan.github.io/
>>>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
>>>>>>>> 100084, China
>>>>>>>>
>>>>>>>>
>>>>>>>>  _______________________________________________
>>>>>>>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>>
>>>>>>>>  /
>>>>>>>>
>>>>>>>> --
>>>>>>>> John Criswell
>>>>>>>> Assistant Professor
>>>>>>>> Department of Computer Science, University of Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>   Mingxing Zhang
>>>>>>>
>>>>>>>  Tel.: +86-10-62797143
>>>>>>>  Web: http://james0zan.github.io/
>>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
>>>>>>> 100084, China
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> John Criswell
>>>>>>> Assistant Professor
>>>>>>> Department of Computer Science, University of Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>   Mingxing Zhang
>>>>>>
>>>>>>  Tel.: +86-10-62797143
>>>>>>  Web: http://james0zan.github.io/
>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
>>>>>> 100084, China
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>   Mingxing Zhang
>>>>>
>>>>>  Tel.: +86-10-62797143
>>>>>  Web: http://james0zan.github.io/
>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084,
>>>>> China
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> John Criswell
>>>>> Assistant Professor
>>>>> Department of Computer Science, University of Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Mingxing Zhang
>>>>
>>>> Tel.: +86-10-62797143
>>>> Web: http://james0zan.github.io/
>>>> Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084,
>>>> China
>>>>
>>>
>>>
>>>
>>> --
>>> Mingxing Zhang
>>>
>>> Tel.: +86-10-62797143
>>> Web: http://james0zan.github.io/
>>> Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084,
>>> China
>>>  _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>
>
> --
> Mingxing Zhang
>
> Tel.: +86-10-62797143
> Web: http://james0zan.github.io/
> Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084, China
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150315/8b880e23/attachment.html>


More information about the llvm-dev mailing list