[LLVMdev] [GSoC] Applying for GSoC 2015

Mingxing Zhang james0zan at gmail.com
Fri Mar 27 08:55:10 PDT 2015


Hello John,

In fact I'm rushing toward a paper submission in recent days.
Thus I'm very sorry that I do have enough time to re-write a detailed
timeline.
However, I've revised the "Preliminary Results" section and add some more
information about the current prototype
(updated both on google-melange and the url
http://james0zan.github.io/resource/GSoC15-Proposal-BloatDetection.pdf ).
I hope it will address your main concerns.

Thank you very much!



On 27 March 2015 at 03:03, John Criswell <jtcriswel at gmail.com> wrote:

>  Dear Mingxing,
>
> Sorry for the late reply.  I've been gallivanting about Europe giving
> talks and attending a conference. :)
>
> Attached is feedback on your performance diagnosis proposal.  I think the
> proposal has two flaws which, if possible, you should rectify before the
> deadline tomorrow:
>
> 1) It is not clear what the current prototype can do and how well it
> works.  Is it implemented only in PIN, or also in LLVM?  If in LLVM, how
> far along is that prototype, and how well does it perform?  What is missing
> from it that you want to implement for GSoC?
>
> 2) Your reasoning for why dynamic slicing is going to work is flawed.  You
> assume that Giri is slow because it instruments a lot of instructions.
> That is incorrect.  Giri is slow because it generates so much data during
> program execution that this data cannot be kept in memory and must
> therefore be flushed to persistent storage (which was magnetic disk at the
> time we wrote it).  If instrumentation just needs to update data
> structures, you get 2x-4x slowdown and life is slow but livable.  If you're
> streaming tons of data to disk about every load, store, and branch, that's
> another thing entirely.
>
> I realize that the time to the deadline is short, but if possible, please
> improve your proposal before the deadline.  Despite the above issues, I
> find your proposal interesting and would like to see it have the best
> chance possible of being accepted for GSoC.
>
> Regards,
>
> John Criswell
>
>
>
> On 3/16/15 11:55 AM, Mingxing Zhang wrote:
>
>  Thank you very much for all your advices!
> I'll revise the proposal according to them.
>
> To George,
>
> As mentioned in the former emails of this thread, I intend to prepare two
> proposals for the AA project listed in the idea list and the bloat
> detection project proposed by myself respectively and at most one of them
> will be accepted by GSoC.
> Personally, I do prefer the second project since I'm more familiar with
> that field and the technique (static instrumentation) it uses.
>
> According to the timeline of GSoC
> <https://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2015/help_page>,
> the accepted proposals will only be announced at 27 April. Is it too late
> since I do not know which project will be selected or even none of them
> will be accepted until then?
> Otherwise, will the mentors know the decision earlier (e.g., at 15 April
> after the slot is allocated to organizations)?
>
>
> To John,
>
> The proposal for bloat detection is also available now at
> http://james0zan.github.io/resource/GSoC15-Proposal-BloatDetection.pdf
> (not completely finished yet).
> Some preliminary evaluation results on overhead and detecting ability
> based on a simple prototype are given.
> (Actually I came up with this idea during my visitation in Columbia U and
> the prototype is also implemented at those days, but the project is paused
> until recent days due to my internship in Google and some other works.)
>
> P.S. The tex template is downloaded at
> http://www.latextemplates.com/template/large-colored-title-article.
>
>  Once again, thank you for your time!
>
>
> On 16 March 2015 at 02:58, George Burgess IV <george.burgess.iv at gmail.com>
> wrote:
>
>> CFLAA already has some basic interprocedural analysis built in (see:
>> tryInterproceduralAnalysis; it basically boils down to CFLAA grabbing the
>> StratifiedSets for each function call and unifying sets based off of that
>> instead of unifying everything ever). The only real changes I had in mind
>> for it were:
>>
>> - Adding context sensitivity (which kind of requires adding context
>> sensitivity to CFLAA first)
>> - Making it less terrible for {mutually,indirectly,} recursive functions
>> (While we're building the sets for some function A(), the sets for A()
>> aren't accessible, so any call to A() from a function called by A() looks
>> opaque).
>>
>>  If you want to take a stab at making IPA better/adding context
>> sensitivity, you're more than welcome to do so. I'm happy to work on other
>> things in the meantime :)
>>
>>  George
>>
>> On Sun, Mar 15, 2015 at 8:50 AM, Mingxing Zhang <james0zan at gmail.com>
>> wrote:
>>
>>> Hello Daniel,
>>>
>>> Thank you for your comments and sorry for my mistakes, I'll revise them.
>>> And I'll for sure read the paper you mentioned and survey the recent
>>> researches before deciding the implementation technique.
>>>
>>> To George:
>>> May I know the exact plan of your attempt for making cfl-aa
>>> interprocedural?
>>> I do think that this is the most valuable part of my proposal, but that
>>> makes no sense to do it twice.
>>>
>>> Maybe I can work on the porting of the flow-sensitive method proposed by
>>> Prof. Ben Hardekopf at CGO11
>>> <http://www.cs.ucsb.edu/%7Ebenh/research/papers/hardekopf11flow.pdf>.
>>> It is declared in his homepage that the published source code "is
>>> written for a pre-release version of LLVM 2.5 and does not work in current
>>> versions of LLVM"
>>>
>>> Thanks!
>>>
>>>
>>>
>>> On 15 March 2015 at 08:31, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>
>>>> A few notes:
>>>> 1. "But these standard LLVM AA passes either take a large amount of
>>>> time (Anderson Analysis at cubic time and large memory requirements)"
>>>>
>>>>  Neither of these is correct. Andersen's is not cubic in practice, or
>>>> large memory in practice, when implemented properly.  GCC uses it by
>>>> default as the points-to implementation, and it's never even close to the
>>>> top of the profile.
>>>>
>>>>  It takes about 1 second to do a million lines of code.
>>>> And one can do better (gcc's impl is no longer state of the art).
>>>>
>>>>  2. The approach to field sensitivity you mention is not going to work
>>>> very well, given how much casting occurs (everything is casted). I would
>>>> suggest using the approach in http://dl.acm.org/citation.cfm?id=996835
>>>>
>>>>  3. George, cc'd, is planning on implementing both context sensitive
>>>> and context-insensitive interprocedural analysis in cfl-aa the next month
>>>> or two.
>>>>
>>>>  4. Using a BDD cloning approach for CFL-AA doesn't make much sense,
>>>> the whole point of CFL is not having to generate explicit points-to sets if
>>>> you don't want to. Plus, followup papers and researchers have *never* been
>>>> able to reproduce the scalability of Whaley's work.
>>>>
>>>>  Not to mention it's done on Java. Java is a language where doing
>>>> things like field-sensitivity always increase precision, which is not true
>>>> for C.
>>>>
>>>>  If you really want to attempt this, I would suggest using one of the
>>>> demand driven context-sensitive approaches that will be easy to fit in CFL.
>>>>
>>>>  On Sat, Mar 14, 2015 at 5:57 AM Mingxing Zhang <james0zan at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello John,
>>>>>
>>>>> I've finished the first version of my proposal on enhancing alias
>>>>> analysis.
>>>>> The proposal can be downloaded at
>>>>> http://james0zan.github.io/resource/GSoC15-Proposal-AA.pdf.
>>>>> I hope I've successfully justified the necessity and benefits of this
>>>>> project.
>>>>> If possible, please find some time to review it and give me some more
>>>>> feedbacks.
>>>>>
>>>>> Thank you very much!
>>>>>
>>>>> P.S. I'm working on the other proposal, a couple of days is needed.
>>>>>
>>>>>
>>>>>
>>>>> On 8 March 2015 at 21:42, Mingxing Zhang <james0zan at gmail.com> wrote:
>>>>>
>>>>>>   Got it.
>>>>>>  I'll try to find the applications for field-sensitivity (and
>>>>>> interprocedural, etc) AA analysis.
>>>>>>  And I'll do some preliminary evaluation on the tracing/slicing part
>>>>>> for the bloat detection.
>>>>>>
>>>>>>  Thanks!
>>>>>>
>>>>>> On 8 March 2015 at 21:34, John Criswell <jtcriswel at gmail.com> wrote:
>>>>>>
>>>>>>>  On 3/8/15 8:56 AM, Mingxing Zhang wrote:
>>>>>>>
>>>>>>>  Hello John,
>>>>>>>
>>>>>>>  According to the FAQ, I can submit two proposals although at most
>>>>>>> one of them can be accepted.
>>>>>>> Thus I will prepare a proposal for each of the two projects.
>>>>>>>
>>>>>>>
>>>>>>> Correct.  Only one proposal will be accepted.
>>>>>>>
>>>>>>>
>>>>>>>  And, after reading the code of cfl-aa and several related papers,
>>>>>>> I've listed four milestones for the AA project:
>>>>>>>
>>>>>>> 1) In order to use the fast algorithm described in PLDI'13 [1],
>>>>>>> cfl-aa makes a simplification on the CFL defined in POPL'08 [2], which will
>>>>>>> lead to a reduction on precision (I've confirmed this observation with the
>>>>>>> author).
>>>>>>> Thus a quantitative measurement on how much is the reduction is
>>>>>>> needed.
>>>>>>>
>>>>>>> 2) In cfl-aa, different fields of a same struct and the whole array
>>>>>>> are represented by a single node.
>>>>>>> This is the reason of the problem 2, 4 listed in
>>>>>>> http://llvm.org/OpenProjects.html.
>>>>>>> We should split these large nodes.
>>>>>>>
>>>>>>>
>>>>>>>  I think the real question is whether the loss of precision matters,
>>>>>>> and if so, to which uses of alias analysis.  SAFECode, for example, wants
>>>>>>> field information to determine type safety (so that it can optimize away
>>>>>>> type-safe loads and stores), so field sensitivity matters.  Perhaps field
>>>>>>> sensitivity doesn't matter for other applications (e.g., optimization).
>>>>>>> There's no point in improving precision if it doesn't help the analyses
>>>>>>> that people care about most.
>>>>>>>
>>>>>>> As part of your project, I think you should state the uses of alias
>>>>>>> analysis/points-to analysis that you're aiming to improve and understand
>>>>>>> whether your proposed improvements will help that use.  I would also
>>>>>>> recommend picking a use that matters to a significant portion of the LLVM
>>>>>>> community.
>>>>>>>
>>>>>>>
>>>>>>> 3) Handling special global variables, such as errno.
>>>>>>>
>>>>>>> 4) It seems that the current version of cfl-aa is an intraprocedural
>>>>>>> analysis.
>>>>>>> If the time is enough, I think we may extend it to an
>>>>>>> interprocedural analysis.
>>>>>>> The algorithm described in [3] can be applied to scaling it.
>>>>>>>
>>>>>>> As for the bloat-detection project, the final result should be a
>>>>>>> tool that is verified by known bugs and a set of newly detected bugs.
>>>>>>>
>>>>>>>
>>>>>>>  For the bloat detection tool, I would like to be convinced that
>>>>>>> dynamic tracing will be, or can be, sufficiently efficient to be
>>>>>>> practical.  I hate to ask, but I think you need to run an experiment with
>>>>>>> Giri to show that dynamic slicing is going to be practical for the
>>>>>>> executions that you expect to analyze.  Either that, or you need to explain
>>>>>>> how you can use something more efficient than dynamic slicing (note that
>>>>>>> dynamic slicing and dynamic tracing are not the same, so be sure you're
>>>>>>> correctly stating which one you need).
>>>>>>>
>>>>>>>
>>>>>>> Do you have any suggestions on these objectives?
>>>>>>>
>>>>>>>
>>>>>>>  In your proposal, be sure to include a set of milestones and how
>>>>>>> long you think you will need to achieve those milestones.  I may have said
>>>>>>> that before, but it's worth repeating.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> John Criswell
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>>  [1] Fast Algorithms for Dyck-CFL-Reachability with Applications to
>>>>>>> Alias Analysis. PLDI'13
>>>>>>> [2] Demand-Driven Alias Analysis for C. POPL'08
>>>>>>> [3] Demand-Driven Context-Sensitive Alias Analysis for Java. ISSTA'11
>>>>>>>
>>>>>>>
>>>>>>> On 5 March 2015 at 09:58, Mingxing Zhang <james0zan at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>  Wow, that is cool!
>>>>>>>>  I'll check about it.
>>>>>>>>
>>>>>>>>  Thank you!
>>>>>>>>
>>>>>>>> On 4 March 2015 at 21:57, John Criswell <jtcriswel at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>  On 3/4/15 2:18 AM, Mingxing Zhang wrote:
>>>>>>>>>
>>>>>>>>> Hello John,
>>>>>>>>>
>>>>>>>>> Thank you for your advices and congratulations~
>>>>>>>>>
>>>>>>>>> I'll read the code of cfl-aa and Giri first and make the decision
>>>>>>>>> of which project to pursue.
>>>>>>>>> The choice will be reported to this thread once I made the
>>>>>>>>> determination (hopefully within this week).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  You should check for yourself, but I don't think anything
>>>>>>>>> prevents you from submitting two proposals.  If you have time to write two
>>>>>>>>> strong proposals, I see no problem with that.
>>>>>>>>>
>>>>>>>>> Just make sure that any proposal you write is strong: it provides
>>>>>>>>> a concrete explanation of what you want to do, some justification for why
>>>>>>>>> it would benefit the community (short or long term), and why you're the
>>>>>>>>> person qualified to do it.  Proposals should also include a set of
>>>>>>>>> milestones and expected dates for completing those milestones.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> John Criswell
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3 March 2015 at 23:12, John Criswell <jtcriswel at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>  Dear Mingxing,
>>>>>>>>>>
>>>>>>>>>> I think both projects are interesting and useful.
>>>>>>>>>>
>>>>>>>>>> Points-to analysis is something that is needed by research users
>>>>>>>>>> of LLVM, but to the best of my knowledge, no solid implementation currently
>>>>>>>>>> exists (although the cfl-aa work being done at Google may provide us with
>>>>>>>>>> something; you should check into it before writing a proposal).  My
>>>>>>>>>> interest is in a points-to analysis that is robust and is useful to both
>>>>>>>>>> research and industry users of LLVM.  A points-to analysis proposal must
>>>>>>>>>> indicate how it will help both of these subsets of the LLVM community, and
>>>>>>>>>> it must argue why current efforts do not meet the requirements of both
>>>>>>>>>> subsets of the community.
>>>>>>>>>>
>>>>>>>>>> The runtime bloat tool also looks interesting, and your approach
>>>>>>>>>> (at least to me) is interesting.  One question in my mind, though, is
>>>>>>>>>> whether dynamic slicing is going to work well.  Swarup Sahoo and I built a
>>>>>>>>>> dynamic slicer for LLVM named Giri, and we found the tracing required for
>>>>>>>>>> dynamic slicing to be slow.  For our purposes, the overhead was okay as we
>>>>>>>>>> only needed to record execution until a crash (which happened quickly).  In
>>>>>>>>>> your bloat tool, the program will probably run for awhile, creating a long
>>>>>>>>>> trace record.  You should take a look at the Giri code, use it to trace
>>>>>>>>>> some programs, and see if the overheads are going to be tolerable.  If they
>>>>>>>>>> are not, then your first task would be to optimize Giri for your bloat tool.
>>>>>>>>>>
>>>>>>>>>> You should also be more specific about which LLVM instructions
>>>>>>>>>> will be traced.  For example, I wouldn't record the outputs of every LLVM
>>>>>>>>>> instruction; I might only record the outputs of loads and stores or the end
>>>>>>>>>> of a def-use chain.
>>>>>>>>>>
>>>>>>>>>> I'd be interested in mentoring either project.
>>>>>>>>>>
>>>>>>>>>> BTW, it looks like your FSE paper won an award.  Congrats.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> John Criswell
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/3/15 2:30 AM, Mingxing Zhang wrote:
>>>>>>>>>>
>>>>>>>>>>  Hi all,
>>>>>>>>>>
>>>>>>>>>> As a Ph.D. student majored in Software Reliability, I have used
>>>>>>>>>> LLVM in many of my projects, such as the Anticipating Invariant (
>>>>>>>>>> http://james0zan.github.io/AI.html) and some other undergoing
>>>>>>>>>> ones.
>>>>>>>>>> Thus, it would be a great pleasure for me if I could take this
>>>>>>>>>> opportunity to contribute to this awesome project.
>>>>>>>>>>
>>>>>>>>>> After reading the idea list (http://llvm.org/OpenProjects.html),
>>>>>>>>>> I was most interested in the idea of improving the "Pointer and Alias
>>>>>>>>>> Analysis" passes.
>>>>>>>>>> Could you please give me some more tips or advices on how to get
>>>>>>>>>> started on working on the application?
>>>>>>>>>>
>>>>>>>>>> Simultaneously, I also have another idea about using LLVM to
>>>>>>>>>> detect runtime bloat, just like the ThreadSanitizer tool for data races.
>>>>>>>>>> If there is anyone here who would like to mentor this project,
>>>>>>>>>> could you please find some time to review the more detailed
>>>>>>>>>> proposal on gist
>>>>>>>>>> <https://gist.github.com/james0zan/d03737c60b10d0d11d34> and
>>>>>>>>>> give me some feedbacks?
>>>>>>>>>>
>>>>>>>>>> P.S.
>>>>>>>>>>   I do prefer the bloat detection tool, but I'm not sure about
>>>>>>>>>> whether it is suitable for GSoC.
>>>>>>>>>>   Thus I will apply for the Alias Analysis one if it is not
>>>>>>>>>> suitable.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>   Mingxing Zhang
>>>>>>>>>>
>>>>>>>>>>  Tel.: +86-10-62797143
>>>>>>>>>>  Web: http://james0zan.github.io/
>>>>>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
>>>>>>>>>> 100084, China
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  _______________________________________________
>>>>>>>>>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>>>>
>>>>>>>>>>  /
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> John Criswell
>>>>>>>>>> Assistant Professor
>>>>>>>>>> Department of Computer Science, University of Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>   Mingxing Zhang
>>>>>>>>>
>>>>>>>>>  Tel.: +86-10-62797143
>>>>>>>>>  Web: http://james0zan.github.io/
>>>>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
>>>>>>>>> 100084, China
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> John Criswell
>>>>>>>>> Assistant Professor
>>>>>>>>> Department of Computer Science, University of Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>   Mingxing Zhang
>>>>>>>>
>>>>>>>>  Tel.: +86-10-62797143
>>>>>>>>  Web: http://james0zan.github.io/
>>>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
>>>>>>>> 100084, China
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>   Mingxing Zhang
>>>>>>>
>>>>>>>  Tel.: +86-10-62797143
>>>>>>>  Web: http://james0zan.github.io/
>>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
>>>>>>> 100084, China
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> John Criswell
>>>>>>> Assistant Professor
>>>>>>> Department of Computer Science, University of Rochesterhttp://www.cs.rochester.edu/u/criswell
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>   Mingxing Zhang
>>>>>>
>>>>>>  Tel.: +86-10-62797143
>>>>>>  Web: http://james0zan.github.io/
>>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing
>>>>>> 100084, China
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>   Mingxing Zhang
>>>>>
>>>>>  Tel.: +86-10-62797143
>>>>>  Web: http://james0zan.github.io/
>>>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084,
>>>>> China
>>>>>    _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>
>>>
>>>
>>> --
>>>   Mingxing Zhang
>>>
>>>  Tel.: +86-10-62797143
>>>  Web: http://james0zan.github.io/
>>>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084,
>>> China
>>>
>>
>>
>
>
> --
>   Mingxing Zhang
>
>  Tel.: +86-10-62797143
>  Web: http://james0zan.github.io/
>  Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084,
> China
>
>
>
> --
> John Criswell
> Assistant Professor
> Department of Computer Science, University of Rochesterhttp://www.cs.rochester.edu/u/criswell
>
>


-- 
Mingxing Zhang

Tel.: +86-10-62797143
Web: http://james0zan.github.io/
Addr: Room 3-122, FIT Building, Tsinghua University, Beijing 100084, China
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150327/0fee0faf/attachment.html>


More information about the llvm-dev mailing list