<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 4/27/13 2:53 PM, Mingliang LIU

      wrote:<br>

    </div>

    <blockquote

      cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html;

        charset=ISO-8859-1">

      Hi all,

      <div><br>

      </div>

      <div>

        <div>This is a GSoC 2013 proposal for LLVM project. Please see

          the formatted version at here: <span style="text-decoration:

            underline; color: rgb(71, 135, 255); "><a

              moz-do-not-send="true"

href="http://pacman.cs.tsinghua.edu.cn/%7Eliuml07/files/gsoc2013-proposal-program-slicing.pdf">http://pacman.cs.tsinghua.edu.cn/~liuml07/files/gsoc2013-proposal-program-slicing.pdf</a></span></div>

      </div>

    </blockquote>

    <br>

    I wanted to comment up-front to our GSoC mentors this year that an

    LLVM dynamic slicing pass is one of the most often requested

    components we get from the research community.<br>

    <br>

    <blockquote

      cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"

      type="cite">

      <div>

        <div><span style="text-decoration: underline; color: rgb(71,

            135, 255); "><br>

          </span></div>

        <div>Program slicing has been used in many applications, the

          criteria of which is a pair of statement and variables.

          I would like to write an inter-procedural program slicing pass

          in LLVM, which is able to calculate C program slices of source

          code effectively. There is no previous work implemented in

          LLVM, which considers both the dynamic program slicing and

          source code of the sliced program.</div>

      </div>

    </blockquote>

    <br>

    Actually, there is a dynamic slicing tool built using LLVM, and it

    maps LLVM IR statements to source-level statements for its output

    using the debug metadata.  Swarup Sahoo and I built it and used it

    in our recent ASPLOS paper

    (<a class="moz-txt-link-freetext" href="http://dl.acm.org/citation.cfm?id=2451131">http://dl.acm.org/citation.cfm?id=2451131</a>).  The tool is named

    Giri.<br>

    <br>

    As an aside, the "Giri" code you described below is a static slicing

    pass that we originally intended to add to our dynamic slicing code

    and release as one project called "Giri."  That project has stalled,

    and if it doesn't get revitalized, I think it should be moved

    elsewhere  (perhaps github?).<br>

    <br>

    What I think would be good for your project would be to take the

    Giri code that we've developed, update it to LLVM mainline, and make

    it more robust.  We can provide the code to you if you get selected

    for GSoC.  Unfortunately, I won't be mentoring this year.  Perhaps

    Swarup (CC'ed above) or someone else will be interested.<br>

    <br>

    <blockquote

      cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"

      type="cite">

      <div>

        <div> Program slice contains all statements in a program that

          directly or indirectly act the value of a variable occurrence

          [5]. Program slicing has been used in many applications, e.g.,

          program verification, testing, maintenance, automatic

          parallelization of program execution, automatic integration of

          program versions.</div>

        <div><br>

        </div>

        <div>While it's straightforward to implement the slicer in the

          back-end of compiler using SSA form, the source code of the

          original program instead of intermediate</div>

        <div>representation is preferred in most cases. Moreover, we can

          further narrow the notion of slice, which contains statements

          that influence the value of a</div>

        <div>variable occurrence for specic program inputs. This is

          referred as dynamic program slicing [1]. However, there is no

          previous work implemented in LLVM</div>

        <div>which solved the two problems.</div>

      </div>

      <div><br>

      </div>

      <div>

        <div>There are two public projects which implement the backwards

          static slicing in LLVM.</div>

        <div>

          <ul class="MailOutline">

            <li>Giri Written by John Criswell from UIUC, a subproject of

              LLVM. The Giri code contains the static backwards

              intra-procedure slicing passes, and runs with an older

              version of LLVM. It also only backtracks until it hits a

              load. Additional code must be written to backtrack further

              to find potentially reaching stores.</li>

            <li>LLVMSlicer This implementation is a complete static

              backwards slicer from Masaryk University. It works on the

              well dened data and control flow equations in a white

              paper by F. Tip [4]. However, this code was written for

              special purpose, thus it's not general enough to be use

              by others. They implemented the Andersen's alias algorithm

              [2], callgraph, and modies analysis to support the

              slicer, instead of using the LLVM APIs.</li>

          </ul>

        </div>

        <div><br>

        </div>

        <div>They eliminate the useless IR statements and keep the ones

          aect the values of the criteria. However, neither of them

          generates the compilable source code</div>

        <div>slice, which is heavily needed in reality. There

          are several ways to do this. One is to generate the

          source code from sliced IR using llc tool. The issue is that</div>

        <div>the IR is not concerned with high-level semantic.

          The generated source code is different from the

          original program and not suitable for human reading. An-</div>

        <div>other approach is deleting the source code according to

          sliced IR, with line number information (in meta-data of each

          instruction). The naive script deleting</div>

        <div>sliced source code one by one fails to handle tricky cases.

          I think a better source code slicer is to take use of the AST

          info.</div>

      </div>

    </blockquote>

    <br>

    If you want to generate *compilable* source code, then I think you

    either need to go with the first option (using llc with the C

    backend) or you need to implement part or all of the slicing code

    within Clang.  The second option you describe (using debug

    meta-data) will probably be too fragile to work in practice; debug

    information can be removed, and it may not capture things like the

    inclusion of header files.<br>

    <br>

    That said, can you provide citations showing that being able to

    compile the slice is needed?  Does the slice need to be readable or

    just compilable?  It is not clear to me that a compilable and

    human-readable slice is needed by most applications needing dynamic

    slicing.  I suspect only one or the other is needed.<br>

    <br>

    <blockquote

      cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"

      type="cite">

      <div>

        <div><br>

        </div>

        <div>There are three main parts of this work.</div>

        <div>

          <ol class="MailOutline">

            <li>First, borrow an implementation of static back-wards

              slicing from Giri or LLVMSlicer, and use the LLVM

              callgraph, mod/ref and alias interfaces as much as

              possible.</li>

          </ol>

        </div>

      </div>

    </blockquote>

    <br>

    For what purpose would you use the static analysis?  Would you use

    it to optimize the instrumentation needed for dynamic slicing?<br>

    <br>

    <blockquote

      cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"

      type="cite">

      <div>

        <div>

          <ol class="MailOutline">

            <li>Second, implement the dynamic program slicing using the

              approach 3 in the paper [1].</li>

          </ol>

        </div>

      </div>

    </blockquote>

    <br>

    You should take a look at Giri's algorithm in our ASPLOS paper. 

    While not novel, it does have a nifty feature that uses the SSA

    graph to reduce the number of events that it needs to trace to

    create the dynamic slice.<br>

    <br>

    <blockquote

      cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"

      type="cite">

      <div>

        <div>

          <ol class="MailOutline">

            <li>Third, generate the source code of the sliced program.

              To make the sliced source code compile directly, we need

              to employ clang front-end</li>

          </ol>

        </div>

      </div>

      <div><br>

      </div>

      <div>

        <div>The final result of this summer of code is to make this

          pass work eeffectively and documented well. Further, I'll

          write test cases and behave as the active</div>

        <div>maintainer for this project. My long-term plan is to add

          more features, e.g. Objective-C/C++ support, thing slicing

          [3], to this project.</div>

      </div>

    </blockquote>

    <br>

    What is your experience with LLVM IR and/or Clang ASTs?  Taking our

    Giri code and making it robust seems doable over the summer. 

    Building something with Clang AST's and LLVM IR and having it solid

    and well documented by end of summer seems a bit ambitious.<br>

    <br>

    <br>

    <blockquote

      cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"

      type="cite">

      <div>

        <div><br>

        </div>

        <div>Any comment is highly appreciated.</div>

      </div>

    </blockquote>

    <br>

    I think your proposal should make a case that dynamic slicing is a

    very useful thing.  For example, you should cite some papers that

    make use of it.  While I know that dynamic slicing is useful (we

    keep making one-off distributions of our dynamic slicing code for

    researchers), that is not the most convincing argument.  If you can

    find real-world tools that use it (or could benefit from it), that's

    even better.<br>

    <br>

    Second, your proposal should list your qualifications for the

    project.  Why are you the right person for this task?  Do you have

    experience writing code?  Working with dynamic slicing?  Have you

    written LLVM/Clang passes before?<br>

    <br>

    Third, you should explain how all the parts fit together.  If you're

    working on dynamic slicing, it's not clear to me why you want a

    static inter-procedural slicing pass.<br>

    <br>

    Fourth, you should discuss how your passes will integrate into the

    LLVM toolchain.  Will you be adding an option to clang?  Will you

    build a replacement libLTO to do the inter-procedural slicing, or do

    you expect people to use opt to run it?<br>

    <br>

    -- John T.<br>

    <br>

    <blockquote

      cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"

      type="cite">

      <div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div>

          <div>[1] H AGRAWAL. Dynamic Program Slicing. In ACM SIGPLAN

            Conference on Programming Language Design and Implementation

            (PLDI),1990.</div>

          <div>

            <div>2] L.O. Andersen. Program analysis and specialization

              for the C programming language. PhD thesis, University of

              Cophenhagen, Germany, 1994.</div>

            <div>[3] M. Sridharan, S.J. Fink, and R. Bodik.

              Thin slicing. In PLDI'07, 2007.</div>

            <div>[4] F. Tip. A survey of program slicing

              techniques.Journal of Programming Languages, 1995.</div>

            <div>[5] M. Weiser. Program slicing. In Proceedings of the

              5th International Conference on Software Engineering,

              ICSE, pages 439{449. IEEE, 1981.</div>

          </div>

          <div><br>

          </div>

          <div>

            <div>

              --<br>

              Mingliang LIU (刘明亮 in Chinese)<br>

              <br>

              PACMAN Group,  Dept. of Computer Science & Technology<br>

              Tsinghua University, Beijing 100084, China<br>

              Email: <a moz-do-not-send="true"

                href="mailto:liuml07@mails.tsinghua.edu.cn">liuml07@mails.tsinghua.edu.cn</a><br>

              Homepage: <a moz-do-not-send="true"

                href="http://pacman.cs.tsinghua.edu.cn/%7Eliuml07/">http://pacman.cs.tsinghua.edu.cn/~liuml07/</a>

            </div>

            <br>

          </div>

        </div>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a class="moz-txt-link-freetext" href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a>

<a class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>