[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM

John Criswell criswell at illinois.edu
Mon Apr 29 07:40:37 PDT 2013


On 4/27/13 2:53 PM, Mingliang LIU wrote:
> Hi all,
>
> This is a GSoC 2013 proposal for LLVM project. Please see the 
> formatted version at here: 
> http://pacman.cs.tsinghua.edu.cn/~liuml07/files/gsoc2013-proposal-program-slicing.pdf 
> <http://pacman.cs.tsinghua.edu.cn/%7Eliuml07/files/gsoc2013-proposal-program-slicing.pdf>

I wanted to comment up-front to our GSoC mentors this year that an LLVM 
dynamic slicing pass is one of the most often requested components we 
get from the research community.

>
> Program slicing has been used in many applications, the criteria 
> of which is a pair of statement and variables. I would like to write 
> an inter-procedural program slicing pass in LLVM, which is able 
> to calculate C program slices of source code effectively. There is no 
> previous work implemented in LLVM, which considers both the dynamic 
> program slicing and source code of the sliced program.

Actually, there is a dynamic slicing tool built using LLVM, and it maps 
LLVM IR statements to source-level statements for its output using the 
debug metadata.  Swarup Sahoo and I built it and used it in our recent 
ASPLOS paper (http://dl.acm.org/citation.cfm?id=2451131).  The tool is 
named Giri.

As an aside, the "Giri" code you described below is a static slicing 
pass that we originally intended to add to our dynamic slicing code and 
release as one project called "Giri."  That project has stalled, and if 
it doesn't get revitalized, I think it should be moved elsewhere  
(perhaps github?).

What I think would be good for your project would be to take the Giri 
code that we've developed, update it to LLVM mainline, and make it more 
robust.  We can provide the code to you if you get selected for GSoC.  
Unfortunately, I won't be mentoring this year.  Perhaps Swarup (CC'ed 
above) or someone else will be interested.

>  Program slice contains all statements in a program that directly or 
> indirectly act the value of a variable occurrence [5]. Program slicing 
> has been used in many applications, e.g., program verification, 
> testing, maintenance, automatic parallelization of program execution, 
> automatic integration of program versions.
>
> While it's straightforward to implement the slicer in the back-end of 
> compiler using SSA form, the source code of the original program 
> instead of intermediate
> representation is preferred in most cases. Moreover, we can further 
> narrow the notion of slice, which contains statements that influence 
> the value of a
> variable occurrence for speci
c program inputs. This is referred as 
> dynamic program slicing [1]. However, there is no previous work 
> implemented in LLVM
> which solved the two problems.
>
> There are two public projects which implement the backwards static 
> slicing in LLVM.
>
>   * Giri Written by John Criswell from UIUC, a subproject of LLVM. The
>     Giri code contains the static backwards intra-procedure slicing
>     passes, and runs with an older version of LLVM. It also only
>     backtracks until it hits a load. Additional code must be written
>     to backtrack further to find potentially reaching stores.
>   * LLVMSlicer This implementation is a complete static backwards
>     slicer from Masaryk University. It works on the well de
ned data
>     and control flow equations in a white paper by F. Tip
>     [4]. However, this code was written for special purpose, thus it's
>     not general enough to be use by others. They implemented the
>     Andersen's alias algorithm [2], callgraph, and modi
es analysis to
>     support the slicer, instead of using the LLVM APIs.
>
>
> They eliminate the useless IR statements and keep the ones a
ect the 
> values of the criteria. However, neither of them generates the 
> compilable source code
> slice, which is heavily needed in reality. There are several ways to 
> do this. One is to generate the source code from sliced IR using llc 
> tool. The issue is that
> the IR is not concerned with high-level semantic. The generated source 
> code is different from the original program and not suitable for human 
> reading. An-
> other approach is deleting the source code according to sliced IR, 
> with line number information (in meta-data of each instruction). The 
> naive script deleting
> sliced source code one by one fails to handle tricky cases. I think a 
> better source code slicer is to take use of the AST info.

If you want to generate *compilable* source code, then I think you 
either need to go with the first option (using llc with the C backend) 
or you need to implement part or all of the slicing code within Clang.  
The second option you describe (using debug meta-data) will probably be 
too fragile to work in practice; debug information can be removed, and 
it may not capture things like the inclusion of header files.

That said, can you provide citations showing that being able to compile 
the slice is needed?  Does the slice need to be readable or just 
compilable?  It is not clear to me that a compilable and human-readable 
slice is needed by most applications needing dynamic slicing.  I suspect 
only one or the other is needed.

>
> There are three main parts of this work.
>
>  1. First, borrow an implementation of static back-wards slicing from
>     Giri or LLVMSlicer, and use the LLVM callgraph, mod/ref and alias
>     interfaces as much as possible.
>

For what purpose would you use the static analysis?  Would you use it to 
optimize the instrumentation needed for dynamic slicing?

>  1. Second, implement the dynamic program slicing using the approach 3
>     in the paper [1].
>

You should take a look at Giri's algorithm in our ASPLOS paper. While 
not novel, it does have a nifty feature that uses the SSA graph to 
reduce the number of events that it needs to trace to create the dynamic 
slice.

>  1. Third, generate the source code of the sliced program. To make the
>     sliced source code compile directly, we need to employ clang front-end
>
>
> The final result of this summer of code is to make this pass work 
> e
effectively and documented well. Further, I'll write test cases and 
> behave as the active
> maintainer for this project. My long-term plan is to add more 
> features, e.g. Objective-C/C++ support, thing slicing [3], to this 
> project.

What is your experience with LLVM IR and/or Clang ASTs?  Taking our Giri 
code and making it robust seems doable over the summer. Building 
something with Clang AST's and LLVM IR and having it solid and well 
documented by end of summer seems a bit ambitious.


>
> Any comment is highly appreciated.

I think your proposal should make a case that dynamic slicing is a very 
useful thing.  For example, you should cite some papers that make use of 
it.  While I know that dynamic slicing is useful (we keep making one-off 
distributions of our dynamic slicing code for researchers), that is not 
the most convincing argument.  If you can find real-world tools that use 
it (or could benefit from it), that's even better.

Second, your proposal should list your qualifications for the project.  
Why are you the right person for this task?  Do you have experience 
writing code?  Working with dynamic slicing?  Have you written 
LLVM/Clang passes before?

Third, you should explain how all the parts fit together.  If you're 
working on dynamic slicing, it's not clear to me why you want a static 
inter-procedural slicing pass.

Fourth, you should discuss how your passes will integrate into the LLVM 
toolchain.  Will you be adding an option to clang?  Will you build a 
replacement libLTO to do the inter-procedural slicing, or do you expect 
people to use opt to run it?

-- John T.

>
>
> [1] H AGRAWAL. Dynamic Program Slicing. In ACM SIGPLAN Conference on 
> Programming Language Design and Implementation (PLDI),1990.
> 2] L.O. Andersen. Program analysis and specialization for the C 
> programming language. PhD thesis, University of Cophenhagen, Germany, 
> 1994.
> [3] M. Sridharan, S.J. Fink, and R. Bodik. Thin slicing. In PLDI'07, 2007.
> [4] F. Tip. A survey of program slicing techniques.Journal of 
> Programming Languages, 1995.
> [5] M. Weiser. Program slicing. In Proceedings of the 5th 
> International Conference on Software Engineering, ICSE, pages 439{449. 
> IEEE, 1981.
>
> --
> Mingliang LIU (??? in Chinese)
>
> PACMAN Group,  Dept. of Computer Science & Technology
> Tsinghua University, Beijing 100084, China
> Email: liuml07 at mails.tsinghua.edu.cn 
> <mailto:liuml07 at mails.tsinghua.edu.cn>
> Homepage: http://pacman.cs.tsinghua.edu.cn/~liuml07/ 
> <http://pacman.cs.tsinghua.edu.cn/%7Eliuml07/>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130429/0a2eb18a/attachment.html>


More information about the llvm-dev mailing list