<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 4/27/13 2:53 PM, Mingliang LIU
wrote:<br>
</div>
<blockquote
cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
Hi all,
<div><br>
</div>
<div>
<div>This is a GSoC 2013 proposal for LLVM project. Please see
the formatted version at here: <span style="text-decoration:
underline; color: rgb(71, 135, 255); "><a
moz-do-not-send="true"
href="http://pacman.cs.tsinghua.edu.cn/%7Eliuml07/files/gsoc2013-proposal-program-slicing.pdf">http://pacman.cs.tsinghua.edu.cn/~liuml07/files/gsoc2013-proposal-program-slicing.pdf</a></span></div>
</div>
</blockquote>
<br>
I wanted to comment up-front to our GSoC mentors this year that an
LLVM dynamic slicing pass is one of the most often requested
components we get from the research community.<br>
<br>
<blockquote
cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"
type="cite">
<div>
<div><span style="text-decoration: underline; color: rgb(71,
135, 255); "><br>
</span></div>
<div>Program slicing has been used in many applications, the
criteria of which is a pair of statement and variables.
I would like to write an inter-procedural program slicing pass
in LLVM, which is able to calculate C program slices of source
code effectively. There is no previous work implemented in
LLVM, which considers both the dynamic program slicing and
source code of the sliced program.</div>
</div>
</blockquote>
<br>
Actually, there is a dynamic slicing tool built using LLVM, and it
maps LLVM IR statements to source-level statements for its output
using the debug metadata. Swarup Sahoo and I built it and used it
in our recent ASPLOS paper
(<a class="moz-txt-link-freetext" href="http://dl.acm.org/citation.cfm?id=2451131">http://dl.acm.org/citation.cfm?id=2451131</a>). The tool is named
Giri.<br>
<br>
As an aside, the "Giri" code you described below is a static slicing
pass that we originally intended to add to our dynamic slicing code
and release as one project called "Giri." That project has stalled,
and if it doesn't get revitalized, I think it should be moved
elsewhere (perhaps github?).<br>
<br>
What I think would be good for your project would be to take the
Giri code that we've developed, update it to LLVM mainline, and make
it more robust. We can provide the code to you if you get selected
for GSoC. Unfortunately, I won't be mentoring this year. Perhaps
Swarup (CC'ed above) or someone else will be interested.<br>
<br>
<blockquote
cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"
type="cite">
<div>
<div> Program slice contains all statements in a program that
directly or indirectly act the value of a variable occurrence
[5]. Program slicing has been used in many applications, e.g.,
program verification, testing, maintenance, automatic
parallelization of program execution, automatic integration of
program versions.</div>
<div><br>
</div>
<div>While it's straightforward to implement the slicer in the
back-end of compiler using SSA form, the source code of the
original program instead of intermediate</div>
<div>representation is preferred in most cases. Moreover, we can
further narrow the notion of slice, which contains statements
that influence the value of a</div>
<div>variable occurrence for specic program inputs. This is
referred as dynamic program slicing [1]. However, there is no
previous work implemented in LLVM</div>
<div>which solved the two problems.</div>
</div>
<div><br>
</div>
<div>
<div>There are two public projects which implement the backwards
static slicing in LLVM.</div>
<div>
<ul class="MailOutline">
<li>Giri Written by John Criswell from UIUC, a subproject of
LLVM. The Giri code contains the static backwards
intra-procedure slicing passes, and runs with an older
version of LLVM. It also only backtracks until it hits a
load. Additional code must be written to backtrack further
to find potentially reaching stores.</li>
<li>LLVMSlicer This implementation is a complete static
backwards slicer from Masaryk University. It works on the
well dened data and control flow equations in a white
paper by F. Tip [4]. However, this code was written for
special purpose, thus it's not general enough to be use
by others. They implemented the Andersen's alias algorithm
[2], callgraph, and modies analysis to support the
slicer, instead of using the LLVM APIs.</li>
</ul>
</div>
<div><br>
</div>
<div>They eliminate the useless IR statements and keep the ones
aect the values of the criteria. However, neither of them
generates the compilable source code</div>
<div>slice, which is heavily needed in reality. There
are several ways to do this. One is to generate the
source code from sliced IR using llc tool. The issue is that</div>
<div>the IR is not concerned with high-level semantic.
The generated source code is different from the
original program and not suitable for human reading. An-</div>
<div>other approach is deleting the source code according to
sliced IR, with line number information (in meta-data of each
instruction). The naive script deleting</div>
<div>sliced source code one by one fails to handle tricky cases.
I think a better source code slicer is to take use of the AST
info.</div>
</div>
</blockquote>
<br>
If you want to generate *compilable* source code, then I think you
either need to go with the first option (using llc with the C
backend) or you need to implement part or all of the slicing code
within Clang. The second option you describe (using debug
meta-data) will probably be too fragile to work in practice; debug
information can be removed, and it may not capture things like the
inclusion of header files.<br>
<br>
That said, can you provide citations showing that being able to
compile the slice is needed? Does the slice need to be readable or
just compilable? It is not clear to me that a compilable and
human-readable slice is needed by most applications needing dynamic
slicing. I suspect only one or the other is needed.<br>
<br>
<blockquote
cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"
type="cite">
<div>
<div><br>
</div>
<div>There are three main parts of this work.</div>
<div>
<ol class="MailOutline">
<li>First, borrow an implementation of static back-wards
slicing from Giri or LLVMSlicer, and use the LLVM
callgraph, mod/ref and alias interfaces as much as
possible.</li>
</ol>
</div>
</div>
</blockquote>
<br>
For what purpose would you use the static analysis? Would you use
it to optimize the instrumentation needed for dynamic slicing?<br>
<br>
<blockquote
cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"
type="cite">
<div>
<div>
<ol class="MailOutline">
<li>Second, implement the dynamic program slicing using the
approach 3 in the paper [1].</li>
</ol>
</div>
</div>
</blockquote>
<br>
You should take a look at Giri's algorithm in our ASPLOS paper.
While not novel, it does have a nifty feature that uses the SSA
graph to reduce the number of events that it needs to trace to
create the dynamic slice.<br>
<br>
<blockquote
cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"
type="cite">
<div>
<div>
<ol class="MailOutline">
<li>Third, generate the source code of the sliced program.
To make the sliced source code compile directly, we need
to employ clang front-end</li>
</ol>
</div>
</div>
<div><br>
</div>
<div>
<div>The final result of this summer of code is to make this
pass work eeffectively and documented well. Further, I'll
write test cases and behave as the active</div>
<div>maintainer for this project. My long-term plan is to add
more features, e.g. Objective-C/C++ support, thing slicing
[3], to this project.</div>
</div>
</blockquote>
<br>
What is your experience with LLVM IR and/or Clang ASTs? Taking our
Giri code and making it robust seems doable over the summer.
Building something with Clang AST's and LLVM IR and having it solid
and well documented by end of summer seems a bit ambitious.<br>
<br>
<br>
<blockquote
cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"
type="cite">
<div>
<div><br>
</div>
<div>Any comment is highly appreciated.</div>
</div>
</blockquote>
<br>
I think your proposal should make a case that dynamic slicing is a
very useful thing. For example, you should cite some papers that
make use of it. While I know that dynamic slicing is useful (we
keep making one-off distributions of our dynamic slicing code for
researchers), that is not the most convincing argument. If you can
find real-world tools that use it (or could benefit from it), that's
even better.<br>
<br>
Second, your proposal should list your qualifications for the
project. Why are you the right person for this task? Do you have
experience writing code? Working with dynamic slicing? Have you
written LLVM/Clang passes before?<br>
<br>
Third, you should explain how all the parts fit together. If you're
working on dynamic slicing, it's not clear to me why you want a
static inter-procedural slicing pass.<br>
<br>
Fourth, you should discuss how your passes will integrate into the
LLVM toolchain. Will you be adding an option to clang? Will you
build a replacement libLTO to do the inter-procedural slicing, or do
you expect people to use opt to run it?<br>
<br>
-- John T.<br>
<br>
<blockquote
cite="mid:E62B60AA-491C-46F8-BF4D-D009F914F35A@mails.tsinghua.edu.cn"
type="cite">
<div>
<div><br>
</div>
<div><br>
</div>
<div>
<div>[1] H AGRAWAL. Dynamic Program Slicing. In ACM SIGPLAN
Conference on Programming Language Design and Implementation
(PLDI),1990.</div>
<div>
<div>2] L.O. Andersen. Program analysis and specialization
for the C programming language. PhD thesis, University of
Cophenhagen, Germany, 1994.</div>
<div>[3] M. Sridharan, S.J. Fink, and R. Bodik.
Thin slicing. In PLDI'07, 2007.</div>
<div>[4] F. Tip. A survey of program slicing
techniques.Journal of Programming Languages, 1995.</div>
<div>[5] M. Weiser. Program slicing. In Proceedings of the
5th International Conference on Software Engineering,
ICSE, pages 439{449. IEEE, 1981.</div>
</div>
<div><br>
</div>
<div>
<div>
--<br>
Mingliang LIU (刘明亮 in Chinese)<br>
<br>
PACMAN Group, Dept. of Computer Science & Technology<br>
Tsinghua University, Beijing 100084, China<br>
Email: <a moz-do-not-send="true"
href="mailto:liuml07@mails.tsinghua.edu.cn">liuml07@mails.tsinghua.edu.cn</a><br>
Homepage: <a moz-do-not-send="true"
href="http://pacman.cs.tsinghua.edu.cn/%7Eliuml07/">http://pacman.cs.tsinghua.edu.cn/~liuml07/</a>
</div>
<br>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a class="moz-txt-link-freetext" href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a>
<a class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a>
</pre>
</blockquote>
<br>
</body>
</html>