[LLVMdev] GSoC Proposal: Inter-Procedure Program Slicing in LLVM
criswell at illinois.edu
Thu May 2 08:25:05 PDT 2013
On 5/1/13 11:36 PM, Mingliang LIU wrote:
> Hi all,
> I had a second thought of the dynamic slicing, as well as the source
> code generating.
> Firstly, the dynamic slicing is very useful to software community
> (I'll illustrate more in the refined proposal later), but it's already
> implemented by Swarup and John Criswell from UIUC. The static slicing
> code has been released as Giri project in LLVM, and they would kindly
> release the dynamic slicing too if this proposal is accepted.
> I'd like to study and enhance the Giri code to make it better.
> However, I don't know the exact part to which I can contribute. I'll
> ask Swarup (I cc this email to him) for help. I would be honored to be
> directed by him if this proposal is selected.
> Moreover, I don't know how the static and dynamic slicing fit
> together. Does the static code of Giri need some improvements? For
> example, taking into consideration the pointer aliases. Our group need
> the static slicing to generate I/O benchmarks (see below please).
It is not immediately obvious to me that dynamic slicing can be improved
by a static slicing analysis.
As far as improvements to the dynamic slicing Giri code, I can think of
1) Adding support to correctly handle asynchronous events (e.g., signal
2) Updating the code to LLVM mainline and putting it into the giri SVN
3) Reducing the trace size. Giri currently records the execution of
each basic block, load and store, call, and return instruction. There
are things you can do to make the trace smaller (e.g., creating one
trace record for control-equivalent basic blocks).
4) Making the giri run-time library thread safe and the slicing
thread/process aware. Right now, I think the events of all processes
and threads get thrown together in one trace file. Each should have a
separate trace file, or trace records should indicate which thread is
performing a particular operation.
5) Improving the giri run-time. The current run-time library maps a
portion of the trace file into memory, writes trace records into it, and
then synchronously munmaps it and then maps in the next portion of the
trace file. This design ensures that we don't swamp memory (the
application can produce trace records faster than the OS can write them
to disk), but it doesn't overlap computation with I/O very well. The
run-time also has a static size for how many trace records to hold in
memory before flushing to disk; this value should be computed dynamically.
In short, I think you would need to:
a) Make Giri up to date with LLVM and make it robust.
b) Profile it to see where to improve performance
c) Make improvements that improve performance or reduce trace size
> Secondly, to generate source code, which is human-readable and
> compilable seems a bit ambitious. My previous scripts in python can
> generate the source code of the original program by deleting useless
> line of code. It's kind of tricky. We used dozens of regular
> expressions to match the boundary of blocks, loops and functions to
> avoid deleting lines unexpectedly. I thought employing clang was a
> better idea. I don't know whether you think the source code genration
> is a good idea or not. I can release our simple script and improve it
> later. So I have more time/energy to contribute to the dynamic slicer.
> I think the slicing pass can be loaded by the opt. There is no need to
> change the front-end or other passes. I'm not sure whether the link
> time optimization can be exploited. We borrowed the LLVMSlicer code
> previously without LTO.
The problem with loading passes into opt is that you need to have a
whole-program bitcode file. In practice, those can be difficult to
build. The LLVM instrumentation passes should either be compiled into
Clang and run on each compilation unit or linked into libLTO and run on
the whole program bitcode before libLTO generates native code. Either
of these approaches will make compiling real-world programs with Giri
easier to do.
FWIW, we currently link the Giri passes into libLTO to ensure that each
instrumented instruction gets its own unique ID.
> Lastly, I'd like to introduce myself briefly. I'm a three-years PhD
> candidate student from Tsinghua University, China. My research area
> covers performance analysis, compiler techniques for high performance
> computing, and parallel computing (MPI/OpenMP).
> One of my on-going work is to generate an I/O benchmark from the
> original application. The base observation is that the computation and
> communication statements can be deleted if they're irrelevant to the
> I/O pattern, e.g. computing the buffer content to be written into a
> file. We take use of the program slicing technique to find
> relevant/irrelevant statements. The static slicer was borrowed from
> LLVMSlicer and I wrote code to make it work for our application. We
> generated the line number of sliced code. There is a very simple
> source code generation script using ugly and tricky regular
> expressions to delete original source code according to the sliced
> line number. We're going to submit the first version of the paper
> I also took part in one project in Open64 compiler several year ago,
> the purpose of which is to fast collect the communication trace. We
> used program slicing to delete the computation statements and kept the
> communication related statements. We generated executable
> binaries instead of source code from IR.
> Now we plan to do a project that can automatically find manual
> configuration errors in software deployment. The dynamic slicing can
> help a lot since the input is the key factor to locate errors. We have
> not a concrete plan for this project, but the dynamic slicing is
> heavily needed.
Your final proposal should cite other applications that use (or could
benefit from) dynamic slicing. Our ASPLOS 2013 paper would be an
example, but you should look for and cite several papers about several
different applications from several different groups. Industry groups
would be a plus. Saying that one or two people use dynamic slicing
isn't all that convincing; saying that x different groups use it,
including y industry groups, would be far more compelling.
To get you started, you may want to look at this paper by Johnson et.
-- John T.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev