<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 5/5/15 5:33 PM, Wan Zhiyuan wrote:<br>
</div>
<blockquote
cite="mid:CAFBNOftYr3F+L05q3637u=Kc26oLUu1tYueXgL71T1QqyJ2egQ@mail.gmail.com"
type="cite">
<meta http-equiv="Context-Type" content="text/html; charset=UTF-8">
<div dir="ltr">Dear John,
<div>I intend to implement the improvements on DSA.</div>
</div>
</blockquote>
<br>
There be nasty dragons in DSA. Don't say I didn't warn you.<br>
:)<br>
<br>
On a side note, I recently discovered that someone (I think Will
Dietz) updated the code to work with LLVM mainline, so you should be
able to update to a newer version of LLVM and use DSA. I've created
a snapshot of the LLVM source tree and a modified version of the
poolalloc tree in <a class="moz-txt-link-freetext" href="https://github.com/jtcriswell/llvm-dsa">https://github.com/jtcriswell/llvm-dsa</a> that you
may find useful. I think Will is also maintaining a github tree
that he updates regularly.<br>
<br>
<blockquote
cite="mid:CAFBNOftYr3F+L05q3637u=Kc26oLUu1tYueXgL71T1QqyJ2egQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>After running DSA on SPEC, I found DSA gives low precision
for mcf and bzip2.</div>
<div>I have checked the most imprecise c files in mcf an found
that the code seems to be a mixture of "PHI" and "GEP"
instructions.</div>
<div><br>
</div>
<div>Could you please give me some hints about what the big
picture of the improvement should be and how to start?</div>
</div>
</blockquote>
<br>
There may be many reasons why DSA's precision is low. You'll need
to figure out what the reason is for each program. In some cases,
it may be type inference. In other cases, it may be that DSA is
cognizant of the influence of external code (which causes it to
correctly give pessimistic results). For programs using function
pointers, there's a bug in the algorithm in which DSNodes that
should be marked complete are not. It could also be that you're
using the wrong DSA pass (e.g., using EQTD when TD will do).<br>
<br>
Looking at type inference specifically, the problem, in a nutshell,
is that DSA's type inference should not rely upon LLVM's type
annotations. It should just create a map from offsets to types.
Some work on this has already been done (e.g., DSA can figure out
that casting a pointer to a structure into a pointer to the type of
the structure's first field is still offset 0). However, there are
still places in which DSA is relying upon LLVM's type annotations.<br>
<br>
One thing that I would like to look at is changing how DSA analyzes
arrays of structures. Right now, DSA tries to infer a structure
type for a memory object and then tries to infer whether the memory
object is a singleton structure or an array of structures (you can
see this in DSA's interface; you can see a map between offsets and
types and an 'A' flag indicating that the object is an array). I
think this makes DSA needlessly complicated.<br>
<br>
I think it would be better if DSA did what Value Set Analysis (VSA)
does. VSA was designed to analyze untyped binary code. For each
abstract memory object, it creates a map of 4-tuples to types. Each
4-tuple represents a formula ax+b as (a,b,c,d) in which b if offset,
a is stride, and x is constrained between values c and d (c and d
can be constants or +-infinity). For example, if you have an array
of struct {int a ; char * b} on a 32-bit machine, DSA currently
tries to figure out that there's an array of structure elements in
which there's an int at offset 0 and a char * at offset 4 within the
structure. VSA would say that there's a memory object with an int
at every multiple of 4x+0 offset and a char * at every 4x+4 offset.<br>
<br>
The VSA approach should be agnostic to all sorts of weird casting,
embedded arrays, and embedded unions, though this is an educated
guess at present.<br>
<br>
Regards,<br>
<br>
John Criswell<br>
<br>
<blockquote
cite="mid:CAFBNOftYr3F+L05q3637u=Kc26oLUu1tYueXgL71T1QqyJ2egQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Thank you!</div>
<div><br>
</div>
<div>Regards,</div>
<div>Zhiyuan</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Apr 6, 2015 at 5:22 PM, John
Criswell <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:jtcriswel@gmail.com" target="_blank">jtcriswel@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote">
<div>
<div>Dear Zhiyuan,<br>
<br>
In order to reproduce the results from the paper, you'll
need to replicate a system from that era. You'll need
to use the same version of LLVM and DSA that the paper
used. I think that was LLVM 1.9 (the release_19 branch
of LLVM and poolalloc), but I'm not sure. You should
check to see if the paper specifies the version.<br>
<br>
As you'll be using a very old version of LLVM, it may be
worth setting up a VM with a corresponding old version
of Linux. I suspect newer compilers will not compile a
version of LLVM that is that old, so using an old
version of Linux with an old version of GCC may be
needed. I think Fedora Core 2 is the OS we were using
at the time.<br>
<br>
To answer the question of why you can't use a modern
version of LLVM and poolalloc, it's because LLVM has
changed significantly. DSA relies upon the type
annotations provided in the LLVM IR to "bootstrap" its
type inference (bootstrap is not quite the right word,
but it's the closest one of which I could think). As
LLVM matured, transformations would ditch the type
information (e.g., transforming typed GEPs into untyped
GEPs into a byte array), making DSA's ability to do
type-inference (and thereby improving field sensitivity)
more difficult. Throw into the mix the fact that DSA is
maintained by an academic research group, and the result
is that modern DSA doesn't have the accuracy that the
original DSA did.<br>
<br>
The good news is that I think DSA can be fixed by making
its type-inferencing code smarter. The bad news is that
it'd be a fair amount of work to do. So far, no one has
had sufficient desire/motivation to design and implement
the improvements.<br>
<br>
Regards,<br>
<br>
John Criswell
<div>
<div class="h5"><br>
<br>
<br>
On 4/6/15 4:56 PM, Wan Zhiyuan wrote:<br>
</div>
</div>
</div>
<blockquote type="cite">
<div>
<div class="h5">
<div dir="ltr">Dear all,
<div>
<div>I am trying to reproduce the "Percent May
Alias" result described in PLDI 07's paper
"Making Context-Sensitive Points-to Analysis
with Heap Cloning Practical For The Real
World" (<a moz-do-not-send="true"
href="http://llvm.org/pubs/2007-06-10-PLDI-DSA.html"
target="_blank">http://llvm.org/pubs/2007-06-10-PLDI-DSA.html</a>).</div>
</div>
<div><br>
</div>
<div>However, my "Percent May Alias" for all the
benchmarks is much greater, especially "bzip2".</div>
<div><br>
</div>
<div>The DSA code I use is from "<a
moz-do-not-send="true"
href="https://github.com/smackers/smack"
target="_blank">https://github.com/smackers/smack</a>".
I have diff the code between smack and
poolalloc_32. They are almost the same except
the "#include" statements.</div>
<div><br>
</div>
<div>I was wondering whether I need to do some
configuration to make DSA work properly.</div>
<div><br>
</div>
<div>Thank you!</div>
<div><br>
</div>
<div>Zhiyuan</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
LLVM Developers mailing list
<a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a moz-do-not-send="true" href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a>
<a moz-do-not-send="true" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><span class="HOEnZb">
</span></pre>
<span class="HOEnZb"> </span></blockquote>
<span class="HOEnZb"> <br>
<br>
<pre cols="72">--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
<a moz-do-not-send="true" href="http://www.cs.rochester.edu/u/criswell" target="_blank">http://www.cs.rochester.edu/u/criswell</a></pre>
</span></div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
<a class="moz-txt-link-freetext" href="http://www.cs.rochester.edu/u/criswell">http://www.cs.rochester.edu/u/criswell</a></pre>
</body>
</html>