<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 5/5/15 5:33 PM, Wan Zhiyuan wrote:<br>

    </div>

    <blockquote

cite="mid:CAFBNOftYr3F+L05q3637u=Kc26oLUu1tYueXgL71T1QqyJ2egQ@mail.gmail.com"

      type="cite">

      <meta http-equiv="Context-Type" content="text/html; charset=UTF-8">

      <div dir="ltr">Dear John,

        <div>I intend to implement the improvements on DSA.</div>

      </div>

    </blockquote>

    <br>

    There be nasty dragons in DSA.  Don't say I didn't warn you.<br>

    :)<br>

    <br>

    On a side note, I recently discovered that someone (I think Will

    Dietz) updated the code to work with LLVM mainline, so you should be

    able to update to a newer version of LLVM and use DSA.  I've created

    a snapshot of the LLVM source tree and a modified version of the

    poolalloc tree in <a class="moz-txt-link-freetext" href="https://github.com/jtcriswell/llvm-dsa">https://github.com/jtcriswell/llvm-dsa</a> that you

    may find useful.  I think Will is also maintaining a github tree

    that he updates regularly.<br>

    <br>

    <blockquote

cite="mid:CAFBNOftYr3F+L05q3637u=Kc26oLUu1tYueXgL71T1QqyJ2egQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>After running DSA on SPEC, I found DSA gives low precision

          for mcf and bzip2.</div>

        <div>I have checked the most imprecise c files in mcf an found

          that the code seems to be a mixture of "PHI" and "GEP"

          instructions.</div>

        <div><br>

        </div>

        <div>Could you please give me some hints about what the big

          picture of the improvement should be and how to start?</div>

      </div>

    </blockquote>

    <br>

    There may be many reasons why DSA's precision is low.  You'll need

    to figure out what the reason is for each program.  In some cases,

    it may be type inference.  In other cases, it may be that DSA is

    cognizant of the influence of external code (which causes it to

    correctly give pessimistic results).  For programs using function

    pointers, there's a bug in the algorithm in which DSNodes that

    should be marked complete are not.  It could also be that you're

    using the wrong DSA pass (e.g., using EQTD when TD will do).<br>

    <br>

    Looking at type inference specifically, the problem, in a nutshell,

    is that DSA's type inference should not rely upon LLVM's type

    annotations.  It should just create a map from offsets to types. 

    Some work on this has already been done (e.g., DSA can figure out

    that casting a pointer to a structure into a pointer to the type of

    the structure's first field is still offset 0).  However, there are

    still places in which DSA is relying upon LLVM's type annotations.<br>

    <br>

    One thing that I would like to look at is changing how DSA analyzes

    arrays of structures.  Right now, DSA tries to infer a structure

    type for a memory object and then tries to infer whether the memory

    object is a singleton structure or an array of structures (you can

    see this in DSA's interface; you can see a map between offsets and

    types and an 'A' flag indicating that the object is an array).  I

    think this makes DSA needlessly complicated.<br>

    <br>

    I think it would be better if DSA did what Value Set Analysis (VSA)

    does.  VSA was designed to analyze untyped binary code.  For each

    abstract memory object, it creates a map of 4-tuples to types.  Each

    4-tuple represents a formula ax+b as (a,b,c,d) in which b if offset,

    a is stride, and x is constrained between values c and d (c and d

    can be constants or +-infinity).  For example, if you have an array

    of struct {int a ; char * b} on a 32-bit machine, DSA currently

    tries to figure out that there's an array of structure elements in

    which there's an int at offset 0 and a char * at offset 4 within the

    structure.  VSA would say that there's a memory object with an int

    at every multiple of 4x+0 offset and a char * at every 4x+4 offset.<br>

    <br>

    The VSA approach should be agnostic to all sorts of weird casting,

    embedded arrays, and embedded unions, though this is an educated

    guess at present.<br>

    <br>

    Regards,<br>

    <br>

    John Criswell<br>

    <br>

    <blockquote

cite="mid:CAFBNOftYr3F+L05q3637u=Kc26oLUu1tYueXgL71T1QqyJ2egQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Thank you!</div>

        <div><br>

        </div>

        <div>Regards,</div>

        <div>Zhiyuan</div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Mon, Apr 6, 2015 at 5:22 PM, John

          Criswell <span dir="ltr"><<a moz-do-not-send="true"

              href="mailto:jtcriswel@gmail.com" target="_blank">jtcriswel@gmail.com</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote">

            <div>

              <div>Dear Zhiyuan,<br>

                <br>

                In order to reproduce the results from the paper, you'll

                need to replicate a system from that era.  You'll need

                to use the same version of LLVM and DSA that the paper

                used.  I think that was LLVM 1.9 (the release_19 branch

                of LLVM and poolalloc), but I'm not sure.  You should

                check to see if the paper specifies the version.<br>

                <br>

                As you'll be using a very old version of LLVM, it may be

                worth setting up a VM with a corresponding old version

                of Linux.  I suspect newer compilers will not compile a

                version of LLVM that is that old, so using an old

                version of Linux with an old version of GCC may be

                needed.  I think Fedora Core 2 is the OS we were using

                at the time.<br>

                <br>

                To answer the question of why you can't use a modern

                version of LLVM and poolalloc, it's because LLVM has

                changed significantly.  DSA relies upon the type

                annotations provided in the LLVM IR to "bootstrap" its

                type inference (bootstrap is not quite the right word,

                but it's the closest one of which I could think).  As

                LLVM matured, transformations would ditch the type

                information (e.g., transforming typed GEPs into untyped

                GEPs into a byte array), making DSA's ability to do

                type-inference (and thereby improving field sensitivity)

                more difficult.  Throw into the mix the fact that DSA is

                maintained by an academic research group, and the result

                is that modern DSA doesn't have the accuracy that the

                original DSA did.<br>

                <br>

                The good news is that I think DSA can be fixed by making

                its type-inferencing code smarter.  The bad news is that

                it'd be a fair amount of work to do.  So far, no one has

                had sufficient desire/motivation to design and implement

                the improvements.<br>

                <br>

                Regards,<br>

                <br>

                John Criswell

                <div>

                  <div class="h5"><br>

                    <br>

                    <br>

                    On 4/6/15 4:56 PM, Wan Zhiyuan wrote:<br>

                  </div>

                </div>

              </div>

              <blockquote type="cite">

                <div>

                  <div class="h5">

                    <div dir="ltr">Dear all,

                      <div>

                        <div>I am trying to reproduce the "Percent May

                          Alias" result described in PLDI 07's paper

                          "Making Context-Sensitive Points-to Analysis

                          with Heap Cloning Practical For The Real

                          World" (<a moz-do-not-send="true"

                            href="http://llvm.org/pubs/2007-06-10-PLDI-DSA.html"

                            target="_blank">http://llvm.org/pubs/2007-06-10-PLDI-DSA.html</a>).</div>

                      </div>

                      <div><br>

                      </div>

                      <div>However, my "Percent May Alias" for all the

                        benchmarks is much greater, especially "bzip2".</div>

                      <div><br>

                      </div>

                      <div>The DSA code I use is from "<a

                          moz-do-not-send="true"

                          href="https://github.com/smackers/smack"

                          target="_blank">https://github.com/smackers/smack</a>".

                        I have diff the code between smack and

                        poolalloc_32. They are almost the same except

                        the "#include" statements.</div>

                      <div><br>

                      </div>

                      <div>I was wondering whether I need to do some

                        configuration to make DSA work properly.</div>

                      <div><br>

                      </div>

                      <div>Thank you!</div>

                      <div><br>

                      </div>

                      <div>Zhiyuan</div>

                      <div><br>

                      </div>

                      <div><br>

                      </div>

                      <div><br>

                      </div>

                    </div>

                    <br>

                    <fieldset></fieldset>

                    <br>

                  </div>

                </div>

                <pre>_______________________________________________

LLVM Developers mailing list

<a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>         <a moz-do-not-send="true" href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a>

<a moz-do-not-send="true" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><span class="HOEnZb">

</span></pre>

                <span class="HOEnZb"> </span></blockquote>

              <span class="HOEnZb"> <br>

                <br>

                <pre cols="72">-- 

John Criswell

Assistant Professor

Department of Computer Science, University of Rochester

<a moz-do-not-send="true" href="http://www.cs.rochester.edu/u/criswell" target="_blank">http://www.cs.rochester.edu/u/criswell</a></pre>

              </span></div>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

John Criswell

Assistant Professor

Department of Computer Science, University of Rochester

<a class="moz-txt-link-freetext" href="http://www.cs.rochester.edu/u/criswell">http://www.cs.rochester.edu/u/criswell</a></pre>

  </body>

</html>