<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 5/6/15 11:07 PM, Wan Zhiyuan wrote:<br>

    </div>

    <blockquote

cite="mid:CAFBNOftj4iE+55R+_1rOcnE1BcPVNHSMKdS-RMAxJJVMHYPaew@mail.gmail.com"

      type="cite">

      <meta http-equiv="Context-Type" content="text/html; charset=UTF-8">

      <div dir="ltr">Dear John,

        <div>Thank you so much for your reply!</div>

        <div>Please kindly correct me if my understanding is wrong. </div>

        <div><br>

        </div>

        <div>From your reply, I learn that there could be mainly four<span> reasons

            why DSA's precision is low:</span></div>

        <div><span>1. type inference</span></div>

        <div><span>2. influence of external code</span></div>

        <div><span>3. bug in handling function pointers</span></div>

      </div>

    </blockquote>

    <br>

    To be clear, this is a fault in the algorithm itself (i.e., a

    completely correct implementation would still have the problem).  No

    one at the time was aware of this flaw in the algorithm; a master's

    student discovered it several years after the paper was published.<br>

    <br>

    <blockquote

cite="mid:CAFBNOftj4iE+55R+_1rOcnE1BcPVNHSMKdS-RMAxJJVMHYPaew@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><span>4. using the wrong DSA pass</span></div>

        <div><br>

        </div>

        <div>As you mentioned in your previous mail, LLVM changes a

          lot. </div>

        <div>So I was wondering if DSA's algorithm (presented in PLDI

          '07) can achieve a comparable precision on top of LLVM 3.5+

          after implementing the improvements.<br>

        </div>

      </div>

    </blockquote>

    <br>

    That's an open question and dependent on a lot of things.<br>

    <br>

    I also think you're asking the wrong question.  It sounds like you

    want to fix DSA because doing so will make it more precise, but you

    haven't discussed why you need that precision.<br>

    <br>

    In my opinion, fixing DSA will be a lot of work (2-3 months at

    least).  I would not spend time fixing DSA just to see if some

    synthetic precision results get better.  I would fix DSA only if I

    was writing an analysis or transformation that needed points-to,

    type-inference, and/or call graph results more accurate than what

    DSA is currently providing *and* if fixing DSA was the cheapest

    option in getting those more accurate results.<br>

    <br>

    <blockquote

cite="mid:CAFBNOftj4iE+55R+_1rOcnE1BcPVNHSMKdS-RMAxJJVMHYPaew@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div><span>Also, I have diff the DSA code in poolalloc

            release_19 and release_32.  </span><span>It seems to me that

            the changes in DSA mainly comes from the aspects as below:</span></div>

        <div><span>1. release_32 introduces the DataLayout to do some

            type inference;</span></div>

      </div>

    </blockquote>

    <br>

    Not really.  DataLayout used to be called TargetData.  Someone

    changed the name within the past year or so.<br>

    <br>

    <blockquote

cite="mid:CAFBNOftj4iE+55R+_1rOcnE1BcPVNHSMKdS-RMAxJJVMHYPaew@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><span>2. release_32 replaces some data structures with LLVM

            built-in data structures;</span></div>

        <div><span>3. release_32 handles different types of LLVM IR

            instructions, and in a different way.</span></div>

        <div><span>4. the </span>inheritance of Analysis Passes has

          changed a bit.</div>

        <div><br>

        </div>

        <div>Therefore, I was wondering if the introduction of

          DataLayout and changes in LLVM IR cause the imprecision as

          well.</div>

      </div>

    </blockquote>

    <br>

    You're looking at all the cosmetic changes needed to make the DSA

    code compile with modern LLVM.  None of these things is what makes

    DSA less precise.<br>

    <br>

    When I say that changes in LLVM have caused DSA to become less

    precise, I am talking about two issues.  First, newer versions of

    the LLVM front-end generate LLVM IR that DSA does not analyze well. 

    For example, after LLVM 1.9, the C front-ends started changing the

    way they generated code for vararg functions.  DSA treated the new

    code conservatively.<br>

    <br>

    Second, some of the LLVM optimization passes started replacing

    well-typed GEP instructions with code that casted the pointer to a

    char *, did a GEP, and then casted the pointer back to its original

    type.  Since DSA wasn't designed to analyze this idiom, it just

    thought that the pointer was being used like an untyped byte array. 

    This made DSA's results a lot less precise because it dropped field

    sensitivity for these memory objects.<br>

    <br>

    My suggestion on replacing DSA's type-inference code with something

    similar to VSA is my idea on how to address the second issue in a

    way that will avoid having to change DSA every time the LLVM

    optimizations change what they do.  In essence, by making DSA less

    reliant on the LLVM type information, the LLVM optimizations can do

    whatever they want with the type information, and DSA should still

    continue to get the same results.<br>

    <br>

    To summarize, DSA lost precision because it made assumptions about

    the LLVM IR it would be analyzing.  When LLVM changed in ways that

    broke those assumptions, DSA started generating less precise

    results.<br>

    <br>

    Regards,<br>

    <br>

    John Criswell<br>

    <br>

    <blockquote

cite="mid:CAFBNOftj4iE+55R+_1rOcnE1BcPVNHSMKdS-RMAxJJVMHYPaew@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Thank you!</div>

        <div><br>

        </div>

        <div>Regards,</div>

        <div><br>

        </div>

        <div>Zhiyuan Wan</div>

        <div><br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Wed, May 6, 2015 at 10:35 AM, John

          Criswell <span dir="ltr"><<a moz-do-not-send="true"

              href="mailto:jtcriswel@gmail.com" target="_blank">jtcriswel@gmail.com</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote">

            <div><span class="">

                <div>On 5/5/15 5:33 PM, Wan Zhiyuan wrote:<br>

                </div>

                <blockquote type="cite">

                  <div dir="ltr">Dear John,

                    <div>I intend to implement the improvements on DSA.</div>

                  </div>

                </blockquote>

                <br>

              </span> There be nasty dragons in DSA.  Don't say I didn't

              warn you.<br>

              :)<br>

              <br>

              On a side note, I recently discovered that someone (I

              think Will Dietz) updated the code to work with LLVM

              mainline, so you should be able to update to a newer

              version of LLVM and use DSA.  I've created a snapshot of

              the LLVM source tree and a modified version of the

              poolalloc tree in <a moz-do-not-send="true"

                href="https://github.com/jtcriswell/llvm-dsa"

                target="_blank">https://github.com/jtcriswell/llvm-dsa</a>

              that you may find useful.  I think Will is also

              maintaining a github tree that he updates regularly.<span

                class=""><br>

                <br>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div>After running DSA on SPEC, I found DSA gives

                      low precision for mcf and bzip2.</div>

                    <div>I have checked the most imprecise c files in

                      mcf an found that the code seems to be a mixture

                      of "PHI" and "GEP" instructions.</div>

                    <div><br>

                    </div>

                    <div>Could you please give me some hints about what

                      the big picture of the improvement should be and

                      how to start?</div>

                  </div>

                </blockquote>

                <br>

              </span> There may be many reasons why DSA's precision is

              low.  You'll need to figure out what the reason is for

              each program.  In some cases, it may be type inference. 

              In other cases, it may be that DSA is cognizant of the

              influence of external code (which causes it to correctly

              give pessimistic results).  For programs using function

              pointers, there's a bug in the algorithm in which DSNodes

              that should be marked complete are not.  It could also be

              that you're using the wrong DSA pass (e.g., using EQTD

              when TD will do).<br>

              <br>

              Looking at type inference specifically, the problem, in a

              nutshell, is that DSA's type inference should not rely

              upon LLVM's type annotations.  It should just create a map

              from offsets to types.  Some work on this has already been

              done (e.g., DSA can figure out that casting a pointer to a

              structure into a pointer to the type of the structure's

              first field is still offset 0).  However, there are still

              places in which DSA is relying upon LLVM's type

              annotations.<br>

              <br>

              One thing that I would like to look at is changing how DSA

              analyzes arrays of structures.  Right now, DSA tries to

              infer a structure type for a memory object and then tries

              to infer whether the memory object is a singleton

              structure or an array of structures (you can see this in

              DSA's interface; you can see a map between offsets and

              types and an 'A' flag indicating that the object is an

              array).  I think this makes DSA needlessly complicated.<br>

              <br>

              I think it would be better if DSA did what Value Set

              Analysis (VSA) does.  VSA was designed to analyze untyped

              binary code.  For each abstract memory object, it creates

              a map of 4-tuples to types.  Each 4-tuple represents a

              formula ax+b as (a,b,c,d) in which b if offset, a is

              stride, and x is constrained between values c and d (c and

              d can be constants or +-infinity).  For example, if you

              have an array of struct {int a ; char * b} on a 32-bit

              machine, DSA currently tries to figure out that there's an

              array of structure elements in which there's an int at

              offset 0 and a char * at offset 4 within the structure. 

              VSA would say that there's a memory object with an int at

              every multiple of 4x+0 offset and a char * at every 4x+4

              offset.<br>

              <br>

              The VSA approach should be agnostic to all sorts of weird

              casting, embedded arrays, and embedded unions, though this

              is an educated guess at present.<br>

              <br>

              Regards,<br>

              <br>

              John Criswell

              <div>

                <div class="h5"><br>

                  <br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div><br>

                      </div>

                      <div>Thank you!</div>

                      <div><br>

                      </div>

                      <div>Regards,</div>

                      <div>Zhiyuan</div>

                    </div>

                    <div class="gmail_extra"><br>

                      <div class="gmail_quote">On Mon, Apr 6, 2015 at

                        5:22 PM, John Criswell <span dir="ltr"><<a

                            moz-do-not-send="true"

                            href="mailto:jtcriswel@gmail.com"

                            target="_blank">jtcriswel@gmail.com</a>></span>

                        wrote:<br>

                        <blockquote class="gmail_quote">

                          <div>

                            <div>Dear Zhiyuan,<br>

                              <br>

                              In order to reproduce the results from the

                              paper, you'll need to replicate a system

                              from that era.  You'll need to use the

                              same version of LLVM and DSA that the

                              paper used.  I think that was LLVM 1.9

                              (the release_19 branch of LLVM and

                              poolalloc), but I'm not sure.  You should

                              check to see if the paper specifies the

                              version.<br>

                              <br>

                              As you'll be using a very old version of

                              LLVM, it may be worth setting up a VM with

                              a corresponding old version of Linux.  I

                              suspect newer compilers will not compile a

                              version of LLVM that is that old, so using

                              an old version of Linux with an old

                              version of GCC may be needed.  I think

                              Fedora Core 2 is the OS we were using at

                              the time.<br>

                              <br>

                              To answer the question of why you can't

                              use a modern version of LLVM and

                              poolalloc, it's because LLVM has changed

                              significantly.  DSA relies upon the type

                              annotations provided in the LLVM IR to

                              "bootstrap" its type inference (bootstrap

                              is not quite the right word, but it's the

                              closest one of which I could think).  As

                              LLVM matured, transformations would ditch

                              the type information (e.g., transforming

                              typed GEPs into untyped GEPs into a byte

                              array), making DSA's ability to do

                              type-inference (and thereby improving

                              field sensitivity) more difficult.  Throw

                              into the mix the fact that DSA is

                              maintained by an academic research group,

                              and the result is that modern DSA doesn't

                              have the accuracy that the original DSA

                              did.<br>

                              <br>

                              The good news is that I think DSA can be

                              fixed by making its type-inferencing code

                              smarter.  The bad news is that it'd be a

                              fair amount of work to do.  So far, no one

                              has had sufficient desire/motivation to

                              design and implement the improvements.<br>

                              <br>

                              Regards,<br>

                              <br>

                              John Criswell

                              <div>

                                <div><br>

                                  <br>

                                  <br>

                                  On 4/6/15 4:56 PM, Wan Zhiyuan wrote:<br>

                                </div>

                              </div>

                            </div>

                            <blockquote type="cite">

                              <div>

                                <div>

                                  <div dir="ltr">Dear all,

                                    <div>

                                      <div>I am trying to reproduce the

                                        "Percent May Alias" result

                                        described in PLDI 07's paper

                                        "Making Context-Sensitive

                                        Points-to Analysis with Heap

                                        Cloning Practical For The Real

                                        World" (<a

                                          moz-do-not-send="true"

                                          href="http://llvm.org/pubs/2007-06-10-PLDI-DSA.html"

                                          target="_blank">http://llvm.org/pubs/2007-06-10-PLDI-DSA.html</a>).</div>

                                    </div>

                                    <div><br>

                                    </div>

                                    <div>However, my "Percent May Alias"

                                      for all the benchmarks is much

                                      greater, especially "bzip2".</div>

                                    <div><br>

                                    </div>

                                    <div>The DSA code I use is from "<a

                                        moz-do-not-send="true"

                                        href="https://github.com/smackers/smack"

                                        target="_blank">https://github.com/smackers/smack</a>".

                                      I have diff the code between smack

                                      and poolalloc_32. They are almost

                                      the same except the "#include"

                                      statements.</div>

                                    <div><br>

                                    </div>

                                    <div>I was wondering whether I need

                                      to do some configuration to make

                                      DSA work properly.</div>

                                    <div><br>

                                    </div>

                                    <div>Thank you!</div>

                                    <div><br>

                                    </div>

                                    <div>Zhiyuan</div>

                                    <div><br>

                                    </div>

                                    <div><br>

                                    </div>

                                    <div><br>

                                    </div>

                                  </div>

                                  <br>

                                  <fieldset></fieldset>

                                  <br>

                                </div>

                              </div>

                              <pre>_______________________________________________

LLVM Developers mailing list

<a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>         <a moz-do-not-send="true" href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a>

<a moz-do-not-send="true" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><span>

</span></pre>

                              <span> </span></blockquote>

                            <span> <br>

                              <br>

                              <pre cols="72">-- 

John Criswell

Assistant Professor

Department of Computer Science, University of Rochester

<a moz-do-not-send="true" href="http://www.cs.rochester.edu/u/criswell" target="_blank">http://www.cs.rochester.edu/u/criswell</a></pre>

                            </span></div>

                        </blockquote>

                      </div>

                      <br>

                    </div>

                  </blockquote>

                  <br>

                  <br>

                  <pre cols="72">-- 

John Criswell

Assistant Professor

Department of Computer Science, University of Rochester

<a moz-do-not-send="true" href="http://www.cs.rochester.edu/u/criswell" target="_blank">http://www.cs.rochester.edu/u/criswell</a></pre>

                </div>

              </div>

            </div>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

John Criswell

Assistant Professor

Department of Computer Science, University of Rochester

<a class="moz-txt-link-freetext" href="http://www.cs.rochester.edu/u/criswell">http://www.cs.rochester.edu/u/criswell</a></pre>

  </body>

</html>