<br><br><div class="gmail_quote"><div dir="ltr">On Mon, Mar 21, 2016, 10:00 AM Jia Chen <<a href="mailto:jchen@cs.utexas.edu">jchen@cs.utexas.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  

    

  

  <div bgcolor="#FFFFFF" text="#000000">

    Hi Daniel,</div><div bgcolor="#FFFFFF" text="#000000"><br>

    <br>

    <div>On 03/21/2016 11:05 AM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Tue, Mar 15, 2016 at 1:37 PM, Jia

            Chen via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank"></a><a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> Dear llvm devs,<br>

                <br>

                tl;dr: What prevents llvm from switching to a fancier

                pointer analysis?<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>Nothing.</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> <br>

                Currently, there exists a variety of general-purpose

                alias analyses in the LLVM codebase: basic-aa,

                globalsmodref-aa, tbaa, scev-aa, and cfl-aa. However,

                only the first three are actually turned on when

                invoking clang with -O2 or -O3 (please correct me if I'm

                wrong about this).<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>This is correct.</div>

            <div>Eventually, i hope george will have time to get back to

              CFL-AA and turn it on by default.</div>

            <div> <br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> <br>

                If one looks at existing research literatures, there are

                even more algorithm to consider for doing pointer

                analysis. Some are field-sensitive, some are

                field-based, some are flow-sensitive, some are

                context-sensitive. Even for flow-insensitive ones, they

                could also be inclusion-style (-andersen-aa) and

                equality-style (-steens-aa and -ds-aa). Those algorithms

                are often backed up by rich theoretical framework as

                well as preliminary evaluations which demonstrate their

                superior precision and/or performance.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>CFL-AA is a middle ground between steens and anders,

              can be easily made field and context sensitive, etc.</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> <br>

                Given such an abundance choices of pointer analyses that

                seem to be much better in the research land, why does

                real-world compiler infrastructures like llvm still rely

                on those three simple (and ad-hoc) ones to perform IR

                optimization? </div>

            </blockquote>

            <div><br>

              Time and energy.</div>

            <div> <br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">Based on my

                understanding (and again please correct me if I am

                wrong):<br>

                <br>

                (1) The minor reason: those "better" algorithms are very

                hard to implement in a robust way and nobody seems to be

                interested in trying to write and maintain them.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>This is false.  Heck, at the time i implemented it in

              GCC, field-sensitive andersen's analysis was unknown in

              production compilers.  That's why i'm thanked in all the

              papers - i did the engineering work to make it fast and

              reliable.</div>

            <div><br>

            </div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> (2) The major

                reason: it's not clear whether those "better" algorithms

                are actually better for llvm. More precise pointer

                analyses tend to slow down compile time a lot while

                contributing too little to the optimization passes that

                use them. The benefit one gets from a more precise

                analysis may not justify the compile-time or the

                maintenance cost.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div><br>

            </div>

            <div>CFL-AA is probably the right trade-off here. You can

              stop at any time and have correct answers, you can be as

              lazy as you like.</div>

            <div>etc.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br></div><div bgcolor="#FFFFFF" text="#000000">

    Regarding CFL-AA: in my understanding, cfl-aa does not introduce a

    new precision tradeoff.</div></blockquote></div><div><br></div><div>You can make it do what you want much easier than existing frameworks in my experience.</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"> It is merely a demand-driven way of

    implementing existing analyses, by extending those algorithms to

    track additional "pointed-to-by" information. Laziness may help with

    the running time of the cfl analysis when only partial points-to

    info is needed, but if the client wants to do a whole-program

    analysis and require whole-program points-to info (which is usually

    true for optimizing compilers since they will eventually examine and

    touch every piece of the codes given to it), should cfl-aa be no

    different than traditional whatever-sensitive pointer analysis?</div></blockquote></div><div><br></div><div>CFL, at least when I ran the numbers, was faster at all pairs than existing analysis.</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>The reality is i think you overlook the realistic

              answer:<br>

              <br>

            </div>

            <div>3. Nobody has had time or energy to fix up CFL-AA or

              SCEV-AA. They spend their time on lower-hanging fruit

              until that lower hanging fruit is gone.</div>

            <div><br>

            </div>

            <div>IE For the moment, CFL-AA and SCEV-AA and ... are not

              the thing holding llvm back.</div>

            <div><br>

            </div>

          </div>

        </div>

      </div>

    </blockquote></div><div bgcolor="#FFFFFF" text="#000000">

    I'd love to hear some examples of "lower-hanging fruit" in LLVM,

    especially in the area of middle-end analyses and optimizations. I

    thought LLVM is mature enough that any obvious chances of

    improvement in analyses and optimization have already been taken,

    no?</div></blockquote></div><div><br></div><div>No.</div><div>For example, gvn and pre are fairly simple implementations that miss obvious optimizations.</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> <br>

                So my question here is: what kind(s) of precision really

                justify the cost and what kinds do not?</div>

            </blockquote>

            <div><br>

            </div>

            <div>Depends entirely on your applications.</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> Has anybody done

                any study in the past to evaluate what kinds of features

                in pointer analyses will benefit what kinds of

                optimization passes?</div>

            </blockquote>

            <div>Yes.</div>

            <div>Chris did many years ago, and i've done one more

              recently.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br></div><div bgcolor="#FFFFFF" text="#000000">

    Great! Are they published somewhere? Can the data be shared somehow?</div><div bgcolor="#FFFFFF" text="#000000"></div></blockquote></div><div><br></div><div>No, and sadly, no</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> Could there

                potentially be more improvement on pointer analysis

                precision without adding too much

                compile-time/maintenance cost? </div>

            </blockquote>

            <div>Yes.</div>

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">Has the

                precision/performance tradeoffs got fully explored

                before? <br>

              </div>

            </blockquote>

            <div>Yes </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> <br>

                Any pointers will be much appreciated. No pun intended

                :)<br>

                <br>

                PS1: To be more concrete, what I am looking for is not

                some black-box information like "we switched from

                basic-aa to cfl-aa and observed 1% improvement at

                runtime". I believe white-box studies such as "the licm

                pass failed to hoist x instructions because -tbaa is not

                flow sensitive" are much more interesting for

                understanding the problem here.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>White-box studies are very application specific, and

              often very pass specific.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br></div><div bgcolor="#FFFFFF" text="#000000">

    And I understand that. My goal is to look for commonalities among

    passes and applications.</div></blockquote></div><div><br></div><div>This generally just discovers things we already know, which is that certain passes have deficiencies.</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"> However, if the existing studies you

    mentioned above are extensive and conclusive enough to show that the

    aas we have today have fully exploited almost all such

    commonalities, then it's probably a better idea for me to find

    something else to work on. But again, I'd like to see the data

    first.</div></blockquote></div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> <br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> <br>

                PS2: If no such evaluation exists in the past, I'd happy

                to do that myself and report back my findings if anyone

                here is interested.</div>

            </blockquote>

            <div>I don't think any of the world is set up to make that

              valuable.</div>

            <div><br>

            </div>

            <div>Nothing takes advantage of context sensitivity, flow

              sensitivity, etc.</div>

          </div>

        </div>

      </div>

    </blockquote></div><div bgcolor="#FFFFFF" text="#000000">

    I agree that nothing takes advantage of context sensitivity. But I

    would argue against flow sensitivity, field sensitivity, heap model

    and external function models</div></blockquote></div><div><br></div><div>I'm talking about infrastructure wise, nothing in llvm can take advantage because the APIs don't exist.</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">. Flow sensitivity is helpful when the

    optimization pass itself is flow-sensitive (e.g. adce, gvn), </div></blockquote></div><div><br></div><div>No api exists that they could use right now for this, and you'd have to change things to understand answers are not valid over the entire function.</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">and

    field sensitivity is helpful when a struct contains multiple

    pointers. Heap sensitivity is basically what motivates Chris

    Lattner's PLDI'07 paper, and external function models are helpful

    since without them the analysis has to be extremely conservative and

    concludes everything that external functions touch all may-alias

    each other. <br></div></blockquote></div><div><br></div><div>I don't disagree, this is the one to two years of work I said would be needed</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"></div><div bgcolor="#FFFFFF" text="#000000">

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

          </div>

        </div>

      </div>

    </blockquote>

    -- <br>

    Best Regards,<br>

    <br>

    --<br>

    Jia Chen<br>

  </div></blockquote></div>