<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Dear llvm devs,<br>
<br>
tl;dr: What prevents llvm from switching to a fancier pointer
analysis?<br>
<br>
Currently, there exist a variety of general-purpose alias analyses
in the LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and
cfl-aa. However, only the first three are actually turned on when
invoking clang with -O2 or -O3 (please correct me if I'm wrong about
this).<br>
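<br>
For context, here is a tiny made-up C example of the kind of
distinction tbaa adds on top of basic-aa, assuming strict aliasing
is in effect:<br>
<pre>
/* Hypothetical illustration: under strict aliasing, tbaa lets the
 * compiler assume that accesses through incompatible types do not
 * alias; basic-aa alone cannot draw that conclusion.             */
void f(int *i, float *g) {
    *i = 1;
    *g = 2.0f;  /* tbaa: no-alias with *i (int vs. float), so *i
                   need not be reloaded or re-checked after this. */
}
</pre>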
<br>
If one looks at the existing research literature, there are even
more algorithms to consider for pointer analysis. Some are
field-sensitive, some are field-based, some are flow-sensitive, and
some are context-sensitive. Even among the flow-insensitive ones,
there are both inclusion-style (-andersen-aa) and equality-style
(-steens-aa and -ds-aa) analyses. These algorithms are often backed
by rich theoretical frameworks as well as preliminary evaluations
that demonstrate their superior precision and/or performance.<br>
<br>
Given such an abundance of pointer analyses that seem to be much
better in the research land, why do real-world compiler
infrastructures like llvm still rely on those three simple (and
ad-hoc) ones to perform IR optimization? Based on my understanding
(and again, please correct me if I am wrong):<br>
<br>
(1) The minor reason: those "better" algorithms are very hard to
implement in a robust way and nobody seems to be interested in
trying to write and maintain them.<br>
(2) The major reason: it's not clear whether those "better"
algorithms are actually better for llvm. More precise pointer
analyses tend to slow down compile time a lot while contributing too
little to the optimization passes that use them. The benefit one
gets from a more precise analysis may not justify the compile-time
or the maintenance cost.<br>
<br>
So my question here is: what kind(s) of precision really justify the
cost, and what kinds do not? Has anybody done a study in the past to
evaluate which features of pointer analyses benefit which
optimization passes? Could there be more improvement in
pointer-analysis precision without adding too much compile-time or
maintenance cost? Have the precision/performance tradeoffs been
fully explored before? <br>
<br>
Any pointers will be much appreciated. No pun intended :)<br>
<br>
PS1: To be more concrete, what I am looking for is not black-box
information like "we switched from basic-aa to cfl-aa and observed a
1% improvement at runtime". I believe white-box studies such as "the
licm pass failed to hoist x instructions because -tbaa is not
flow-sensitive" are much more interesting for understanding the
problem here.<br>
<br>
PS2: If no such evaluation exists, I'd be happy to do one myself and
report back my findings, if anyone here is interested.<br>
<br>
-- <br>
Best Regards,<br>
<br>
Jia Chen<br>
Department of Computer Science<br>
University of Texas at Austin<br>
</body>
</html>