<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Dear llvm devs,<br>
<br>
tl;dr: What prevents llvm from switching to a fancier pointer
analysis?<br>
<br>
Currently, there exist a variety of general-purpose alias analyses
in the LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and
cfl-aa. However, only the first three are actually turned on when
invoking clang with -O2 or -O3 (please correct me if I'm wrong about
this).<br>
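<br>
For context, here is a tiny made-up C example of the kind of
distinction tbaa adds on top of basic-aa, assuming strict aliasing
is in effect:<br>
<pre>
/* Hypothetical illustration: under strict aliasing, tbaa lets the
 * compiler assume that accesses through incompatible types do not
 * alias; basic-aa alone cannot draw that conclusion.             */
void f(int *i, float *g) {
    *i = 1;
    *g = 2.0f;  /* tbaa: no-alias with *i (int vs. float), so *i
                   need not be reloaded or re-checked after this. */
}
</pre>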
<br>
If one looks at the existing research literature, there are even
more algorithms to consider for pointer analysis. Some are
field-sensitive, some are field-based, some are flow-sensitive, and
some are context-sensitive. Even among the flow-insensitive ones,
there are both inclusion-style (-andersen-aa) and equality-style
(-steens-aa and -ds-aa) analyses. These algorithms are often backed
by rich theoretical frameworks as well as preliminary evaluations
that demonstrate their superior precision and/or performance.<br>
<br>
Given such an abundance of pointer analyses that seem to be much
better in the research land, why do real-world compiler
infrastructures like llvm still rely on those three simple (and
ad-hoc) ones to perform IR optimization? Based on my understanding
(and again, please correct me if I am wrong):<br>
<br>
(1) The minor reason: those "better" algorithms are very hard to
implement in a robust way and nobody seems to be interested in
trying to write and maintain them.<br>
(2) The major reason: it's not clear whether those "better"
algorithms are actually better for llvm. More precise pointer
analyses tend to slow down compile time a lot while contributing too
little to the optimization passes that use them. The benefit one
gets from a more precise analysis may not justify the compile-time
or the maintenance cost.<br>
<br>
So my question here is: what kind(s) of precision really justify the
cost, and what kinds do not? Has anybody done a study in the past to
evaluate which features of pointer analyses benefit which
optimization passes? Could there be more improvement in
pointer-analysis precision without adding too much compile-time or
maintenance cost? Have the precision/performance tradeoffs been
fully explored before? <br>
<br>
Any pointers will be much appreciated. No pun intended :)<br>
<br>
PS1: To be more concrete, what I am looking for is not black-box
information like "we switched from basic-aa to cfl-aa and observed a
1% improvement at runtime". I believe white-box studies such as "the
licm pass failed to hoist x instructions because -tbaa is not
flow-sensitive" are much more interesting for understanding the
problem here.<br>
<br>
PS2: If no such evaluation exists, I'd be happy to do one myself and
report back my findings, if anyone here is interested.<br>
<br>
-- <br>
Best Regards,<br>
<br>
Jia Chen<br>
Department of Computer Science<br>
University of Texas at Austin<br>
</body>
</html>