[llvm-dev] Existing studies on the benefits of pointer analysis

Jia Chen via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 15 13:37:20 PDT 2016

Dear llvm devs,

tl;dr: What prevents llvm from switching to a fancier pointer analysis?

Currently, a variety of general-purpose alias analyses exist in the 
LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and 
cfl-aa. However, only the first three are actually turned on when 
invoking clang with -O2 or -O3 (please correct me if I'm wrong about this).

If one looks at the existing research literature, there are even more 
algorithms to consider for pointer analysis. Some are 
field-sensitive, some are field-based, some are flow-sensitive, and some 
are context-sensitive. Even among the flow-insensitive ones, there are 
both inclusion-based (-andersen-aa) and equality-based (-steens-aa and 
-ds-aa) algorithms. These algorithms are often backed by rich 
theoretical frameworks as well as preliminary evaluations that 
demonstrate their superior precision and/or performance.

Given such an abundance of pointer analyses that seem to be much 
better in the research land, why do real-world compiler 
infrastructures like LLVM still rely on those three simple (and ad-hoc) 
ones to perform IR optimization? Based on my understanding (and again, 
please correct me if I am wrong):

(1) The minor reason: those "better" algorithms are very hard to 
implement in a robust way and nobody seems to be interested in trying to 
write and maintain them.
(2) The major reason: it's not clear whether those "better" algorithms 
are actually better for llvm. More precise pointer analyses tend to slow 
down compile time a lot while contributing too little to the 
optimization passes that use them. The benefit one gets from a more 
precise analysis may not justify the compile-time or the maintenance cost.

So my question here is: what kind(s) of precision really justify the 
cost, and what kinds do not? Has anybody done a study in the past to 
evaluate which features of pointer analyses benefit which 
optimization passes? Could pointer analysis precision be improved 
further without adding too much compile-time/maintenance cost? Have 
the precision/performance tradeoffs been fully explored before?

Any pointers will be much appreciated. No pun intended :)

PS1: To be more concrete, what I am looking for is not some black-box 
information like "we switched from basic-aa to cfl-aa and observed 1% 
improvement at runtime". I believe white-box studies such as "the licm 
pass failed to hoist x instructions because -tbaa is not flow sensitive" 
are much more interesting for understanding the problem here.

PS2: If no such evaluation exists, I'd be happy to do one myself and 
report back my findings if anyone here is interested.

Best Regards,

Jia Chen
Department of Computer Science
University of Texas at Austin