[llvm-dev] [GSoC 2016] Better Alias Analysis By Default - Mid Term Summary
Jia Chen via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 21 15:29:13 PDT 2016
Dear LLVM Community,
This is a brief summary of what I've done so far for CFL-AA, and what I
plan to do next.
tl;dr: CFL-AA is getting saner. Low-hanging fruits on its improvement
have almost been picked up. I can either make CFL-AA more precise (with
certain performance cost), or teach other passes to capitalize on CFL-AA
better as the next step. Comments and suggestions are more than welcomed.
Before my project starts, I asked on the mailing list for testing
CFL-AA. According to Geoff Berry
(http://lists.llvm.org/pipermail/llvm-dev/2016-May/099900.html),
- Enabling CFL-AA in the complilation pipeline did not cause any
correctness issues.
- Enabling CFL-AA in the complilation pipeline did not affect
performance of the compiled program in any significant way.
In other words, CFL-AA was very conservative at that moment, to the
point that it did not offer much beyond what BasicAA can do.
Fortunately, over the last month the situation has been improved. Here
is a list of precision enhancement I made for CFL-AA:
- [r271322] Remove aliasing relations between GEP pointers and GEP
indices. Before this patch, CFL-AA will claim that a and b may-alias
each other in the following codes:
int a[10], b[10];
a[N] = ...;
b[N] = ...;
This seemingly insane behavior was actually there to safeguard certain
cases where people do crazy stuffs with pointer arithmetics. Later we
figured out that those crazy behaviors were in fact UB and therefore we
could drop support for it.
- [r271421] Teach CFL-AA to understand function attributes. CFL-AA used
to treat opaque function calls in a very conservative manner. However,
we know that library calls such as malloc(), free() and strcmp() are
marked with special attributes we can use to help resolve aliasing
queries. With this patch, we can safely rely on CFL-AA to say "noalias"
for a and b in the following code snippet:
char* a = (char*) malloc(len);
char* b = (char*) malloc(len);
... // popluate string a and b here
if (strcmp(a, b) == 0)
...
- [r272040] Improve precision for inttoptr/ptrtoint. If CFL-AA found
this snippet:
int a, b, *p = &a, *q = &b;
int c= (int)p + (int)q;
It used to report may-alias for p and q, just because both of them are
casted to integers. This was later relaxed in a way that pointers are
only treated conservatively when you cast it back from integers.
- [r272335, r272690, r273219, r273229] Improvement on inter-procedural
analysis. This is the one single feature of CFL-AA that enables it to
really offer something that BasicAA doesn't. The work is still ongoing,
but the central idea here is that we can make CFL-AA look past function
calls and yield almost the same result as if the function call was
inlined. So if we have
int *p = ..., *q = /* something other than *p */;
foo(&p, &q);
... // use p and q here
As long as CFL-AA can see the body of foo(), and as long as foo()
doesn't do anything that makes p alias q, CFL-AA won't count p and q as
may-alias pairs when analyzing the codes after the call to foo(). I
don't think this is currently possible with BasicAA.
Looking forward, there are certainly many other ways to further improve
CFL-AA's precision (e.g. adding field sensitivity, and making the
analysis inclusion-based rather than equality-based). However, I feel
like I've almost pushed the analysis to a point where in exchange for
more precision we may inevitably start to pay extra cost and observe
some noticeable performance drops. If the performance problem turns out
to be not that bad, I could just keep going where I'm headed.
But if it becomes a problem, there is also a plan B: Notice that what
I've said is all about alias queries. In LLVM, clients of alias analysis
also care about other things like mod-ref info, in which area CFL-AA
hasn't improved that much. Also, code snippets I listed before to
justify my patches look pathetically simple and very unrealistic. What I
really want to look into is some real-world client passes running on
real-world benchmarks, and see if any precision gets lost during the
interaction between the client pass and the alias analysis pass.
If you have any comments or suggestions, please let me know. Also, if
you have any concrete examples in mind that existing AAs (including
CFL-AA) can't do but you them to be handled properly, please tell me! I
am more than happy to make everyone's code a little bit faster :)
--
Best Regards,
--
Jia Chen
More information about the llvm-dev
mailing list