[llvm-dev] Alias analysis results

Thu May 11 01:50:10 PDT 2017

Hello,

I'm trying to get the whole picture of what we are doing in terms
of alias analysis in LLVM. Being new to this I may be missing
some well known things so my apologies if I seem to ask obvious
things.

Lots of questions arise as I read the related bug reports,
sources and documentation, of which the simplest seems to be:

Why does the current TBAA implementation return NoAlias for pairs
of accesses for which it has no idea whether they really alias?

Let me elaborate on this. My understanding is that the TBAA
implementation is supposed to answer whether a given pair of
accesses is allowed to alias by the rules of the input language.
(Which in turn leads me to another question, that is, whether
we are really trying to perform C/C++ alias analysis in a
[language-neutral] codegen pass, but let's postpone this for
later.)

This view implies that what the TBAA implementation actually
returns is orthogonal to the Must/No/MayAlias set that is used as
the result type of the various AA implementations, including
TBAA. I would expect that the result of the
complete alias analysis includes both the information on whether
given accesses alias and, as a separate element, whether they are
allowed to alias by the rules of the language. Then we may have
combinations like (MustAlias, AllowedAlias) that seem to be the
common case and combinations like (MustAlias, NotAllowedAlias)
that I would expect to a) generate the breaks-alias-rules kind of
warnings and b) proceed further as any other MustAlias case.
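
To make the idea concrete, here is a minimal C++ sketch of such a
two-part result. All names in it (AliasFact, LanguageVerdict,
CombinedAliasResult, breaksAliasRules, forOptimizer) are
hypothetical and do not exist in LLVM. Folding (MayAlias,
NotAllowedAlias) to NoAlias is my assumption about how the
language verdict could still enable optimizations; it is not
something the current sources are claimed to do in this form.

```cpp
// Hypothetical sketch; none of these names are part of LLVM.

// What the accesses are known to do, independent of any language.
enum class AliasFact { NoAlias, MayAlias, MustAlias };

// What the rules of the input language permit.
enum class LanguageVerdict { AllowedAlias, NotAllowedAlias };

struct CombinedAliasResult {
  AliasFact Fact;
  LanguageVerdict Verdict;
};

// (a) Accesses proven to overlap although the language forbids it:
// a candidate for a breaks-alias-rules warning.
constexpr bool breaksAliasRules(CombinedAliasResult R) {
  return R.Fact == AliasFact::MustAlias &&
         R.Verdict == LanguageVerdict::NotAllowedAlias;
}

// (b) The answer handed to optimizations. A proven fact is passed
// through unchanged, so (MustAlias, NotAllowedAlias) proceeds as
// any other MustAlias. Assumption: only an unknown (MayAlias) is
// refined to NoAlias when the language forbids the aliasing.
constexpr AliasFact forOptimizer(CombinedAliasResult R) {
  return (R.Fact == AliasFact::MayAlias &&
          R.Verdict == LanguageVerdict::NotAllowedAlias)
             ? AliasFact::NoAlias
             : R.Fact;
}
```

With this shape a single query can never report a pair as both
aliasing and non-aliasing: the fact and the verdict live in
separate fields.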

The latter combination is what I would expect for illegal type
puns, for whatever definition of "illegal". Note that for such
cases we currently get MustAlias from BasicAA and NoAlias from
TBAA. In some comments this is considered okay whereas I have to
admit that I don't understand how that may possibly be okay,
meaning it's hard to imagine how a pair of accesses can be
aliasing and non-aliasing at the same time. To me it appears
that the root cause of this inconsistency is not that either of
these analyses is incorrect, but that we are trying to represent
their responses in an inadequate form and to use BasicAA as a
faster replacement for TBAA in simple cases.
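
For reference, the kind of code I have in mind looks roughly like
the following C++ sketch. The function names are made up for
illustration, and the comments describe the answers I would
expect from each analysis rather than verified output of any
particular LLVM version.

```cpp
// Hypothetical example; the function names are illustrative only.

// By the C/C++ type-based rules an int lvalue and a float lvalue
// may not refer to the same object, so a type-based analysis
// answers NoAlias for these two stores, and the compiler may
// return 1 without reloading *ip.
int storeBoth(int *ip, float *fp) {
  *ip = 1;
  *fp = 2.0f;
  return *ip;
}

// Well-defined use: two distinct objects, so the result is 1.
int wellDefinedCall() {
  int i = 0;
  float f = 0.0f;
  return storeBoth(&i, &f);
}

// The illegal type pun would be a call such as
//   storeBoth(&i, reinterpret_cast<float *>(&i));
// After inlining, both stores visibly go to the same address, so
// an address-based analysis can prove MustAlias while the
// type-based answer remains NoAlias. The call is left as a
// comment because executing it is undefined behavior.
```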

Any thoughts are highly appreciated.

Thanks,

--