[llvm-dev] RFC: Resolving TBAA issues

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Thu Aug 17 21:25:28 PDT 2017


<just want to focus on these parts for a second. *All* of these
representations are really access path representations, just encoded
slightly different ways, and, as a result, with various parts of the rules
in slightly different places>

Imagine that we took the enhancement we previously discussed, but instead
> of implementing it directly, we just directly encoded for every access the
> path from the access type to the root. I think it would look very much like
> this proposal.


Something like it, yes.
Note that this representation also has special vtable and union groups, for
example, and assumes very c-like rules in several places around the
sequencing of access types.
Also note that  in the language of access paths, C allows overlapping that
does not exist in, for example, Java
(This is true in both the points-to and type-based domains).
Ada has discriminated unions (and you could not use the "union" in this
proposal to represent them)
etc.



>
> We generally explain our current TBAA rules by saying that they're generic
>> but motivated by C/C++ rules.
>>
>
> We do say that but that's not really what our implementation does in any
> way.
>
>
> Really? I thought it was motivated by C/C++ rules. When you say that it's
> "not really what our implementation does", is this because it drops a lot
> of potential information in the translation to our generic form?
>

What we do is completely and totally unrelated to types as they exist in
the original language.  It is a completely and totally generic
implementation with no rules related to the original language.  The
original language, for example, has dynamic typing rules, object lifetime
rules, etc. We have none of these.

The types do not, in fact,even always have the same relationship we give
them.

We have a tree and some nodes, and we happen to name them after types.
Like this proposal, they also represent computed access paths, in a much
simpler language.

The nodes represent alias set members (either one or many)
The edges represent either "access at offset" (where the default is 0 if
unweighted).
We have a simple rule that that processes the access paths related to
ancestry.

You can view it as either a grammar parsing or a graph reachability problem
depending on what works for you (in fact, it's really a Dyck-CFL
reachability problem on bidirected trees)

The reachability rule is  usually given as: If node Target is reachable
from node Source, offset another through either it's children or the
upwards edges (or was it downwards, i always screw up which direction we go
in), they may-alias

You could also implement it as a real grammar parsing problem (IE you quite
literally could generate strings from tag + offset, and parse them against
the grammar, and see if the terminal nodes contain your target).  They are
equivalent.

This representation would be the same in that regard.

Our current lowering for clang happens to kind of look like c/c++
structures converted to a tree.

However, this is actually just inefficient, space wise, and done because
it's simple to lower it this way.
Because the accesses are completely unrelated to the original types, and
require *no* language rules to interpret, you could also just partition the
things that alias by the language rules in the frontend, and then output a
tree that represents the possible unique paths.

IE figure out all the answers, then re-encode it as precisely as possible.

This trades time for space.

This representation is *super* generic.  That is, the language being used
here is super simple, and the reachability rule is super simple.

I could have the frontend generate these trees based on anything i like.  I
could, in fact, encode steensgaard points-to results as this tree without
any trouble.

The access path language described in this proposal is more complex and
complete, and directly closer to access paths you find in C/C++.  It has
bitfields, vtables, and unions.
The reachability rules are more complex.
Is it possible to express other languages in that set of access path rules?
Sure.  For example, you can, as above, generate the set of answers, and
then re-encode it into access paths.

Right now, the work it takes *in the front end* is minimal, and has a
fairly efficient space encoding.
if i want to say two things alias, i just gotta be able to reach one from
the other.
If i want to say two things do not alias, i just gotta be able to not reach
one from the other only using certain types of edges
In our current language, all that takes in a frontend is "find longest
no-aliasing part of tree. Go to parent, add new child".

In the proposed language, the lowering is more complex.  Is it doable?
Sure, of course, not gonna claim otherwise.  But the more "features" of the
access path you add and expect the middle end to handle, instead of the
front-end expanding them, and the more those feature's reachability rules
are related to a specific language the more language-specific it gets.

That's the *only* tradeoff we are making here, representationally.   How
much does the frontend have to understand about how the middle end walks
these things, in order to generate whatever it wants as precisely as
possible

We can make whichever we want express everything we want in N^3 time :)
The real question is "do we try to add on to what we have in ways that work
for multiple languages, and are expressed neutrally in a simple
reachability language"
or "do we add language-specific kinds of nodes to the grammar, and have
reachability rules that are fairly language specific".

IE do you add, say, discriminated_union nodes to our current representation
for ada, or "vtable_access" nodes to our current representation for C++
vtable accesses
Or do you instead generate a metadata that has a unidirectional edge
reachability (IE up only), or whatever it takes to do vtables generically.

Both are completely and totally viable paths, and it's all about which way
you want to go.
But they *definitely* have a difference in terms of language-specificness.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/323dd762/attachment.html>


More information about the llvm-dev mailing list