[llvm-dev] RFC: Resolving TBAA issues
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Sat Aug 19 12:00:45 PDT 2017
On 08/17/2017 11:25 PM, Daniel Berlin wrote:
>
> <just want to focus on these parts for a second. *All* of these
> representations are really access path representations, just encoded
> slightly different ways, and, as a result, with various parts of the
> rules in slightly different places>
>
> Imagine that we took the enhancement we previously discussed, but
> instead of implementing it directly, we just directly encoded for
> every access the path from the access type to the root. I think it
> would look very much like this proposal.
>
>
> Something like it, yes.
> Note that this representation also has special vtable and union
> groups, for example, and assumes very c-like rules in several places
> around the sequencing of access types.
> Also note that in the language of access paths, C allows overlapping
> that does not exist in, for example, Java
Is this a statement generally favoring or disfavoring having the
frontend encode the path explicitly?
> (This is true in both the points-to and type-based domains).
> Ada has discriminated unions (and you could not use the "union" in
> this proposal to represent them)
Yes, but discriminated unions also have a dynamic element to them. As a
result, I suspect we'd need a scheme that captures that (or else we'd
need to model them like a struct with a field and something like a C union).
> etc.
>
>
>>
>>
>> We generally explain our current TBAA rules by saying that
>> they're generic but motivated by C/C++ rules.
>>
>>
>> We do say that but that's not really what our implementation does
>> in any way.
>
> Really? I thought it was motivated by C/C++ rules. When you say
> that it's "not really what our implementation does", is this
> because it drops a lot of potential information in the translation
> to our generic form?
>
>
> What we do is completely and totally unrelated to types as they exist
> in the original language. It is a completely and totally generic
> implementation with no rules related to the original language. The
> original language, for example, has dynamic typing rules, object
> lifetime rules, etc. We have none of these.
>
> The types do not, in fact,even always have the same relationship we
> give them.
>
> We have a tree and some nodes, and we happen to name them after
> types. Like this proposal, they also represent computed access paths,
> in a much simpler language.
>
> The nodes represent alias set members (either one or many)
> The edges represent either "access at offset" (where the default is 0
> if unweighted).
> We have a simple rule that that processes the access paths related to
> ancestry.
I certainly understand all of this ;)
>
> You can view it as either a grammar parsing or a graph reachability
> problem depending on what works for you (in fact, it's really a
> Dyck-CFL reachability problem on bidirected trees)
>
> The reachability rule is usually given as: If node Target is
> reachable from node Source, offset another through either it's
> children or the upwards edges (or was it downwards, i always screw up
> which direction we go in), they may-alias
>
> You could also implement it as a real grammar parsing problem (IE you
> quite literally could generate strings from tag + offset, and parse
> them against the grammar, and see if the terminal nodes contain your
> target). They are equivalent.
>
> This representation would be the same in that regard.
>
> Our current lowering for clang happens to kind of look like c/c++
> structures converted to a tree.
>
> However, this is actually just inefficient, space wise, and done
> because it's simple to lower it this way.
I agree (although this wasn't happenstance; the representation was
designed with an expected lowering scheme in mind, at least for C/C++).
> Because the accesses are completely unrelated to the original types,
> and require *no* language rules to interpret, you could also just
> partition the things that alias by the language rules in the frontend,
> and then output a tree that represents the possible unique paths.
>
> IE figure out all the answers, then re-encode it as precisely as possible.
>
> This trades time for space.
>
> This representation is *super* generic. That is, the language being
> used here is super simple, and the reachability rule is super simple.
>
> I could have the frontend generate these trees based on anything i
> like. I could, in fact, encode steensgaard points-to results as this
> tree without any trouble.
>
> The access path language described in this proposal is more complex
> and complete, and directly closer to access paths you find in C/C++.
> It has bitfields, vtables, and unions.
It is more complex, but I think this is partly in the description.
AFAIKT, only "union" has special properties under the scheme. Bit
fields, vtables, etc. are all just particular types with no particular
special rules. I don't like special rules at all for any named entities,
but as there's only one such entity right now ("union"), this is an
encoding choice we could bikeshed.
> The reachability rules are more complex.
> Is it possible to express other languages in that set of access path
> rules?
> Sure. For example, you can, as above, generate the set of answers,
> and then re-encode it into access paths.
>
> Right now, the work it takes *in the front end* is minimal, and has a
> fairly efficient space encoding.
> if i want to say two things alias, i just gotta be able to reach one
> from the other.
> If i want to say two things do not alias, i just gotta be able to not
> reach one from the other only using certain types of edges
> In our current language, all that takes in a frontend is "find longest
> no-aliasing part of tree. Go to parent, add new child".
>
> In the proposed language, the lowering is more complex. Is it doable?
> Sure, of course, not gonna claim otherwise. But the more "features"
> of the access path you add and expect the middle end to handle,
> instead of the front-end expanding them, and the more those feature's
> reachability rules are related to a specific language the more
> language-specific it gets.
>
> That's the *only* tradeoff we are making here, representationally.
> How much does the frontend have to understand about how the middle end
> walks these things, in order to generate whatever it wants as
> precisely as possible
>
> We can make whichever we want express everything we want in N^3 time :)
> The real question is "do we try to add on to what we have in ways that
> work for multiple languages, and are expressed neutrally in a simple
> reachability language"
> or "do we add language-specific kinds of nodes to the grammar, and
> have reachability rules that are fairly language specific".
I don't want language-specific kinds of nodes in the grammar, and I
believe that it's not necessary under some reasonable variant of this
scheme. Maybe I'm wrong.
>
> IE do you add, say, discriminated_union nodes to our current
> representation for ada, or "vtable_access" nodes to our current
> representation for C++ vtable accesses
> Or do you instead generate a metadata that has a unidirectional edge
> reachability (IE up only), or whatever it takes to do vtables generically.
>
> Both are completely and totally viable paths, and it's all about which
> way you want to go.
> But they *definitely* have a difference in terms of language-specificness.
>
To be clear, if the system is not generic, I'm far less interested.
Thanks again,
Hal
>
>
>
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170819/88b91f82/attachment-0001.html>
More information about the llvm-dev
mailing list