[llvm-dev] RFC: Resolving TBAA issues

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Sat Aug 19 12:05:27 PDT 2017


FWIW: to be completely concrete; from my perspective, the main thing we'd
get out of switching is something that is more complete already.

past that
"I don't want language-specific kinds of nodes in the grammar, and I
believe that it's not necessary under some reasonable variant of this
scheme. Maybe I'm wrong."

If your position is that you'd be fine with a variant of this scheme, than
i don't really think we disagree at all :)


On Sat, Aug 19, 2017 at 12:00 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> On 08/17/2017 11:25 PM, Daniel Berlin wrote:
>
>
> <just want to focus on these parts for a second. *All* of these
> representations are really access path representations, just encoded
> slightly different ways, and, as a result, with various parts of the rules
> in slightly different places>
>
> Imagine that we took the enhancement we previously discussed, but instead
>> of implementing it directly, we just directly encoded for every access the
>> path from the access type to the root. I think it would look very much like
>> this proposal.
>
>
> Something like it, yes.
> Note that this representation also has special vtable and union groups,
> for example, and assumes very c-like rules in several places around the
> sequencing of access types.
> Also note that  in the language of access paths, C allows overlapping that
> does not exist in, for example, Java
>
>
> Is this a statement generally favoring or disfavoring having the frontend
> encode the path explicitly?
>
> (This is true in both the points-to and type-based domains).
> Ada has discriminated unions (and you could not use the "union" in this
> proposal to represent them)
>
>
> Yes, but discriminated unions also have a dynamic element to them. As a
> result, I suspect we'd need a scheme that captures that (or else we'd need
> to model them like a struct with a field and something like a C union).
>
> etc.
>
>
>
>>
>> We generally explain our current TBAA rules by saying that they're
>>> generic but motivated by C/C++ rules.
>>>
>>
>> We do say that but that's not really what our implementation does in any
>> way.
>>
>>
>> Really? I thought it was motivated by C/C++ rules. When you say that it's
>> "not really what our implementation does", is this because it drops a lot
>> of potential information in the translation to our generic form?
>>
>
> What we do is completely and totally unrelated to types as they exist in
> the original language.  It is a completely and totally generic
> implementation with no rules related to the original language.  The
> original language, for example, has dynamic typing rules, object lifetime
> rules, etc. We have none of these.
>
> The types do not, in fact,even always have the same relationship we give
> them.
>
> We have a tree and some nodes, and we happen to name them after types.
> Like this proposal, they also represent computed access paths, in a much
> simpler language.
>
> The nodes represent alias set members (either one or many)
> The edges represent either "access at offset" (where the default is 0 if
> unweighted).
> We have a simple rule that that processes the access paths related to
> ancestry.
>
>
> I certainly understand all of this ;)
>
>
> You can view it as either a grammar parsing or a graph reachability
> problem depending on what works for you (in fact, it's really a Dyck-CFL
> reachability problem on bidirected trees)
>
> The reachability rule is  usually given as: If node Target is reachable
> from node Source, offset another through either it's children or the
> upwards edges (or was it downwards, i always screw up which direction we go
> in), they may-alias
>
> You could also implement it as a real grammar parsing problem (IE you
> quite literally could generate strings from tag + offset, and parse them
> against the grammar, and see if the terminal nodes contain your target).
> They are equivalent.
>
> This representation would be the same in that regard.
>
> Our current lowering for clang happens to kind of look like c/c++
> structures converted to a tree.
>
> However, this is actually just inefficient, space wise, and done because
> it's simple to lower it this way.
>
>
> I agree (although this wasn't happenstance; the representation was
> designed with an expected lowering scheme in mind, at least for C/C++).
>
>
> Because the accesses are completely unrelated to the original types, and
> require *no* language rules to interpret, you could also just partition the
> things that alias by the language rules in the frontend, and then output a
> tree that represents the possible unique paths.
>
> IE figure out all the answers, then re-encode it as precisely as possible.
>
> This trades time for space.
>
> This representation is *super* generic.  That is, the language being used
> here is super simple, and the reachability rule is super simple.
>
> I could have the frontend generate these trees based on anything i like.
> I could, in fact, encode steensgaard points-to results as this tree without
> any trouble.
>
> The access path language described in this proposal is more complex and
> complete, and directly closer to access paths you find in C/C++.  It has
> bitfields, vtables, and unions.
>
>
> It is more complex, but I think this is partly in the description. AFAIKT,
> only "union" has special properties under the scheme. Bit fields, vtables,
> etc. are all just particular types with no particular special rules. I
> don't like special rules at all for any named entities, but as there's only
> one such entity right now ("union"), this is an encoding choice we could
> bikeshed.
>
> The reachability rules are more complex.
> Is it possible to express other languages in that set of access path rules?
> Sure.  For example, you can, as above, generate the set of answers, and
> then re-encode it into access paths.
>
> Right now, the work it takes *in the front end* is minimal, and has a
> fairly efficient space encoding.
> if i want to say two things alias, i just gotta be able to reach one from
> the other.
> If i want to say two things do not alias, i just gotta be able to not
> reach one from the other only using certain types of edges
> In our current language, all that takes in a frontend is "find longest
> no-aliasing part of tree. Go to parent, add new child".
>
> In the proposed language, the lowering is more complex.  Is it doable?
> Sure, of course, not gonna claim otherwise.  But the more "features" of
> the access path you add and expect the middle end to handle, instead of the
> front-end expanding them, and the more those feature's reachability rules
> are related to a specific language the more language-specific it gets.
>
> That's the *only* tradeoff we are making here, representationally.   How
> much does the frontend have to understand about how the middle end walks
> these things, in order to generate whatever it wants as precisely as
> possible
>
> We can make whichever we want express everything we want in N^3 time :)
> The real question is "do we try to add on to what we have in ways that
> work for multiple languages, and are expressed neutrally in a simple
> reachability language"
> or "do we add language-specific kinds of nodes to the grammar, and have
> reachability rules that are fairly language specific".
>
>
> I don't want language-specific kinds of nodes in the grammar, and I
> believe that it's not necessary under some reasonable variant of this
> scheme. Maybe I'm wrong.
>
>
> IE do you add, say, discriminated_union nodes to our current
> representation for ada, or "vtable_access" nodes to our current
> representation for C++ vtable accesses
> Or do you instead generate a metadata that has a unidirectional edge
> reachability (IE up only), or whatever it takes to do vtables generically.
>
> Both are completely and totally viable paths, and it's all about which way
> you want to go.
> But they *definitely* have a difference in terms of language-specificness.
>
>
> To be clear, if the system is not generic, I'm far less interested.
>
> Thanks again,
> Hal
>
>
>
>
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170819/c58896e8/attachment.html>


More information about the llvm-dev mailing list