<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 08/19/2017 02:05 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote
cite="mid:CAF4BwTWS9rxhCAUWadF=-s9o-M1Y9mrN=dZVungnr03zQF-Acw@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr">FWIW: to be completely concrete; from my
perspective, the main thing we'd get out of switching is
something that is more complete already.
<div><br>
</div>
<div>past that</div>
<div>"<span style="font-size:12.8px">I don't want
language-specific kinds of nodes in the grammar, and I
believe that it's not necessary under some reasonable
variant of this scheme. Maybe I'm wrong.</span>"</div>
<div><br>
</div>
<div>If your position is that you'd be fine with a variant of
this scheme, than i don't really think we disagree at all :)</div>
</div>
</blockquote>
<br>
That's exactly my position. Good :-)<br>
<br>
Here's what I find most intriguing about this: In the way we'd
discussed extending the current scheme to handle unions, we extend
the current way that we traverse the graph such that, if there are
multiple fields in one of the types with the same offset (i.e. a
union), when we need to walk up the graph through all fields. While
I still believe this is unlikely to be problematic in practice,
we're now exploring many paths, and the asymptotic complexity
doesn't thrill me.<br>
<br>
If I'm thinking about this correctly, this type of encoding makes
dealing with unions much more efficient. The frontend knows the
access path, and can encode that directly. We don't need to explore
many paths through a graph to find it (i.e. find a potential set of
paths through the graph that indicate potential aliasing). Instead
we can just examine the two paths directly encoded. If they both
have the same union type at the same offset, then they can alias. If
they have the same union type but incompatible offsets, they can't.
No path-finding required. Also, if I'm thinking about this
correctly, doing it this way is also more accurate (because it can
distinguish between structurally-identical union members, and the
proposed extension we'd discussed previously cannot (i.e., it would
give a conservative answer)). <br>
<br>
Thoughts?<br>
<br>
Thanks again,<br>
Hal<br>
<br>
<blockquote
cite="mid:CAF4BwTWS9rxhCAUWadF=-s9o-M1Y9mrN=dZVungnr03zQF-Acw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sat, Aug 19, 2017 at 12:00 PM, Hal
Finkel <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class="">
<p><br>
</p>
<div class="m_-8107259744690938194moz-cite-prefix">On
08/17/2017 11:25 PM, Daniel Berlin wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div><just want to focus on these parts for a
second. *All* of these representations are
really access path representations, just
encoded slightly different ways, and, as a
result, with various parts of the rules in
slightly different places> </div>
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><span
style="font-size:12.8px">Imagine that we
took the enhancement we previously
discussed, but instead of implementing it
directly, we just directly encoded for every
access the path from the access type to the
root. I think it would look very much like
this proposal.</span></blockquote>
<div><br>
Something like it, yes.</div>
<div>Note that this representation also has
special vtable and union groups, for example,
and assumes very c-like rules in several
places around the sequencing of access types.</div>
<div>Also note that in the language of access
paths, C allows overlapping that does not
exist in, for example, Java</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> Is this a statement generally favoring or
disfavoring having the frontend encode the path
explicitly?<span class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>(This is true in both the points-to and
type-based domains).</div>
<div>Ada has discriminated unions (and you could
not use the "union" in this proposal to
represent them)</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> Yes, but discriminated unions also have a dynamic
element to them. As a result, I suspect we'd need a scheme
that captures that (or else we'd need to model them like a
struct with a field and something like a C union).<span
class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>etc.</div>
<div><br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span
class="m_-8107259744690938194gmail-">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> <br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> We
generally explain our current
TBAA rules by saying that
they're generic but motivated
by C/C++ rules.</div>
</blockquote>
<div><br>
</div>
<div>We do say that but that's not
really what our implementation
does in any way. <br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> Really? I thought it was motivated
by C/C++ rules. When you say that it's "not
really what our implementation does", is
this because it drops a lot of potential
information in the translation to our
generic form?</div>
</blockquote>
<div><br>
</div>
<div>What we do is completely and totally
unrelated to types as they exist in the
original language. It is a completely and
totally generic implementation with no rules
related to the original language. The
original language, for example, has dynamic
typing rules, object lifetime rules, etc. We
have none of these.</div>
<div><br>
</div>
<div>The types do not, in fact,even always have
the same relationship we give them.</div>
<div> <br>
</div>
<div>We have a tree and some nodes, and we
happen to name them after types. Like this
proposal, they also represent computed access
paths, in a much simpler language.</div>
<div><br>
</div>
<div>The nodes represent alias set members
(either one or many)</div>
<div>The edges represent either "access at
offset" (where the default is 0 if
unweighted).</div>
<div>We have a simple rule that that processes
the access paths related to ancestry.</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> I certainly understand all of this ;)<span
class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>You can view it as either a grammar parsing
or a graph reachability problem depending on
what works for you (in fact, it's really a
Dyck-CFL reachability problem on bidirected
trees)</div>
<div><br>
</div>
<div>The reachability rule is usually given as:
If node Target is reachable from node Source,
offset another through either it's children or
the upwards edges (or was it downwards, i
always screw up which direction we go in),
they may-alias<br>
</div>
<div><br>
</div>
<div>You could also implement it as a real
grammar parsing problem (IE you quite
literally could generate strings from tag +
offset, and parse them against the grammar,
and see if the terminal nodes contain your
target). They are equivalent.</div>
<div><br>
</div>
<div>This representation would be the same in
that regard.<br>
</div>
<div><br>
</div>
<div>Our current lowering for clang happens to
kind of look like c/c++ structures converted
to a tree. </div>
<div><br>
</div>
<div>However, this is actually just inefficient,
space wise, and done because it's simple to
lower it this way.</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> I agree (although this wasn't happenstance; the
representation was designed with an expected lowering
scheme in mind, at least for C/C++). <br>
<span class=""> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<div>Because the accesses are completely
unrelated to the original types, and require
*no* language rules to interpret, you could
also just partition the things that alias by
the language rules in the frontend, and then
output a tree that represents the possible
unique paths.</div>
<div><br>
</div>
<div>IE figure out all the answers, then
re-encode it as precisely as possible.</div>
<div><br>
</div>
<div>This trades time for space.</div>
<div><br>
</div>
<div>This representation is *super* generic.
That is, the language being used here is super
simple, and the reachability rule is super
simple.</div>
<div><br>
</div>
<div>I could have the frontend generate these
trees based on anything i like. I could, in
fact, encode steensgaard points-to results as
this tree without any trouble.</div>
<div><br>
</div>
<div>The access path language described in this
proposal is more complex and complete, and
directly closer to access paths you find in
C/C++. It has bitfields, vtables, and unions.</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> It is more complex, but I think this is partly in
the description. AFAIKT, only "union" has special
properties under the scheme. Bit fields, vtables, etc. are
all just particular types with no particular special
rules. I don't like special rules at all for any named
entities, but as there's only one such entity right now
("union"), this is an encoding choice we could bikeshed.<span
class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>The reachability rules are more complex.</div>
<div>Is it possible to express other languages
in that set of access path rules?</div>
<div>Sure. For example, you can, as above,
generate the set of answers, and then
re-encode it into access paths.</div>
<div><br>
</div>
<div>Right now, the work it takes *in the front
end* is minimal, and has a fairly efficient
space encoding.</div>
<div>if i want to say two things alias, i just
gotta be able to reach one from the other.</div>
<div>If i want to say two things do not alias, i
just gotta be able to not reach one from the
other only using certain types of edges</div>
<div>In our current language, all that takes in
a frontend is "find longest no-aliasing part
of tree. Go to parent, add new child".</div>
<div><br>
</div>
<div>In the proposed language, the lowering is
more complex. Is it doable?</div>
<div>Sure, of course, not gonna claim
otherwise. But the more "features" of the
access path you add and expect the middle end
to handle, instead of the front-end expanding
them, and the more those feature's
reachability rules are related to a specific
language the more language-specific it gets.</div>
<div><br>
</div>
<div>That's the *only* tradeoff we are making
here, representationally. How much does the
frontend have to understand about how the
middle end walks these things, in order to
generate whatever it wants as precisely as
possible</div>
<div><br>
</div>
<div>We can make whichever we want express
everything we want in N^3 time :)</div>
<div>The real question is "do we try to add on
to what we have in ways that work for multiple
languages, and are expressed neutrally in a
simple reachability language"</div>
<div>or "do we add language-specific kinds of
nodes to the grammar, and have reachability
rules that are fairly language specific".</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> I don't want language-specific kinds of nodes in
the grammar, and I believe that it's not necessary under
some reasonable variant of this scheme. Maybe I'm wrong.<span
class=""><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>IE do you add, say, discriminated_union
nodes to our current representation for ada,
or "vtable_access" nodes to our current
representation for C++ vtable accesses</div>
<div>Or do you instead generate a metadata that
has a unidirectional edge reachability (IE up
only), or whatever it takes to do vtables
generically.</div>
<div><br>
</div>
<div>Both are completely and totally viable
paths, and it's all about which way you want
to go.</div>
<div>But they *definitely* have a difference in
terms of language-specificness.</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> To be clear, if the system is not generic, I'm far
less interested.<span class=""><br>
<br>
Thanks again,<br>
Hal<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre class="m_-8107259744690938194moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</span></div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>