<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 08/17/2017 07:20 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Aug 17, 2017 at 4:49 PM, Hal
Finkel <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span
class="gmail-m_8289637616720673898gmail-"> <br>
<div
class="gmail-m_8289637616720673898gmail-m_5209512876004350455moz-cite-prefix">On
08/17/2017 04:49 PM, Daniel Berlin via llvm-dev
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><br>
<br>
For two given access sequences we can
determine if the accessed<br>
objects are allowed to overlap by the rules
of the input<br>
language.</blockquote>
<div><br>
</div>
<div>Sadly, this is where this becomes
"unlikely to want to use to replace TBAA",
at least for me. It may still be a thing we
want anyway.</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> The rules proposed by Ivan for handling C/C++
seem pretty generic.
</div>
</blockquote>
<div><br>
</div>
<div>Uh?</div>
<div><br>
</div>
<div>"</div>
<br style="font-size:12.8px">
<span style="font-size:12.8px"> [X...] is allowed to
overlap with [S1..., X..., S2...]</span><br
style="font-size:12.8px">
<span style="font-size:12.8px"> and the most generic access
sequence is [X...].</span><br style="font-size:12.8px">
<br style="font-size:12.8px">
<span style="font-size:12.8px"> [X1..., X2...] is allowed
to overlap with [S1..., X1...]</span><br
style="font-size:12.8px">
<span style="font-size:12.8px"> with the most generic
access sequence to be [X1...].</span><br
style="font-size:12.8px">
<br style="font-size:12.8px">
<span style="font-size:12.8px"> [X1..., U, U::m1, X2...] is
allowed to overlap with</span><br style="font-size:12.8px">
<span style="font-size:12.8px"> [S1..., X1..., U, U::m2,
S2...]</span><br style="font-size:12.8px">
<span style="font-size:12.8px"> for a union U of an unknown
effective type, provided m1 != m2</span><br
style="font-size:12.8px">
<span style="font-size:12.8px"> and the most generic access
sequence is [X1..., U].</span><br style="font-size:12.8px">
<br style="font-size:12.8px">
<span style="font-size:12.8px">If neither of the given
sequences contains the leading access</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">type of the other, then they
are allowed to overlap if the</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">leading access type of one
sequence is a direct or indirect field</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">of the final access type of
the other sequence and then the most</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">generic access sequence
consists of a single element, which is</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">that final access type.</span><br
style="font-size:12.8px">
<br style="font-size:12.8px">
<span style="font-size:12.8px">For the purpose of
determining whether one type is a direct or</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">indirect member of another
type unions are considered to have no</span><br
style="font-size:12.8px">
<span style="font-size:12.8px">members as accesses to
members of unions are only allowed to</span><br
style="font-size:12.8px">
<div><span style="font-size:12.8px">overlap if they have the
base union object explicitly specified."</span></div>
<div><br>
</div>
<div><br>
</div>
<div><span style="font-size:12.8px"></span>This sounds super
specific to me. It talks about unions, and how they
behave in C/C++, etc.</div>
</div>
</div>
</div>
</blockquote>
<br>
I realize that it sounds super specific, but I'm not sure that it
is. I was thinking about it in the following way: Imagine that we
took the enhancement we previously discussed, but instead of
implementing it directly, we just directly encoded for every access
the path from the access type to the root. I think it would look
very much like this proposal. You'd need some way of designating
when you passed through a disjunctive node (where here we're calling
"union", but the concept seems generic).<br>
<br>
I need to think more about the direct/indirect access rules and
what's being represented. I believe we end up recovering some of
this information now because of the way that we track offsets. This
might be better.<br>
<br>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>It has initial sequence rules, etc.</div>
</div>
</div>
</div>
</blockquote>
<br>
How so?<br>
<br>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>These would make no sense for "Rust", "Go", or
"Haskell" at all.</div>
</div>
</div>
</div>
</blockquote>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> <br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> We generally explain our current
TBAA rules by saying that they're generic but motivated
by C/C++ rules.</div>
</blockquote>
<div><br>
</div>
<div>We do say that but that's not really what our
implementation does in any way. <br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Really? I thought it was motivated by C/C++ rules. When you say that
it's "not really what our implementation does", is this because it
drops a lot of potential information in the translation to our
generic form?<br>
<br>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> I could say the same thing about
this proposed system with its proposed rules.</div>
</blockquote>
<div>It is, IMHO, nowhere near as generic.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span
class="gmail-m_8289637616720673898gmail-"> <br>
</span></div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span
class="gmail-m_8289637616720673898gmail-">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>This scheme is really an encoding of
C/C++ TBAA info so it can be read by LLVM
and requires that *LLVM* have some set of
rules that it enforces about that scheme.</div>
<div>But that scheme is still very language
specific in how it is used.</div>
<div>GCC still has something in between this
and LLVM, where the language rules are a bit
encoded (but not as much as you have).<br>
</div>
<div><br>
</div>
<div>We (and gcc) have deliberately avoided
such schemes, in favor of transforming the
info into abstract set trees that then tag
loads and stores.</div>
<div><br>
</div>
<div>The encoding of "struct path" tbaa, is
just a way of trading space vs time in that
encoding. We trade walking time for the
space of transitive closure, etc.</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> If the provided statistic of 15% holds up, maybe
which way we go in this trade-off space doesn't matter
much?</div>
</blockquote>
<div><br>
</div>
<div>Sure, that part i'm indifferent about. </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span
class="gmail-m_8289637616720673898gmail-"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>None of the TBAA in LLVM really has any
*real* relation to the original type system
rules, and that is deliberate. I've argued
for years that calling it "TBAA" just badly
confuses people, and i believe here is a
good example :)</div>
<div><br>
</div>
<div>So i don't think what you've written can
be used to replace TBAA.</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> So *if* we just take the proposed rules for
C/C++ as the rules for the scheme in general, I'm not
sure this is true. </div>
</blockquote>
<div><br>
</div>
<div>Okay, let me try to rephrase a little more precisely:<br>
<div>It is almost certainly true that we *could* make
*any* particular lowering work, at different costs to
the front ends and middle ends. IE i totally agree that
the power of all of these systems (with small
extensions) are equal. and also, for the record, i
certainly think the rules/scheme described here would
almost certainly produce better optimization results
than what is currently implemented.</div>
<div><br>
</div>
*But*</div>
<div>I don't see how you use this as a general scheme
without ending up forcing the other frontends to lower
things to look as if it was a C/C++ based type system.</div>
<div>This is definitely *not* true today.</div>
<div><br>
</div>
<div>They only have to lower things to look like a tree of
sets. Having been down this path before, it's definitely
easier to lower things to look like trees of sets than
like groups of C/C++ structs/bitfields/etc that follow
certain rules.</div>
</div>
</div>
</div>
</blockquote>
<br>
Our current TBAA has trees of structure types, essentially (lists of
fields and offsets in each node). I don't understand how the mapping
is easier or harder depending on whether we encode tree-node
references vs. the path from the leaf to the root.<br>
<br>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>This is one of the reasons I'm honestly having a lot of
trouble seeing how this is is definitively better than
what IBM's proposal was, which seemed both more generic
and incremental, and handled most concerns.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
The claim, as I understand it, is that this is more expressive. I'm
trying to understand that.<br>
<br>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>The only practical difference i see is that this just
changes the scheme from one where the language rules are
mostly in the front ends to one where the language rules
are also partially in llvm. To me that isn't necessarily
better, just different (heck, i'm even more used to that
system because it's a push closer to gcc :P)</div>
</div>
</div>
</div>
</blockquote>
<br>
Philosophically, this doesn't bother me too much provided that we
can define the required semantics in terms of IR-level constructs
(without direct appeal to some outside language standard). This
seems to be true in this case. We have a lot of IR-level features
that come about this way (where they have well defined, but
seemingly arbitrary, semantics because of the source-level language
semantics that motivated their creation).<br>
<br>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>So overall I don't see one as particularly better than
the other.</div>
<div>You seem to.</div>
<div>So i'd really like to understand your perspective on
this in terms of the pro/con that you are seeing.</div>
</div>
</div>
</div>
</blockquote>
<br>
I'm not sold yet (and may never be). I'm interested in exploring the
alternative. Maybe we'll come up with some kind of hybrid design in
the end, or maybe we'll go back to the previous proposal. When you
say, "i certainly think the rules/scheme described here would almost
certainly produce better optimization results than what is currently
implemented." is this also true relative to our previous proposed
enhancements?<br>
<br>
In particular, I'd like to understand where this scheme is more
expressive than our current one (and the current one plus proposed
enhancement for unions/arrays).<br>
<br>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>I mean, i'm really not even opposed (though others may
be) in principle to saying "hey, let's just have a set of
language specific tbaa passes with semi-common metadata".</div>
</div>
</div>
</div>
</blockquote>
<br>
I'd rather not go there ;)<br>
<br>
<blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>I'd just like to understand why we would change
tradeoffs, etc.</div>
</div>
</div>
</div>
</blockquote>
<br>
Me too.<br>
<br>
Thanks again,<br>
Hal<br>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>