<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 03/09/2017 01:15 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote
cite="mid:CAF4BwTWgK+fNoKXG9FAMaL0V0JRtM6VgkL+o5p8guDeTKc3C=g@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Mar 9, 2017 at 9:57 AM, Hal
Finkel <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>
<div class="h5"> <br>
<div class="m_-6586870021850522980moz-cite-prefix">On
03/01/2017 05:30 PM, Daniel Berlin via llvm-dev
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">So, <a moz-do-not-send="true"
href="https://bugs.llvm.org/show_bug.cgi?id=32056"
target="_blank">https://bugs.llvm.org/show_<wbr>bug.cgi?id=32056</a>
is an example showing our current TBAA tree for
union generation is definitely irretrievably
broken.
<div>I'll be honest here. I'm pretty sure your
proposal doesn't go far enough.</div>
<div>But truthfully, I would rather see us come
closer to a representation we know works,
which is GCC's.</div>
<div>Let me try to simplify what you are
suggesting, and what we have.</div>
<div>Our current representation is basically
inverted from GCC, but we don't allow things
that would enable it to work.<br>
</div>
<div><br>
</div>
<div>Given</div>
<div>union {int a, short b};</div>
<div><br>
</div>
<div>GCC's will be:</div>
<div><br>
</div>
<div> union</div>
<div> / \</div>
<div>short int</div>
<div><br>
</div>
<div><br>
</div>
<div>Everything is implicitly a subset of alias
set 0 (which for C/C++ is char) just to avoid
representing it.</div>
<div><br>
</div>
<div>Our metadata has no child links, and only a
single parent link.</div>
<div><br>
</div>
<div>You can't represent the above form because
you can't make a single short node a child of
every union/struct it needs to be (lack of
multiple parents means you can't just frob
them all to offset zero).</div>
<div><br>
</div>
<div>Your suggestion is to invert this as a
struct</div>
<div><br>
</div>
<div>short int</div>
<div> | / </div>
<div>union</div>
<div><br>
</div>
<div>We don't allow multiple parents, however.</div>
<div>Because of that, you suggest we follow all
nodes for unions, special casing union-type
nodes somehow</div>
<div><br>
</div>
<div>Let me suggest something different:</div>
<div><br>
</div>
<div>We know the current structure fails us in a
number of ways.</div>
<div>Instead of trying to shoehorn this into our
current structure, I suggest: we stop messing
around and just have a GCC style tree, and
represent the children instead of the parents.</div>
<div>We make contained types descendants instead
of ancestors.</div>
<div>We allow multiple children at each offset
and for scalar type nodes.x`<br>
</div>
<div><br>
</div>
<div>We could do that without changing to
children, but honestly, i strongly believe
nobody who ever looks at this really
understands it right now.</div>
<div>(If someone really does like the ancestor
form, i'd love to understand why :P)</div>
<div><br>
So if we are going to change it, we should
change it to something that is easier to
understand.</div>
<div><br>
</div>
<div>something like:<br>
<br>
</div>
<div>scalar type node -> {name, children
nodes} </div>
<div>struct node -> {name, array of {offset,
child node} }</div>
<div><br>
</div>
<div>Paths are separate from the tbaa tree
itself, and are just:</div>
<div>path node -> {struct node, offset} or
whatever.</div>
<div><br>
</div>
<div>unions are just scalar type nodes with
multiple children, not struct nodes with
special-cased offset zero.</div>
<div><br>
</div>
<div>This also has a well-defined upgrade path.</div>
<div><br>
</div>
<div>For "old-style" DAGs, like exist now, we
know we have to regen the metadata, and that
it is wrong (So we could, for example, simply
disable it for correctness reasons)</div>
<div>.</div>
<div>Anything with a "new-style" DAG, we know is
usable.</div>
<div><br>
</div>
<div>In this representation, the most generic
tbaa node is just the one that contains the
other.</div>
<div>If neither contains the other, you can just
make one that has both as children and use
that.</div>
<div>(instead of now, where you'd have to have
multiple parents again).</div>
</div>
</blockquote>
<br>
</div>
</div>
You mean making a 'scalar type node', because those
represent 'or'?<br>
</div>
</blockquote>
<div><br>
</div>
<div>Yes.</div>
<div>I would probably stop calling them scalar type nodes
and struct type nodes too, while we are bike-shedding a
bit :)</div>
<div><br>
</div>
<div>The terminology seems to be based on what we think a
type system that generates those kinds of nodes look like.</div>
<div>But uh</div>
<div>1. That's wrong a lot of the time, because each
frontend may want to use different types of nodes to be
conservatively correct, and because different things can
be represented, correctly, in multiple ways (IE, as a
stupid example, you could represent any type as a struct
node where all pieces go to the same place).</div>
<div><br>
</div>
<div>In fact, it's not even true for the few things that do
generate TBAA trees right now.</div>
<div><br>
</div>
<div>2. It gives you no idea what they do in LLVM, which is,
IMHO, what anyone reading those docs cares about.</div>
<div><br>
</div>
<div>I'd probably call them based on something related to
their actual semantic in llvm's tbaa tree is.</div>
<div><br>
</div>
<div>IE i'd call scalar type nodes "or-type nodes" or "any
of" nodes or something that gives people an idea that it
means the aliasing result is may-alias if any path from
that node is may-alias.</div>
<div>I'd call struct type nodes "offset-based nodes",
becuase that's really what they are.</div>
<div><br>
</div>
<div>(access-path nodes are actually access paths, so that
seems find)<br>
</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
I certainly agree. We should pick terminology that makes sense
independent of C/C++.<br>
<br>
-Hal<br>
<br>
<blockquote
cite="mid:CAF4BwTWgK+fNoKXG9FAMaL0V0JRtM6VgkL+o5p8guDeTKc3C=g@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> <br>
-Hal<br>
<br>
<blockquote type="cite"><span class="">
<div dir="ltr">
<div><br>
</div>
<div>(The above also may be better for other
languages)</div>
<div><br>
</div>
<div>--Dan</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Feb 28, 2017 at
4:44 PM, Steven Perron <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:perrons@ca.ibm.com"
target="_blank">perrons@ca.ibm.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex"><font size="2"
face="sans-serif">Seems like the comments
have stopped. I'll try to get a patch
together. Then we can continue the
discussion from there. </font><br>
<br>
<font size="2" face="sans-serif">Hubert, as
for the issue with the llvm optimizations
losing the TBAA information, it should be
the responsibility to make sure the aliasing
is changed in the correct way. One function
related to this has already been mentioned:
getMostGenericTBAA.</font><br>
<br>
<font size="2" face="sans-serif">Later,</font><br>
<font size="2" face="sans-serif">Steven Perron</font><br>
<br>
<br>
<br>
<font color="#5f5f5f" size="1"
face="sans-serif">From: </font><font
size="1" face="sans-serif">Hubert Tong <<a
moz-do-not-send="true"
href="mailto:hubert.reinterpretcast@gmail.com"
target="_blank">hubert.reinterpretcast@gmail.<wbr>com</a>></font><br>
<font color="#5f5f5f" size="1"
face="sans-serif">To: </font><font
size="1" face="sans-serif">Steven
Perron/Toronto/IBM@IBMCA</font><br>
<font color="#5f5f5f" size="1"
face="sans-serif">Cc: </font><font
size="1" face="sans-serif">Daniel Berlin
<<a moz-do-not-send="true"
href="mailto:dberlin@dberlin.org"
target="_blank">dberlin@dberlin.org</a>>,
llvm-dev <<a moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org"
target="_blank">llvm-dev@lists.llvm.org</a>>,
Sanjoy Das <<a moz-do-not-send="true"
href="mailto:sanjoy@playingwithpointers.com"
target="_blank">sanjoy@playingwithpointers.co<wbr>m</a>></font><br>
<font color="#5f5f5f" size="1"
face="sans-serif">Date: </font><font
size="1" face="sans-serif">2017/02/15 07:44
AM</font><span><br>
<font color="#5f5f5f" size="1"
face="sans-serif">Subject: </font><font
size="1" face="sans-serif">Re: [llvm-dev]
RFC: Representing unions in TBAA</font><br>
</span>
<hr noshade="noshade">
<div class="m_-6586870021850522980HOEnZb">
<div class="m_-6586870021850522980h5"><br>
<br>
<br>
<font size="3">On Tue, Feb 14, 2017 at
11:22 PM, Steven Perron <</font><a
moz-do-not-send="true"
href="mailto:perrons@ca.ibm.com"
target="_blank"><font color="blue"
size="3"><u>perrons@ca.ibm.com</u></font></a><font
size="3">> wrote:</font><br>
<font size="2" face="sans-serif">3) How
should we handle a reference directly
through a union, and a reference that is
not through the union? </font><font
size="3"><br>
</font><font size="2" face="sans-serif"><br>
My solution was to look for each member
of the union overlaps the given offset,
and see if any of those members aliased
the other reference. If no member
aliases the other reference, then the
answer is no alias. Otherwise the
answer is may alias. The run time for
this would be proportional to "distance
to the root" * "number of overlapping
members". This could be slow if there
are unions with many members or many
unions of unions.</font><font size="3"><br>
</font><font size="2" face="sans-serif"><br>
Another option is to say that they do
not alias. This would mean that all
references to unions must be explicitly
through the union.</font><br>
<font size="3">From what I gather from the
thread so far, the access through the
union may be lost because of LLVM
transformations. I am not sure how, in
the face of that, TBAA could indicate
NoAlias safely (without the risk of
functional-correctness issues in correct
programs) between types which overlap
within any union (within some portion of
the program).<br>
</font><br>
<font size="3">As for the standards, it is
definitely not true that all references
to unions must be explicitly through the
union. However, if you are trying to
perform union-based type punning (under
C11), then it appears that it is
intended that the read must be through
the union.</font><br>
<font size="3"> </font><br>
<font size="2" face="sans-serif">This
would be the least restrictive aliasing
allowing the most optimization. The
implementation would be simple. I
believe we make the parent of the TBAA
node for the union to be "omnipotent
char". This might be similar to
treating the union TBAA node more like a
scalar node instead of a struct-path.
Then the traversal of the TBAA nodes
will be quick. I'll have to work this
out a bit more, but, if this is good
enough to meet the requirements of the
standard, I can try to think this
through a little more. I'll need Hubert
and Daniel to comment on that since I am
no expert on the C and C++ standards.</font><font
size="3"><br>
</font><font size="2" face="sans-serif"><br>
The third option is to be pessimistic
and say "may alias" all of the time
(conservatively correct), and rely on
other alias analysis to improve it.
This will have good compile time, but
could hinder optimization. Personally I
do not like this option. Most of the
time it will not have a negative effect,
but there will be a reasonable number of
programs where this will hurt
optimization more that it needs to.</font><br>
<br>
<br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
<fieldset
class="m_-6586870021850522980mimeAttachmentHeader"></fieldset>
<br>
</span><span class="">
<pre>______________________________<wbr>_________________
LLVM Developers mailing list
<a moz-do-not-send="true" class="m_-6586870021850522980moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a moz-do-not-send="true" class="m_-6586870021850522980moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a>
</pre>
</span></blockquote><span class="HOEnZb"><font color="#888888">
<pre class="m_-6586870021850522980moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</font></span></div>
</blockquote></div>
</div></div>
</blockquote>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre></body></html>