<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 03/09/2017 01:15 PM, Daniel Berlin
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAF4BwTWgK+fNoKXG9FAMaL0V0JRtM6VgkL+o5p8guDeTKc3C=g@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Thu, Mar 9, 2017 at 9:57 AM, Hal
            Finkel <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div>
                  <div class="h5"> <br>
                    <div class="m_-6586870021850522980moz-cite-prefix">On
                      03/01/2017 05:30 PM, Daniel Berlin via llvm-dev
                      wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="ltr">So, <a moz-do-not-send="true"
                          href="https://bugs.llvm.org/show_bug.cgi?id=32056"
                          target="_blank">https://bugs.llvm.org/show_<wbr>bug.cgi?id=32056</a>
                        is an example showing our current TBAA tree for
                        union generation is definitely irretrievably
                        broken.
                        <div>I'll be honest here. I'm pretty sure your
                          proposal doesn't go far enough.</div>
                        <div>But truthfully,  I would rather see us come
                          closer to a representation we know works,
                          which is GCC's.</div>
                        <div>Let me try to simplify what you are
                          suggesting, and what we have.</div>
                        <div>Our current representation is basically
                          inverted from GCC, but we don't allow things
                          that would enable it to work.<br>
                        </div>
                        <div><br>
                        </div>
                        <div>Given</div>
                        <div>union {int a, short b};</div>
                        <div><br>
                        </div>
                        <div>GCC's will be:</div>
                        <div><br>
                        </div>
                        <div> union</div>
                        <div>  /   \</div>
                        <div>short int</div>
                        <div><br>
                        </div>
                        <div><br>
                        </div>
                        <div>Everything is implicitly a subset of alias
                          set 0 (which for C/C++ is char) just to avoid
                          representing it.</div>
                        <div><br>
                        </div>
                        <div>Our metadata has no child links, and only a
                          single parent link.</div>
                        <div><br>
                        </div>
                        <div>You can't represent the above form because
                          you can't make a single short node a child  of
                          every union/struct it needs to be (lack of
                          multiple parents means you can't just frob
                          them all to offset zero).</div>
                        <div><br>
                        </div>
                        <div>Your suggestion is to invert this as a
                          struct</div>
                        <div><br>
                        </div>
                        <div>short  int</div>
                        <div>   |    / </div>
                        <div>union</div>
                        <div><br>
                        </div>
                        <div>We don't allow multiple parents, however.</div>
                        <div>Because of that, you suggest we follow all
                          nodes for unions, special casing union-type
                          nodes somehow</div>
                        <div><br>
                        </div>
                        <div>Let me suggest something different:</div>
                        <div><br>
                        </div>
                        <div>We know the current structure fails us in a
                          number of ways.</div>
                        <div>Instead of trying to shoehorn this into our
                          current structure, I suggest: we stop messing
                          around and just have a GCC style tree, and
                          represent the children instead of the parents.</div>
                        <div>We make contained types descendants instead
                          of ancestors.</div>
                        <div>We allow multiple children at each offset
                          and for scalar type nodes.x`<br>
                        </div>
                        <div><br>
                        </div>
                        <div>We could do that without changing to
                          children, but honestly,  i strongly believe
                          nobody who ever looks at this really
                          understands it right now.</div>
                        <div>(If someone really does like the ancestor
                          form, i'd love to understand why :P)</div>
                        <div><br>
                          So if we are going to change it, we should
                          change it to something that is easier to
                          understand.</div>
                        <div><br>
                        </div>
                        <div>something like:<br>
                          <br>
                        </div>
                        <div>scalar type node -> {name, children
                          nodes} </div>
                        <div>struct node -> {name, array of {offset,
                          child node} }</div>
                        <div><br>
                        </div>
                        <div>Paths are separate from the tbaa tree
                          itself, and are just:</div>
                        <div>path node -> {struct node, offset} or
                          whatever.</div>
                        <div><br>
                        </div>
                        <div>unions are just scalar type nodes with
                          multiple children, not struct nodes with
                          special-cased offset zero.</div>
                        <div><br>
                        </div>
                        <div>This also has a well-defined upgrade path.</div>
                        <div><br>
                        </div>
                        <div>For "old-style" DAGs, like exist now, we
                          know we have to regen the metadata, and that
                          it is wrong (So we could, for example, simply
                          disable it for correctness reasons)</div>
                        <div>.</div>
                        <div>Anything with a "new-style" DAG, we know is
                          usable.</div>
                        <div><br>
                        </div>
                        <div>In this representation, the most generic
                          tbaa node is just the one that contains the
                          other.</div>
                        <div>If neither contains the other, you can just
                          make one that has both as children and use
                          that.</div>
                        <div>(instead of now, where you'd have to have
                          multiple parents again).</div>
                      </div>
                    </blockquote>
                    <br>
                  </div>
                </div>
                You mean making a 'scalar type node', because those
                represent 'or'?<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Yes.</div>
            <div>I would probably stop calling them scalar type nodes
              and struct type nodes too, while we are bike-shedding a
              bit :)</div>
            <div><br>
            </div>
            <div>The terminology seems to be based on what we think a
              type system that generates those kinds of nodes look like.</div>
            <div>But uh</div>
            <div>1. That's wrong a lot of the time, because each
              frontend may want to use different types of nodes to be
              conservatively correct, and because different things can
              be represented, correctly, in multiple ways (IE, as a
              stupid example, you could represent any type as a struct
              node where all pieces go to the same place).</div>
            <div><br>
            </div>
            <div>In fact, it's not even true for the few things that do
              generate TBAA trees right now.</div>
            <div><br>
            </div>
            <div>2. It gives you no idea what they do in LLVM, which is,
              IMHO, what anyone reading those docs cares about.</div>
            <div><br>
            </div>
            <div>I'd probably call them based on something related to
              their actual semantic in llvm's tbaa tree is.</div>
            <div><br>
            </div>
            <div>IE i'd call scalar type nodes "or-type nodes" or "any
              of" nodes or something that gives people an idea that it
              means the aliasing result is may-alias if any path from
              that node is may-alias.</div>
            <div>I'd call struct type nodes "offset-based nodes",
              becuase that's really what they are.</div>
            <div><br>
            </div>
            <div>(access-path nodes are actually access paths, so that
              seems find)<br>
            </div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    I certainly agree. We should pick terminology that makes sense
    independent of C/C++.<br>
    <br>
     -Hal<br>
    <br>
    <blockquote
cite="mid:CAF4BwTWgK+fNoKXG9FAMaL0V0JRtM6VgkL+o5p8guDeTKc3C=g@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"> <br>
                 -Hal<br>
                <br>
                <blockquote type="cite"><span class="">
                    <div dir="ltr">
                      <div><br>
                      </div>
                      <div>(The above also may be better for other
                        languages)</div>
                      <div><br>
                      </div>
                      <div>--Dan</div>
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                    </div>
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On Tue, Feb 28, 2017 at
                        4:44 PM, Steven Perron <span dir="ltr"><<a
                            moz-do-not-send="true"
                            href="mailto:perrons@ca.ibm.com"
                            target="_blank">perrons@ca.ibm.com</a>></span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex"><font size="2"
                            face="sans-serif">Seems like the comments
                            have stopped.  I'll try to get a patch
                            together.  Then we can continue the
                            discussion from there.  </font><br>
                          <br>
                          <font size="2" face="sans-serif">Hubert, as
                            for the issue with the llvm optimizations
                            losing the TBAA information, it should be
                            the responsibility to make sure the aliasing
                            is changed in the correct way.  One function
                            related to this has already been mentioned:
                             getMostGenericTBAA.</font><br>
                          <br>
                          <font size="2" face="sans-serif">Later,</font><br>
                          <font size="2" face="sans-serif">Steven Perron</font><br>
                          <br>
                          <br>
                          <br>
                          <font color="#5f5f5f" size="1"
                            face="sans-serif">From:        </font><font
                            size="1" face="sans-serif">Hubert Tong <<a
                              moz-do-not-send="true"
                              href="mailto:hubert.reinterpretcast@gmail.com"
                              target="_blank">hubert.reinterpretcast@gmail.<wbr>com</a>></font><br>
                          <font color="#5f5f5f" size="1"
                            face="sans-serif">To:        </font><font
                            size="1" face="sans-serif">Steven
                            Perron/Toronto/IBM@IBMCA</font><br>
                          <font color="#5f5f5f" size="1"
                            face="sans-serif">Cc:        </font><font
                            size="1" face="sans-serif">Daniel Berlin
                            <<a moz-do-not-send="true"
                              href="mailto:dberlin@dberlin.org"
                              target="_blank">dberlin@dberlin.org</a>>,
                            llvm-dev <<a moz-do-not-send="true"
                              href="mailto:llvm-dev@lists.llvm.org"
                              target="_blank">llvm-dev@lists.llvm.org</a>>,
                            Sanjoy Das <<a moz-do-not-send="true"
                              href="mailto:sanjoy@playingwithpointers.com"
                              target="_blank">sanjoy@playingwithpointers.co<wbr>m</a>></font><br>
                          <font color="#5f5f5f" size="1"
                            face="sans-serif">Date:        </font><font
                            size="1" face="sans-serif">2017/02/15 07:44
                            AM</font><span><br>
                            <font color="#5f5f5f" size="1"
                              face="sans-serif">Subject:        </font><font
                              size="1" face="sans-serif">Re: [llvm-dev]
                              RFC: Representing unions in TBAA</font><br>
                          </span>
                          <hr noshade="noshade">
                          <div class="m_-6586870021850522980HOEnZb">
                            <div class="m_-6586870021850522980h5"><br>
                              <br>
                              <br>
                              <font size="3">On Tue, Feb 14, 2017 at
                                11:22 PM, Steven Perron <</font><a
                                moz-do-not-send="true"
                                href="mailto:perrons@ca.ibm.com"
                                target="_blank"><font color="blue"
                                  size="3"><u>perrons@ca.ibm.com</u></font></a><font
                                size="3">> wrote:</font><br>
                              <font size="2" face="sans-serif">3) How
                                should we handle a reference directly
                                through a union, and a reference that is
                                not through the union?  </font><font
                                size="3"><br>
                              </font><font size="2" face="sans-serif"><br>
                                My solution was to look for each member
                                of the union overlaps the given offset,
                                and see if any of those members aliased
                                the other reference.  If no member
                                aliases the other reference, then the
                                answer is no alias.  Otherwise the
                                answer is may alias.  The run time for
                                this would be proportional to  "distance
                                to the root" * "number of overlapping
                                members".  This could be slow if there
                                are unions with many members or many
                                unions of unions.</font><font size="3"><br>
                              </font><font size="2" face="sans-serif"><br>
                                Another option is to say that they do
                                not alias.  This would mean that all
                                references to unions must be explicitly
                                through the union.</font><br>
                              <font size="3">From what I gather from the
                                thread so far, the access through the
                                union may be lost because of LLVM
                                transformations. I am not sure how, in
                                the face of that, TBAA could indicate
                                NoAlias safely (without the risk of
                                functional-correctness issues in correct
                                programs) between types which overlap
                                within any union (within some portion of
                                the program).<br>
                              </font><br>
                              <font size="3">As for the standards, it is
                                definitely not true that all references
                                to unions must be explicitly through the
                                union. However, if you are trying to
                                perform union-based type punning (under
                                C11), then it appears that it is
                                intended that the read must be through
                                the union.</font><br>
                              <font size="3"> </font><br>
                              <font size="2" face="sans-serif">This
                                would be the least restrictive aliasing
                                allowing the most optimization.  The
                                implementation would be simple.  I
                                believe we make the parent of the TBAA
                                node for the union to be "omnipotent
                                char".  This might be similar to
                                treating the union TBAA node more like a
                                scalar node instead of a struct-path.
                                 Then the traversal of the TBAA nodes
                                will be quick.  I'll have to work this
                                out a bit more, but, if this is good
                                enough to meet the requirements of the
                                standard, I can try to think this
                                through a little more.  I'll need Hubert
                                and Daniel to comment on that since I am
                                no expert on the C and C++ standards.</font><font
                                size="3"><br>
                              </font><font size="2" face="sans-serif"><br>
                                The third option is to be pessimistic
                                and say "may alias" all of the time
                                (conservatively correct), and rely on
                                other alias analysis to improve it. 
                                This will have good compile time, but
                                could hinder optimization.  Personally I
                                do not like this option.  Most of the
                                time it will not have a negative effect,
                                but there will be a reasonable number of
                                programs where this will hurt
                                optimization more that it needs to.</font><br>
                              <br>
                              <br>
                              <br>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                    <br>
                    <fieldset
                      class="m_-6586870021850522980mimeAttachmentHeader"></fieldset>
                    <br>
                  </span><span class="">
                    <pre>______________________________<wbr>_________________
LLVM Developers mailing list
<a moz-do-not-send="true" class="m_-6586870021850522980moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a moz-do-not-send="true" class="m_-6586870021850522980moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a>
</pre>
    </span></blockquote><span class="HOEnZb"><font color="#888888">
    

    <pre class="m_-6586870021850522980moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </font></span></div>

</blockquote></div>
</div></div>



</blockquote>
<pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre></body></html>