<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 05/14/2017 11:06 AM, Daniel Berlin
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAF4BwTWPYW9vpz3YA_ZrHEbj-obPD5=GsMn=tUrZgPyUjVN3=A@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Sun, May 14, 2017 at 8:37 AM, Hal
            Finkel <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"><span class="">
                  <p><br>
                  </p>
                  <div class="m_7871019985833609421moz-cite-prefix">On
                    03/01/2017 05:30 PM, Daniel Berlin via llvm-dev
                    wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">So, <a moz-do-not-send="true"
                        href="https://bugs.llvm.org/show_bug.cgi?id=32056"
                        target="_blank">https://bugs.llvm.org/show_<wbr>bug.cgi?id=32056</a>
                      is an example showing our current TBAA tree for
                      union generation is definitely irretrievably
                      broken.
                      <div>I'll be honest here. I'm pretty sure your
                        proposal doesn't go far enough.</div>
                      <div>But truthfully,  I would rather see us come
                        closer to a representation we know works, which
                        is GCC's.</div>
                      <div>Let me try to simplify what you are
                        suggesting, and what we have.</div>
                      <div>Our current representation is basically
                        inverted from GCC, but we don't allow things
                        that would enable it to work.<br>
                      </div>
                      <div><br>
                      </div>
                      <div>Given</div>
                      <div>union {int a, short b};</div>
                      <div><br>
                      </div>
                      <div>GCC's will be:</div>
                      <div><br>
                      </div>
                      <div> union</div>
                      <div>  /   \</div>
                      <div>short int</div>
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                      <div>Everything is implicitly a subset of alias
                        set 0 (which for C/C++ is char) just to avoid
                        representing it.</div>
                      <div><br>
                      </div>
                      <div>Our metadata has no child links, and only a
                        single parent link.</div>
                      <div><br>
                      </div>
                      <div>You can't represent the above form because
                        you can't make a single short node a child  of
                        every union/struct it needs to be (lack of
                        multiple parents means you can't just frob them
                        all to offset zero).</div>
                      <div><br>
                      </div>
                      <div>Your suggestion is to invert this as a struct</div>
                      <div><br>
                      </div>
                      <div>short  int</div>
                      <div>   |    / </div>
                      <div>union</div>
                      <div><br>
                      </div>
                      <div>We don't allow multiple parents, however.</div>
                      <div>Because of that, you suggest we follow all
                        nodes for unions, special casing union-type
                        nodes somehow</div>
                    </div>
                  </blockquote>
                  <br>
                </span> Now that I've spent a bunch of time looking at
                this, I'd like to voice support for Steven's original
                proposal. In the context of what we have, it makes
                sense, seems sound, and should fix the representational
                mapping problem we currently have. </div>
            </blockquote>
            <div><br>
            </div>
            <div>Except you can't easily differentiate it from the
              current one, and if we are going to have to upgrade/break
              compatibility, why not just do it once right, a way we
              know works, instead of risk screwing it up again, and
              playing with a representation we aren't actually sure we
              can make efficient for this case?<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    I don't see why need to break backward compatibility. Do you have
    something specific in mind? Once we extend the TBAA implementation
    to treat repeated offsets as disjunction, then we'll extend Clang to
    emit metadata for unions in this form. Old IR will work exactly as
    it does now.<br>
    <br>
    <blockquote
cite="mid:CAF4BwTWPYW9vpz3YA_ZrHEbj-obPD5=GsMn=tUrZgPyUjVN3=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">Something completely
                different (e.g. closer to what GCC uses) can work too,
                but this seems unnecessary (the proposed extension to
                the current semantics seem equivalently expressive).<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Yes, they are equivalently expressive.</div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"> <br>
                What you call "special casing of union-type nodes" does
                not actually seem all that special.  The rule seems
                quite simple: if you some across duplicate offsets, then
                search all of them. </div>
            </blockquote>
            <div><br>
            </div>
            <div>Which means you have to know they exist, for starters,
              which means keeping it in some sorted order and checking,
              or some other mechanism.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    We already keep the offsets in order (this is a current
    requirement), so finding duplicate offsets can be done with no added
    complexity.<br>
    <br>
    <blockquote
cite="mid:CAF4BwTWPYW9vpz3YA_ZrHEbj-obPD5=GsMn=tUrZgPyUjVN3=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">This should not be
                difficult to implement or use, and seems no less
                efficient than any other way of encoding the concept of
                disjunction in the hierarchy.<span class="HOEnZb"><font
                    color="#888888"><br>
                    <br>
                  </font></span></div>
            </blockquote>
            <div>It is, in actuality, less efficient. </div>
            <div>For starters, in the inverted representation, you don't
              have to explicitly represent the things that are children
              of alias set zero (char in C++'s case), which is quite
              common.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    So, I think this is an orthogonal concern. It is not clear to me
    that overhead here is significant, and even if it is, it seems like
    an even-more efficient choice would be for the frontend just not to
    emit TBAA at all for universally-aliasing accesses. We can also
    reduce the overhead by merging the universally-aliasing-type nodes
    with the root node of the type system. We don't do that now,
    however, because we treat "omnipotent char" and "vtable pointer" as
    peer children to the root of the type hierarchy. That seems
    desirable (i.e. even char* does not alias with vtable pointers). If
    we wish to keep this, then I think we're doing about as well as we
    can.<br>
    <br>
    <blockquote
cite="mid:CAF4BwTWPYW9vpz3YA_ZrHEbj-obPD5=GsMn=tUrZgPyUjVN3=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>It's also trivial to generate transitive closures of
              parts of the graph in the inverted representation, and be
              able to use them, if you need to.</div>
            <div>It turns out to be much trickier the other way.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    I don't see how this differs significantly. We can still cache the
    types in the transitive closure up to the root for any type and use
    that as an O(1) filter on queries. As you point out, TBAA
    hierarchies tend to be shallow, so I'm not sure how much this
    matters. If it does, it is just a matter of walking from the type to
    the root (covering all paths if there are disjunctions).<br>
    <br>
    <blockquote
cite="mid:CAF4BwTWPYW9vpz3YA_ZrHEbj-obPD5=GsMn=tUrZgPyUjVN3=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>Those are just trivial examples.</div>
            <div><br>
            </div>
            <div>Now, how often this matters, don't know.</div>
            <div><br>
            </div>
            <div>But i'm suggesting what i believe to be the most
              practical route:</div>
            <div>Given a situation where our representation has been
              broken for years, take an approach that is battle tested
              and we know works and is efficient, instead of trying to
              patch our representation and hope we've thought it through
              well enough :)</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    I'm obviously all for learning from what others have done. Also,
    obviously, for everything we do it might turn out we haven't thought
    it through well enough. However, the fundamental problem here seems
    easy to express: we currently lack a concept of disjunction.
    Disjunction is the underlying concept required to represent unions.<br>
    <br>
    <blockquote
cite="mid:CAF4BwTWPYW9vpz3YA_ZrHEbj-obPD5=GsMn=tUrZgPyUjVN3=A@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>Does this mean the original proposal won't work?</div>
            <div>Nope. It may in fact work.  But i'd still do the thing
              i knew already worked well. Because like i said, you are
              going to break compatibility anyway, so ...</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    As I said above, I don't think we need to break backward
    compatibility. If we do, that may change my opinion. ;)<br>
    <br>
    Thanks again,<br>
    Hal<br>
    <br>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>