<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 08/17/2017 07:20 PM, Daniel Berlin
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Thu, Aug 17, 2017 at 4:49 PM, Hal
            Finkel <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span
                  class="gmail-m_8289637616720673898gmail-"> <br>
                  <div
class="gmail-m_8289637616720673898gmail-m_5209512876004350455moz-cite-prefix">On
                    08/17/2017 04:49 PM, Daniel Berlin via llvm-dev
                    wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
                            0.8ex;border-left:1px solid
                            rgb(204,204,204);padding-left:1ex"><br>
                            <br>
                            For two given access sequences we can
                            determine if the accessed<br>
                            objects are allowed to overlap by the rules
                            of the input<br>
                            language.</blockquote>
                          <div><br>
                          </div>
                          <div>Sadly, this is where this becomes
                            "unlikely to want to use to replace TBAA",
                            at least for me. It may still be a thing we
                            want anyway.</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> The rules proposed by Ivan for handling C/C++
                seem pretty generic.
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Uh?</div>
            <div><br>
            </div>
            <div>"</div>
            <br style="font-size:12.8px">
            <span style="font-size:12.8px">  [X...] is allowed to
              overlap with [S1..., X..., S2...]</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">  and the most generic access
              sequence is [X...].</span><br style="font-size:12.8px">
            <br style="font-size:12.8px">
            <span style="font-size:12.8px">  [X1..., X2...] is allowed
              to overlap with [S1..., X1...]</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">  with the most generic
              access sequence to be [X1...].</span><br
              style="font-size:12.8px">
            <br style="font-size:12.8px">
            <span style="font-size:12.8px">  [X1..., U, U::m1, X2...] is
              allowed to overlap with</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">  [S1..., X1..., U, U::m2,
              S2...]</span><br style="font-size:12.8px">
            <span style="font-size:12.8px">  for a union U of an unknown
              effective type, provided m1 != m2</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">  and the most generic access
              sequence is [X1..., U].</span><br style="font-size:12.8px">
            <br style="font-size:12.8px">
            <span style="font-size:12.8px">If neither of the given
              sequences contains the leading access</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">type of the other, then they
              are allowed to overlap if the</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">leading access type of one
              sequence is a direct or indirect field</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">of the final access type of
              the other sequence and then the most</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">generic access sequence
              consists of a single element, which is</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">that final access type.</span><br
              style="font-size:12.8px">
            <br style="font-size:12.8px">
            <span style="font-size:12.8px">For the purpose of
              determining whether one type is a direct or</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">indirect member of another
              type unions are considered to have no</span><br
              style="font-size:12.8px">
            <span style="font-size:12.8px">members as accesses to
              members of unions are only allowed to</span><br
              style="font-size:12.8px">
            <div><span style="font-size:12.8px">overlap if they have the
                base union object explicitly specified."</span></div>
            <div><br>
            </div>
            <div><br>
            </div>
            <div><span style="font-size:12.8px"></span>This sounds super
              specific to me.  It talks about unions, and how they
              behave in C/C++, etc.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    I realize that it sounds super specific, but I'm not sure that it
    is. I was thinking about it in the following way: Imagine that we
    took the enhancement we previously discussed, but instead of
    implementing it directly, we just directly encoded for every access
    the path from the access type to the root. I think it would look
    very much like this proposal. You'd need some way of designating
    when you passed through a disjunctive node (where here we're calling
    "union", but the concept seems generic).<br>
    <br>
    I need to think more about the direct/indirect access rules and
    what's being represented. I believe we end up recovering some of
    this information now because of the way that we track offsets. This
    might be better.<br>
    <br>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>It has initial sequence rules, etc.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    How so?<br>
    <br>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>These would make no sense for "Rust", "Go", or
              "Haskell" at all.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> <br>
            </div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> We generally explain our current
                TBAA rules by saying that they're generic but motivated
                by C/C++ rules.</div>
            </blockquote>
            <div><br>
            </div>
            <div>We do say that but that's not really what our
              implementation does in any way. <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Really? I thought it was motivated by C/C++ rules. When you say that
    it's "not really what our implementation does", is this because it
    drops a lot of potential information in the translation to our
    generic form?<br>
    <br>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> I could say the same thing about
                this proposed system with its proposed rules.</div>
            </blockquote>
            <div>It is, IMHO, nowhere near as generic.</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span
                  class="gmail-m_8289637616720673898gmail-"> <br>
                </span></div>
            </blockquote>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span
                  class="gmail-m_8289637616720673898gmail-">
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div><br>
                          </div>
                          <div>This scheme is really an encoding of
                            C/C++ TBAA info so it can be read by LLVM
                            and requires that *LLVM* have some set of
                            rules that it enforces about that scheme.</div>
                          <div>But that scheme is still very language
                            specific in how it is used.</div>
                          <div>GCC still has something in between this
                            and LLVM, where the language rules are a bit
                            encoded (but not as much as you have).<br>
                          </div>
                          <div><br>
                          </div>
                          <div>We (and gcc) have deliberately avoided
                            such schemes, in favor of transforming the
                            info into abstract set trees that then tag
                            loads and stores.</div>
                          <div><br>
                          </div>
                          <div>The encoding of "struct path" tbaa, is
                            just a way of trading space vs time in that
                            encoding.  We trade walking time for the
                            space of transitive closure, etc.</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> If the provided statistic of 15% holds up, maybe
                which way we go in this trade-off space doesn't matter
                much?</div>
            </blockquote>
            <div><br>
            </div>
            <div>Sure, that part i'm indifferent about. </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span
                  class="gmail-m_8289637616720673898gmail-"><br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div><br>
                          </div>
                          <div>None of the TBAA in LLVM really has any
                            *real* relation to the original type system
                            rules, and that is deliberate.  I've argued
                            for years that calling it "TBAA" just badly
                            confuses people, and i believe here is a
                            good example :)</div>
                          <div><br>
                          </div>
                          <div>So i don't think what you've written can
                            be used to replace TBAA.</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> So *if* we just take the proposed rules for
                C/C++ as the rules for the scheme in general, I'm not
                sure this is true. </div>
            </blockquote>
            <div><br>
            </div>
            <div>Okay, let me try to rephrase a little more precisely:<br>
              <div>It is almost certainly true that we *could* make
                *any* particular lowering work, at different costs to
                the front ends and middle ends.  IE i totally agree that
                the power of all of these systems (with small
                extensions) are equal.  and also, for the record, i
                certainly think the rules/scheme described here would
                almost certainly produce better optimization results
                than what is currently implemented.</div>
              <div><br>
              </div>
              *But*</div>
            <div>I don't see how you use this as a general scheme
              without ending up forcing the other frontends to lower
              things to look as if it was a C/C++ based type system.</div>
            <div>This is definitely *not* true today.</div>
            <div><br>
            </div>
            <div>They only have to lower things to look like a tree of
              sets.  Having been down this path before, it's definitely
              easier to lower things to look like trees of sets than
              like groups of C/C++ structs/bitfields/etc that follow
              certain rules.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Our current TBAA has trees of structure types, essentially (lists of
    fields and offsets in each node). I don't understand how the mapping
    is easier or harder depending on whether we encode tree-node
    references vs. the path from the leaf to the root.<br>
    <br>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>This is one of the reasons I'm honestly having a lot of
              trouble seeing how this is is definitively better than
              what IBM's proposal was, which seemed both more generic
              and incremental, and handled most concerns.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    The claim, as I understand it, is that this is more expressive. I'm
    trying to understand that.<br>
    <br>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>The only practical difference i see is that this just
              changes the scheme from one where the language rules are
              mostly in the front ends to one where the language rules
              are also partially in llvm.  To me that isn't necessarily
              better, just different (heck, i'm even more used to that
              system because it's a push closer to gcc :P)</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Philosophically, this doesn't bother me too much provided that we
    can define the required semantics in terms of IR-level constructs
    (without direct appeal to some outside language standard). This
    seems to be true in this case. We have a lot of IR-level features
    that come about this way (where they have well defined, but
    seemingly arbitrary, semantics because of the source-level language
    semantics that motivated their creation).<br>
    <br>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>So overall I don't see one as particularly better than
              the other.</div>
            <div>You seem to.</div>
            <div>So i'd really like to understand your perspective on
              this in terms of the pro/con that you are seeing.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    I'm not sold yet (and may never be). I'm interested in exploring the
    alternative. Maybe we'll come up with some kind of hybrid design in
    the end, or maybe we'll go back to the previous proposal. When you
    say, "i certainly think the rules/scheme described here would almost
    certainly produce better optimization results than what is currently
    implemented." is this also true relative to our previous proposed
    enhancements?<br>
    <br>
    In particular, I'd like to understand where this scheme is more
    expressive than our current one (and the current one plus proposed
    enhancement for unions/arrays).<br>
    <br>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>I mean, i'm  really not even opposed (though others may
              be) in principle to saying "hey, let's just have a set of
              language specific tbaa passes with semi-common metadata".</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    I'd rather not go there ;)<br>
    <br>
    <blockquote
cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>I'd just like to understand why we would change
              tradeoffs, etc.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Me too.<br>
    <br>
    Thanks again,<br>
    Hal<br>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>