<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 08/17/2017 07:20 PM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Thu, Aug 17, 2017 at 4:49 PM, Hal

            Finkel <span dir="ltr"><<a moz-do-not-send="true"

                href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span

                  class="gmail-m_8289637616720673898gmail-"> <br>

                  <div

class="gmail-m_8289637616720673898gmail-m_5209512876004350455moz-cite-prefix">On

                    08/17/2017 04:49 PM, Daniel Berlin via llvm-dev

                    wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <blockquote class="gmail_quote"

                            style="margin:0px 0px 0px

                            0.8ex;border-left:1px solid

                            rgb(204,204,204);padding-left:1ex"><br>

                            <br>

                            For two given access sequences we can

                            determine if the accessed<br>

                            objects are allowed to overlap by the rules

                            of the input<br>

                            language.</blockquote>

                          <div><br>

                          </div>

                          <div>Sadly, this is where this becomes

                            "unlikely to want to use to replace TBAA",

                            at least for me. It may still be a thing we

                            want anyway.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <br>

                </span> The rules proposed by Ivan for handling C/C++

                seem pretty generic.

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>Uh?</div>

            <div><br>

            </div>

            <div>"</div>

            <br style="font-size:12.8px">

            <span style="font-size:12.8px">  [X...] is allowed to

              overlap with [S1..., X..., S2...]</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">  and the most generic access

              sequence is [X...].</span><br style="font-size:12.8px">

            <br style="font-size:12.8px">

            <span style="font-size:12.8px">  [X1..., X2...] is allowed

              to overlap with [S1..., X1...]</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">  with the most generic

              access sequence to be [X1...].</span><br

              style="font-size:12.8px">

            <br style="font-size:12.8px">

            <span style="font-size:12.8px">  [X1..., U, U::m1, X2...] is

              allowed to overlap with</span><br style="font-size:12.8px">

            <span style="font-size:12.8px">  [S1..., X1..., U, U::m2,

              S2...]</span><br style="font-size:12.8px">

            <span style="font-size:12.8px">  for a union U of an unknown

              effective type, provided m1 != m2</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">  and the most generic access

              sequence is [X1..., U].</span><br style="font-size:12.8px">

            <br style="font-size:12.8px">

            <span style="font-size:12.8px">If neither of the given

              sequences contains the leading access</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">type of the other, then they

              are allowed to overlap if the</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">leading access type of one

              sequence is a direct or indirect field</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">of the final access type of

              the other sequence and then the most</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">generic access sequence

              consists of a single element, which is</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">that final access type.</span><br

              style="font-size:12.8px">

            <br style="font-size:12.8px">

            <span style="font-size:12.8px">For the purpose of

              determining whether one type is a direct or</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">indirect member of another

              type unions are considered to have no</span><br

              style="font-size:12.8px">

            <span style="font-size:12.8px">members as accesses to

              members of unions are only allowed to</span><br

              style="font-size:12.8px">

            <div><span style="font-size:12.8px">overlap if they have the

                base union object explicitly specified."</span></div>

            <div><br>

            </div>

            <div><br>

            </div>

            <div><span style="font-size:12.8px"></span>This sounds super

              specific to me.  It talks about unions, and how they

              behave in C/C++, etc.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    I realize that it sounds super specific, but I'm not sure that it

    is. I was thinking about it in the following way: Imagine that we

    took the enhancement we previously discussed, but instead of

    implementing it directly, we just directly encoded for every access

    the path from the access type to the root. I think it would look

    very much like this proposal. You'd need some way of designating

    when you passed through a disjunctive node (where here we're calling

    "union", but the concept seems generic).<br>

    <br>

    I need to think more about the direct/indirect access rules and

    what's being represented. I believe we end up recovering some of

    this information now because of the way that we track offsets. This

    might be better.<br>

    <br>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>It has initial sequence rules, etc.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    How so?<br>

    <br>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>These would make no sense for "Rust", "Go", or

              "Haskell" at all.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> <br>

            </div>

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"> We generally explain our current

                TBAA rules by saying that they're generic but motivated

                by C/C++ rules.</div>

            </blockquote>

            <div><br>

            </div>

            <div>We do say that but that's not really what our

              implementation does in any way. <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Really? I thought it was motivated by C/C++ rules. When you say that

    it's "not really what our implementation does", is this because it

    drops a lot of potential information in the translation to our

    generic form?<br>

    <br>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"> I could say the same thing about

                this proposed system with its proposed rules.</div>

            </blockquote>

            <div>It is, IMHO, nowhere near as generic.</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span

                  class="gmail-m_8289637616720673898gmail-"> <br>

                </span></div>

            </blockquote>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span

                  class="gmail-m_8289637616720673898gmail-">

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div><br>

                          </div>

                          <div>This scheme is really an encoding of

                            C/C++ TBAA info so it can be read by LLVM

                            and requires that *LLVM* have some set of

                            rules that it enforces about that scheme.</div>

                          <div>But that scheme is still very language

                            specific in how it is used.</div>

                          <div>GCC still has something in between this

                            and LLVM, where the language rules are a bit

                            encoded (but not as much as you have).<br>

                          </div>

                          <div><br>

                          </div>

                          <div>We (and gcc) have deliberately avoided

                            such schemes, in favor of transforming the

                            info into abstract set trees that then tag

                            loads and stores.</div>

                          <div><br>

                          </div>

                          <div>The encoding of "struct path" tbaa, is

                            just a way of trading space vs time in that

                            encoding.  We trade walking time for the

                            space of transitive closure, etc.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <br>

                </span> If the provided statistic of 15% holds up, maybe

                which way we go in this trade-off space doesn't matter

                much?</div>

            </blockquote>

            <div><br>

            </div>

            <div>Sure, that part i'm indifferent about. </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span

                  class="gmail-m_8289637616720673898gmail-"><br>

                  <br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div><br>

                          </div>

                          <div>None of the TBAA in LLVM really has any

                            *real* relation to the original type system

                            rules, and that is deliberate.  I've argued

                            for years that calling it "TBAA" just badly

                            confuses people, and i believe here is a

                            good example :)</div>

                          <div><br>

                          </div>

                          <div>So i don't think what you've written can

                            be used to replace TBAA.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <br>

                </span> So *if* we just take the proposed rules for

                C/C++ as the rules for the scheme in general, I'm not

                sure this is true. </div>

            </blockquote>

            <div><br>

            </div>

            <div>Okay, let me try to rephrase a little more precisely:<br>

              <div>It is almost certainly true that we *could* make

                *any* particular lowering work, at different costs to

                the front ends and middle ends.  IE i totally agree that

                the power of all of these systems (with small

                extensions) are equal.  and also, for the record, i

                certainly think the rules/scheme described here would

                almost certainly produce better optimization results

                than what is currently implemented.</div>

              <div><br>

              </div>

              *But*</div>

            <div>I don't see how you use this as a general scheme

              without ending up forcing the other frontends to lower

              things to look as if it was a C/C++ based type system.</div>

            <div>This is definitely *not* true today.</div>

            <div><br>

            </div>

            <div>They only have to lower things to look like a tree of

              sets.  Having been down this path before, it's definitely

              easier to lower things to look like trees of sets than

              like groups of C/C++ structs/bitfields/etc that follow

              certain rules.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Our current TBAA has trees of structure types, essentially (lists of

    fields and offsets in each node). I don't understand how the mapping

    is easier or harder depending on whether we encode tree-node

    references vs. the path from the leaf to the root.<br>

    <br>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>This is one of the reasons I'm honestly having a lot of

              trouble seeing how this is is definitively better than

              what IBM's proposal was, which seemed both more generic

              and incremental, and handled most concerns.<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    The claim, as I understand it, is that this is more expressive. I'm

    trying to understand that.<br>

    <br>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>The only practical difference i see is that this just

              changes the scheme from one where the language rules are

              mostly in the front ends to one where the language rules

              are also partially in llvm.  To me that isn't necessarily

              better, just different (heck, i'm even more used to that

              system because it's a push closer to gcc :P)</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Philosophically, this doesn't bother me too much provided that we

    can define the required semantics in terms of IR-level constructs

    (without direct appeal to some outside language standard). This

    seems to be true in this case. We have a lot of IR-level features

    that come about this way (where they have well defined, but

    seemingly arbitrary, semantics because of the source-level language

    semantics that motivated their creation).<br>

    <br>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>So overall I don't see one as particularly better than

              the other.</div>

            <div>You seem to.</div>

            <div>So i'd really like to understand your perspective on

              this in terms of the pro/con that you are seeing.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    I'm not sold yet (and may never be). I'm interested in exploring the

    alternative. Maybe we'll come up with some kind of hybrid design in

    the end, or maybe we'll go back to the previous proposal. When you

    say, "i certainly think the rules/scheme described here would almost

    certainly produce better optimization results than what is currently

    implemented." is this also true relative to our previous proposed

    enhancements?<br>

    <br>

    In particular, I'd like to understand where this scheme is more

    expressive than our current one (and the current one plus proposed

    enhancement for unions/arrays).<br>

    <br>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>I mean, i'm  really not even opposed (though others may

              be) in principle to saying "hey, let's just have a set of

              language specific tbaa passes with semi-common metadata".</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    I'd rather not go there ;)<br>

    <br>

    <blockquote

cite="mid:CAF4BwTVN6+3gR7GMzow2ERuFM+xuqLfVc7JG4sQQTq5zEuSCQg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>I'd just like to understand why we would change

              tradeoffs, etc.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Me too.<br>

    <br>

    Thanks again,<br>

    Hal<br>

    <br>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </body>

</html>