<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 08/19/2017 02:05 PM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAF4BwTWS9rxhCAUWadF=-s9o-M1Y9mrN=dZVungnr03zQF-Acw@mail.gmail.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <div dir="ltr">FWIW: to be completely concrete; from my

        perspective, the main thing we'd get out of switching is

        something that is more complete already.

        <div><br>

        </div>

        <div>past that</div>

        <div>"<span style="font-size:12.8px">I don't want

            language-specific kinds of nodes in the grammar, and I

            believe that it's not necessary under some reasonable

            variant of this scheme. Maybe I'm wrong.</span>"</div>

        <div><br>

        </div>

        <div>If your position is that you'd be fine with a variant of

          this scheme, then I don't really think we disagree at all :)</div>

      </div>

    </blockquote>

    <br>

    That's exactly my position. Good :-)<br>

    <br>

    Here's what I find most intriguing about this: In the way we'd

    discussed extending the current scheme to handle unions, we extend

    the current way that we traverse the graph such that, if there are

    multiple fields in one of the types with the same offset (i.e. a

    union), then we need to walk up the graph through all fields. While

    I still believe this is unlikely to be problematic in practice,

    we're now exploring many paths, and the asymptotic complexity

    doesn't thrill me.<br>

    <br>

    If I'm thinking about this correctly, this type of encoding makes

    dealing with unions much more efficient. The frontend knows the

    access path, and can encode that directly. We don't need to explore

    many paths through a graph to find it (i.e. find a potential set of

    paths through the graph that indicate potential aliasing). Instead

    we can just examine the two paths directly encoded. If they both

    have the same union type at the same offset, then they can alias. If

    they have the same union type but incompatible offsets, they can't.

    No path-finding required. Also, if I'm thinking about this

    correctly, doing it this way is more accurate: it can

    distinguish between structurally-identical union members, whereas

    the proposed extension we'd discussed previously cannot (i.e., it

    would give a conservative answer). <br>
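    <br>
    To make the direct comparison concrete, here is a minimal sketch in
    Python. The Step record, the union_verdict name, and the choice to
    treat "same offset" as plain equality are assumptions made for
    illustration; they are not the proposal's actual encoding, and a real
    check would also need access sizes to decide when offsets are
    incompatible.<br>
    <pre>
```python
from typing import NamedTuple, Optional, Sequence


class Step(NamedTuple):
    """One hop of a frontend-encoded access path (hypothetical layout)."""
    type_name: str          # type tag at this hop, e.g. "U"
    offset: int             # byte offset of this object from the path root
    is_union: bool = False  # the one node kind with a special rule


def union_verdict(a: Sequence[Step], b: Sequence[Step]) -> Optional[bool]:
    """True: may alias. False: cannot alias. None: no shared union type."""
    unions_a = {(s.type_name, s.offset) for s in a if s.is_union}
    names_a = {name for name, _ in unions_a}
    shared = [s for s in b if s.is_union and s.type_name in names_a]
    if not shared:
        return None  # the union rule says nothing about this pair
    # Same union type at the same offset on both paths: the members overlap.
    return any((s.type_name, s.offset) in unions_a for s in shared)


# struct S { char pad[8]; union U { int i; float f; } u, v; };
u_i = [Step("S", 0), Step("U", 8, True), Step("int", 8)]
u_f = [Step("S", 0), Step("U", 8, True), Step("float", 8)]
v_f = [Step("S", 0), Step("U", 12, True), Step("float", 12)]
```
</pre>
    With these paths, union_verdict(u_i, u_f) reports that the two
    accesses may alias and union_verdict(u_i, v_f) reports that they
    cannot. Both answers come from scanning the two encoded paths, with no
    walk over a type graph.<br>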

    <br>

    Thoughts?<br>

    <br>

    Thanks again,<br>

    Hal<br>

    <br>

    <blockquote

cite="mid:CAF4BwTWS9rxhCAUWadF=-s9o-M1Y9mrN=dZVungnr03zQF-Acw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Sat, Aug 19, 2017 at 12:00 PM, Hal

          Finkel <span dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000"><span class="">

                <p><br>

                </p>

                <div class="m_-8107259744690938194moz-cite-prefix">On

                  08/17/2017 11:25 PM, Daniel Berlin wrote:<br>

                </div>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div><br>

                        </div>

                        <div>&lt;just want to focus on these parts for a

                          second. *All* of these representations are

                          really access path representations, just

                          encoded in slightly different ways, and, as a

                          result, with various parts of the rules in

                          slightly different places&gt; </div>

                        <div><br>

                        </div>

                        <blockquote class="gmail_quote"

                          style="margin:0px 0px 0px

                          0.8ex;border-left:1px solid

                          rgb(204,204,204);padding-left:1ex"><span

                            style="font-size:12.8px">Imagine that we

                            took the enhancement we previously

                            discussed, but instead of implementing it

                            directly, we just directly encoded for every

                            access the path from the access type to the

                            root. I think it would look very much like

                            this proposal.</span></blockquote>

                        <div><br>

                          Something like it, yes.</div>

                        <div>Note that this representation also has

                          special vtable and union groups, for example,

                          and assumes very C-like rules in several

                          places around the sequencing of access types.</div>

                        <div>Also note that  in the language of access

                          paths, C allows overlapping that does not

                          exist in, for example, Java</div>

                      </div>

                    </div>

                  </div>

                </blockquote>

                <br>

              </span> Is this a statement generally favoring or

              disfavoring having the frontend encode the path

              explicitly?<span class=""><br>

                <br>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div>(This is true in both the points-to and

                          type-based domains).</div>

                        <div>Ada has discriminated unions (and you could

                          not use the "union" in this proposal to

                          represent them)</div>

                      </div>

                    </div>

                  </div>

                </blockquote>

                <br>

              </span> Yes, but discriminated unions also have a dynamic

              element to them. As a result, I suspect we'd need a scheme

              that captures that (or else we'd need to model them like a

              struct with a field and something like a C union).<span

                class=""><br>

                <br>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div>etc.</div>

                        <div><br>

                        </div>

                        <div><br>

                        </div>

                        <blockquote class="gmail_quote"

                          style="margin:0px 0px 0px

                          0.8ex;border-left:1px solid

                          rgb(204,204,204);padding-left:1ex">

                          <div bgcolor="#FFFFFF"><span

                              class="m_-8107259744690938194gmail-">

                              <blockquote type="cite">

                                <div dir="ltr">

                                  <div class="gmail_extra">

                                    <div class="gmail_quote">

                                      <div> <br>

                                      </div>

                                      <div><br>

                                      </div>

                                      <blockquote class="gmail_quote"

                                        style="margin:0px 0px 0px

                                        0.8ex;border-left:1px solid

                                        rgb(204,204,204);padding-left:1ex">

                                        <div bgcolor="#FFFFFF"> We

                                          generally explain our current

                                          TBAA rules by saying that

                                          they're generic but motivated

                                          by C/C++ rules.</div>

                                      </blockquote>

                                      <div><br>

                                      </div>

                                      <div>We do say that but that's not

                                        really what our implementation

                                        does in any way. <br>

                                      </div>

                                    </div>

                                  </div>

                                </div>

                              </blockquote>

                              <br>

                            </span> Really? I thought it was motivated

                            by C/C++ rules. When you say that it's "not

                            really what our implementation does", is

                            this because it drops a lot of potential

                            information in the translation to our

                            generic form?</div>

                        </blockquote>

                        <div><br>

                        </div>

                        <div>What we do is completely and totally

                          unrelated to types as they exist in the

                          original language.  It is a completely and

                          totally generic implementation with no rules

                          related to the original language.  The

                          original language, for example, has dynamic

                          typing rules, object lifetime rules, etc. We

                          have none of these.</div>

                        <div><br>

                        </div>

                        <div>The types do not, in fact, even always have

                          the same relationship we give them.</div>

                        <div> <br>

                        </div>

                        <div>We have a tree and some nodes, and we

                          happen to name them after types.  Like this

                          proposal, they also represent computed access

                          paths, in a much simpler language.</div>

                        <div><br>

                        </div>

                        <div>The nodes represent alias set members

                          (either one or many)</div>

                        <div>The edges represent "access at

                          offset" (where the default is 0 if

                          unweighted).</div>

                        <div>We have a simple rule that processes

                          the access paths related to ancestry.</div>

                      </div>

                    </div>

                  </div>

                </blockquote>
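                <br>
                As a small illustration of the tree-plus-ancestry rule
                described above, here is a sketch in Python. The tag names
                and the PARENT table are hypothetical, and offsets on edges
                are left out to keep it short, so this covers only the
                scalar case of the rule as summarized in this thread.<br>
                <pre>
```python
from typing import Dict, Optional

# Child tag -> parent tag; None marks the root. Tags are hypothetical names.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "char": "root",
    "int": "char",
    "float": "char",
}


def reaches(start: str, target: str, parent: Dict[str, Optional[str]]) -> bool:
    """Walk parent edges upward from start and report whether target is hit."""
    node: Optional[str] = start
    while node is not None:
        if node == target:
            return True
        node = parent[node]
    return False


def may_alias(a: str, b: str, parent: Dict[str, Optional[str]] = PARENT) -> bool:
    """Two tags may alias when either is an ancestor of (or equal to) the other."""
    return reaches(a, b, parent) or reaches(b, a, parent)
```
</pre>
                In this toy tree, "int" and "char" may alias because one is
                an ancestor of the other, while "int" and "float" are
                siblings and do not. Nothing in the check knows what the
                tags mean in the source language.<br>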

                <br>

              </span> I certainly understand all of this ;)<span

                class=""><br>

                <br>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div><br>

                        </div>

                        <div>You can view it as either a grammar parsing

                          or a graph reachability problem depending on

                          what works for you (in fact, it's really a

                          Dyck-CFL reachability problem on bidirected

                          trees)</div>

                        <div><br>

                        </div>

                        <div>The reachability rule is  usually given as:

                          If node Target is reachable from node Source,

                          offset another through either its children or

                          the upwards edges (or was it downwards, i

                          always screw up which direction we go in),

                          they may-alias<br>

                        </div>

                        <div><br>

                        </div>

                        <div>You could also implement it as a real

                          grammar parsing problem (IE you quite

                          literally could generate strings from tag +

                          offset, and parse them against the grammar,

                          and see if the terminal nodes contain your

                          target).  They are equivalent.</div>

                        <div><br>

                        </div>

                        <div>This representation would be the same in

                          that regard.<br>

                        </div>

                        <div><br>

                        </div>

                        <div>Our current lowering for clang happens to

                          kind of look like C/C++ structures converted

                          to a tree.  </div>

                        <div><br>

                        </div>

                        <div>However, this is actually just inefficient,

                          space wise, and done because it's simple to

                          lower it this way.</div>

                      </div>

                    </div>

                  </div>

                </blockquote>

                <br>

              </span> I agree (although this wasn't happenstance; the

              representation was designed with an expected lowering

              scheme in mind, at least for C/C++). <br>

              <span class=""> <br>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div>  </div>

                        <div>Because the accesses are completely

                          unrelated to the original types, and require

                          *no* language rules to interpret, you could

                          also just partition the things that alias by

                          the language rules in the frontend, and then

                          output a tree that represents the possible

                          unique paths.</div>

                        <div><br>

                        </div>

                        <div>IE figure out all the answers, then

                          re-encode it as precisely as possible.</div>

                        <div><br>

                        </div>

                        <div>This trades time for space.</div>

                        <div><br>

                        </div>

                        <div>This representation is *super* generic. 

                          That is, the language being used here is super

                          simple, and the reachability rule is super

                          simple.</div>

                        <div><br>

                        </div>

                        <div>I could have the frontend generate these

                          trees based on anything i like.  I could, in

                          fact, encode Steensgaard points-to results as

                          this tree without any trouble.</div>

                        <div><br>

                        </div>

                        <div>The access path language described in this

                          proposal is more complex and complete, and

                          directly closer to access paths you find in

                          C/C++.  It has bitfields, vtables, and unions.</div>

                      </div>

                    </div>

                  </div>

                </blockquote>
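                <br>
                The "figure out all the answers, then re-encode" idea can
                be sketched as follows, assuming the frontend already has a
                partition of accesses into alias classes (for example, the
                equivalence classes of a unification-based points-to
                analysis). The function names, the "classN" tags, and the
                flat one-level tree are illustrative assumptions, not an
                existing lowering.<br>
                <pre>
```python
from typing import Dict, Iterable, Optional, Tuple


def encode_partition(
    classes: Iterable[Iterable[str]],
) -> Tuple[Dict[str, Optional[str]], Dict[str, str]]:
    """Give each alias class its own child of the root; tag members with it."""
    parent: Dict[str, Optional[str]] = {"root": None}
    tag_of: Dict[str, str] = {}
    for index, members in enumerate(classes):
        tag = f"class{index}"
        parent[tag] = "root"
        for member in members:
            tag_of[member] = tag
    return parent, tag_of


def may_alias(a: str, b: str, parent, tag_of) -> bool:
    """Ancestor-or-self test on the tags; sibling classes never reach each other."""
    def reaches(start, target):
        node = start
        while node is not None:
            if node == target:
                return True
            node = parent[node]
        return False
    ta, tb = tag_of[a], tag_of[b]
    return reaches(ta, tb) or reaches(tb, ta)


parent, tag_of = encode_partition([["p", "q"], ["r"]])
```
</pre>
                Here "p" and "q" share a tag and so may alias, while "r"
                sits under a sibling tag and does not alias either of them.
                All language knowledge stays in whatever produced the
                partition; the tree only replays its answers.<br>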

                <br>

              </span> It is more complex, but I think this is partly in

              the description. AFAICT, only "union" has special

              properties under the scheme. Bit fields, vtables, etc. are

              all just particular types with no particular special

              rules. I don't like special rules at all for any named

              entities, but as there's only one such entity right now

              ("union"), this is an encoding choice we could bikeshed.<span

                class=""><br>

                <br>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div>The reachability rules are more complex.</div>

                        <div>Is it possible to express other languages

                          in that set of access path rules?</div>

                        <div>Sure.  For example, you can, as above,

                          generate the set of answers, and then

                          re-encode it into access paths.</div>

                        <div><br>

                        </div>

                        <div>Right now, the work it takes *in the front

                          end* is minimal, and has a fairly efficient

                          space encoding.</div>

                        <div>if i want to say two things alias, i just

                          gotta be able to reach one from the other.</div>

                        <div>If i want to say two things do not alias, i

                          just gotta be able to not reach one from the

                          other only using certain types of edges</div>

                        <div>In our current language, all that takes in

                          a frontend is "find longest no-aliasing part

                          of tree. Go to parent, add new child".</div>

                        <div><br>

                        </div>

                        <div>In the proposed language, the lowering is

                          more complex.  Is it doable?</div>

                        <div>Sure, of course, not gonna claim

                          otherwise.  But the more "features" of the

                          access path you add and expect the middle end

                          to handle, instead of the front-end expanding

                          them, and the more those features'

                          reachability rules are related to a specific

                          language, the more language-specific it gets.</div>

                        <div><br>

                        </div>

                        <div>That's the *only* tradeoff we are making

                          here, representationally.   How much does the

                          frontend have to understand about how the

                          middle end walks these things, in order to

                          generate whatever it wants as precisely as

                          possible</div>

                        <div><br>

                        </div>

                        <div>We can make whichever we want express

                          everything we want in N^3 time :)</div>

                        <div>The real question is "do we try to add on

                          to what we have in ways that work for multiple

                          languages, and are expressed neutrally in a

                          simple reachability language"</div>

                        <div>or "do we add language-specific kinds of

                          nodes to the grammar, and have reachability

                          rules that are fairly language specific".</div>

                      </div>

                    </div>

                  </div>

                </blockquote>

                <br>

              </span> I don't want language-specific kinds of nodes in

              the grammar, and I believe that it's not necessary under

              some reasonable variant of this scheme. Maybe I'm wrong.<span

                class=""><br>

                <br>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div><br>

                        </div>

                        <div>IE do you add, say, discriminated_union

                          nodes to our current representation for Ada,

                          or "vtable_access" nodes to our current

                          representation for C++ vtable accesses</div>

                        <div>Or do you instead generate a metadata that

                          has a unidirectional edge reachability (IE up

                          only), or whatever it takes to do vtables

                          generically.</div>

                        <div><br>

                        </div>

                        <div>Both are completely and totally viable

                          paths, and it's all about which way you want

                          to go.</div>

                        <div>But they *definitely* have a difference in

                          terms of language-specificness.</div>

                        <div><br>

                        </div>

                      </div>

                    </div>

                  </div>

                </blockquote>

                <br>

              </span> To be clear, if the system is not generic, I'm far

              less interested.<span class=""><br>

                <br>

                Thanks again,<br>

                Hal<br>

                <br>


                <br>

                <pre class="m_-8107259744690938194moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

              </span></div>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </body>

</html>