<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 08/26/2016 11:28 PM, Daniel Berlin
      via llvm-dev wrote:<br>
    </div>
    <blockquote
cite="mid:CAF4BwTU9F56WUU2fQCUJns6aa6kM56ZoUn2rHs4GaoA237i_EA@mail.gmail.com"
      type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Fri, Aug 26, 2016 at 9:44 PM, Hal
            Finkel <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div>
                <div
style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:rgb(0,0,0)"><br>
                  <hr>
                  <blockquote style="border-left:2px solid
rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From:
                    </b>"Daniel Berlin" <<a moz-do-not-send="true"
                      href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>><br>
                    <b>To: </b>"Quentin Colombet" <<a
                      moz-do-not-send="true"
                      href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>><br>
                    <b>Cc: </b>"Hal Finkel" <<a
                      moz-do-not-send="true"
                      href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>>,
                    "llvm-dev" <<a moz-do-not-send="true"
                      href="mailto:llvm-dev@lists.llvm.org"
                      target="_blank">llvm-dev@lists.llvm.org</a>><br>
                    <b>Sent: </b>Friday, August 26, 2016 11:06:56 PM<span
                      class=""><br>
                      <b>Subject: </b>Re: [llvm-dev] [RFC]
                      Interprocedural MIR-level outlining pass<br>
                      <br>
                    </span><span class="">
                      <div dir="ltr">FWIW: I'm with quentin. Without a
                        good value numbering analysis, this is a hard
                        problem.<br>
                      </div>
                    </span></blockquote>
                  How exactly does value numbering help here? This
                  problem seems to be about finding structurally-similar
                  parts of the computations of different values, not
                  identical values.</div>
              </div>
            </blockquote>
            <div>Note, first, the optimization we are talking about is
              not outlining, at least, as far as i've ever heard it.</div>
          </div>
        </div>
      </div>
    </blockquote>
    Minor terminology point...<br>
    <br>
    The *mechanism* used here is definitely function outlining.  The
    policy applied to select outline candidates is variant of code
    folding techniques.  Many times when we refer to "an optimization"
    we tend to mix policy and mechanism into the same thing.  I'm
    highlighting the difference only because I found Danny's response
    slightly confusing on first read and through others might as well. 
    <br>
    <br>
    Back to your regularly scheduled technical discussion...<br>
    <blockquote
cite="mid:CAF4BwTU9F56WUU2fQCUJns6aa6kM56ZoUn2rHs4GaoA237i_EA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>function outlining/etc is about removing cold regions
              of functions to separate functions to make the hot
              portions smaller, more inlinable, etc.</div>
            <div><br>
            </div>
            <div>What is being talked about here is a variant of
               identical code folding, e.g., <a moz-do-not-send="true"
                href="http://research.google.com/pubs/pub36912.html">http://research.google.com/pubs/pub36912.html</a>
              and it's variants.</div>
            <div><br>
            </div>
            <div>Where identical varies (and is based on semantic
              equivalence).  Variants where you extract portions
              partially to merge functions are pretty much the  same
              optimization :)</div>
            <div><br>
            </div>
            <div>Here, value numbering helps because it gives you value
              expressions.</div>
            <div><br>
            </div>
            <div>It tells you:<br>
              <br>
            </div>
            <div>This operation is "add V1, V2".</div>
            <div>It does not matter what V1 and V2 are, they just "are
              abstract things". We know if these abstract things are
              equivalent or not.  </div>
            <div><br>
            </div>
            <div>More relevantly, these expressions form a value number
              dag.</div>
            <div><br>
            </div>
            <div>That is, they represent the computations being computed
              only in terms of operators, abstract variables, and other
              parts of the VNDAG</div>
            <div><br>
            </div>
            <div>The *actual values* of those computations, numbers,
              etc, do not matter (IE the abstract value numbers have
              some set of things *in the program* that are in the set of
              things equivalent to that value number).</div>
            <div><br>
            </div>
            <div>It is "are these semantically the same operations in
              the same order", regardless of the value of those
              operations.</div>
            <div>See, for example, the DAGS generated by:</div>
            <div><br>
            </div>
            <div><a moz-do-not-send="true"
href="https://www.researchgate.net/publication/221323142_An_Efficient_SSA-Based_Algorithm_for_Complete_Global_Value_Numbering">https://www.researchgate.net/publication/221323142_An_Efficient_SSA-Based_Algorithm_for_Complete_Global_Value_Numbering</a><br>
            </div>
            <div><br>
            </div>
            <div>Again, this is not something current GVN does, but
              NewGVN (and other GVN's do) can give you.<br>
            </div>
            <div><br>
            </div>
            <div>The question of what portions of a function you can
              merge are variants of common subgraph and graph
              isomorphism problems on a VNDAG.</div>
            <div><br>
            </div>
            <div>(because in the end, it doesn't matter where the
              computations are, it matters what they abstractly compute,
              and what they depend on)</div>
            <div><br>
            </div>
            <div> <br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div>
                <div
style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:rgb(0,0,0)">
                  It seems much closer to the analysis necessary for SLP
                  vectorization than value numbering, as such.<br>
                </div>
              </div>
            </blockquote>
            <div> <br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div>
                <div
style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:rgb(0,0,0)">I
                  think we might be able to use value numbering to help
                  with subset of SLP vectorization, but only because we
                  can define some kind of equivalence relation on
                  consecutive memory accesses (and similar). What you
                  need for this outlining seems more akin to the general
                  SLP-vectorization case. This might just be the maximal
                  common subgraph problem.</div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>If i have an add in a loop, and it's VN equivalent to
              an add outside a loop in some other function, i can still
              merge the functions :)</div>
            <div><br>
            </div>
            <div>It's how the value is generated and used that matters,
              not the structure of the program.</div>
            <div> </div>
            <div>So you can't just do it on a regular CFG or anything
              like that if you want good results.</div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div>
                <div
style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:rgb(0,0,0)"><span
                    class=""><font color="#888888"><br>
                       -Hal</font></span>
                  <div>
                    <div class="h5"><br>
                      <blockquote style="border-left:2px solid
rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt">
                        <div dir="ltr">
                          <div><br>
                          </div>
                          <div>GVN as it exists now doesn't really
                            provide what you want, and in fact, doesn't
                            value number a lot of instruction types
                            (including, for example, loads, stores,
                            calls, phis, etc).  It also relies on
                            ordering, and more importantly, it relies on
                            *not* being a strong analysis. If you were
                            to stop it from eliminating as it went, it
                            would catch maybe 10% of the redundancies it
                            does now.  So what you want is "i want to
                            know what globally is the same", it doesn't
                            really answer that well.</div>
                          <div><br>
                          </div>
                          <div>Doing the analysis Quentin wants is
                            pretty trivial in NewGVN (analysis and
                            elimination are split, everything is broken
                            into congruence classes explicitly, each
                            congruence class has an id, you could sub in
                            that id as the value for the terminated
                            string), but i'd agree that GVN as it exists
                            now will not do what they want, and would be
                            pretty hard to make work well.<br>
                          </div>
                          <div><br>
                          </div>
                          <div>(FWIW: Davide Italiano has started
                            breaking up newgvn into submittable pieces,
                            and is pretty far along to having a first
                            patch set I believe)</div>
                          <div><br>
                          </div>
                        </div>
                        <div class="gmail_extra"><br>
                          <div class="gmail_quote">On Fri, Aug 26, 2016
                            at 4:54 PM, Quentin Colombet via llvm-dev <span
                              dir="ltr"><<a moz-do-not-send="true"
                                href="mailto:llvm-dev@lists.llvm.org"
                                target="_blank">llvm-dev@lists.llvm.org</a>></span>
                            wrote:<br>
                            <blockquote class="gmail_quote"
                              style="margin:0pt 0pt 0pt
                              0.8ex;border-left:1px solid
                              rgb(204,204,204);padding-left:1ex">
                              <div style="word-wrap:break-word">Hi,
                                <div><br>
                                </div>
                                <div>I let Jessica give more details but
                                  here are some insights.</div>
                                <div><br>
                                </div>
                                <div>MIR offers a straight forward way
                                  to model benefits, because we know
                                  which instructions we remove and which
                                  one we add and there are no overhead
                                  of setting up parameters. Indeed,
                                  since the coloring will be the same
                                  between the different outlining
                                  candidates, the call is just a jump
                                  somewhere else. We do not have to
                                  worry about the impact of parameter
                                  passing and ABI.</div>
                                <div><br>
                                </div>
                                <div>So basically, better cost model.
                                  That's one part of the story.</div>
                                <div><br>
                                </div>
                                <div>The other part is at the LLVM IR
                                  level or before register allocation
                                  identifying similar code sequence is
                                  much harder, at least with a suffix
                                  tree like algorithm. Basically the
                                  problem is how do we name our
                                  instructions such that we can match
                                  them.</div>
                                <div>Let me take an example.</div>
                                <div>foo() {</div>
                                <div>/* bunch of code */</div>
                                <div>a = add b, c;</div>
                                <div>d = add e, f; </div>
                                <div>}</div>
                                <div><br>
                                </div>
                                <div>bar() {</div>
                                <div>d = add e, g;</div>
                                <div>f = add c, w;</div>
                                <div>}</div>
                                <div><br>
                                </div>
                                <div>With proper renaming, we can
                                  outline both adds in one function. The
                                  difficulty is to recognize that they
                                  are semantically equivalent to give
                                  them the same identifier in the suffix
                                  tree. I won’t get into the details but
                                  it gets tricky quickly. We were
                                  thinking of reusing GVN to have such
                                  identifier if we wanted to do the
                                  outlining at IR level but solving this
                                  problem is hard.</div>
                                <div><br>
                                </div>
                                <div>By running after regalloc, we
                                  basically have a heuristic that does
                                  this naming for us.</div>
                                <div><br>
                                </div>
                                <div>Cheers,</div>
                                <div>-Quentin </div>
                                <div>
                                  <div>
                                    <div><br>
                                    </div>
                                    <div><br>
                                    </div>
                                    <div>
                                      <div>
                                        <blockquote>
                                          <div>On Aug 26, 2016, at 3:01
                                            PM, Hal Finkel via llvm-dev
                                            <<a
                                              moz-do-not-send="true"
                                              href="mailto:llvm-dev@lists.llvm.org"
                                              target="_blank">llvm-dev@lists.llvm.org</a>>
                                            wrote:</div>
                                          <br>
                                          <div>
                                            <div
style="font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-family:arial,helvetica,sans-serif;font-size:10pt"><br>
                                              <hr>
                                              <blockquote
                                                style="border-left:2px
                                                solid
rgb(16,16,255);margin-left:5px;padding-left:5px;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From:<span> </span></b>"Kevin
                                                Choi" <<a
                                                  moz-do-not-send="true"
href="mailto:code.kchoi@gmail.com" target="_blank">code.kchoi@gmail.com</a>><br>
                                                <b>To:<span> </span></b>"Hal
                                                Finkel" <<a
                                                  moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br>
                                                <b>Cc:<span> </span></b>"llvm-dev"
                                                <<a
                                                  moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>><br>
                                                <b>Sent:<span> </span></b>Friday,
                                                August 26, 2016 4:55:29
                                                PM<br>
                                                <b>Subject:<span> </span></b>Re:
                                                [llvm-dev] [RFC]
                                                Interprocedural
                                                MIR-level outlining pass<br>
                                                <br>
                                                <div dir="ltr">
                                                  <div>I think the
                                                    "Motivation" section
                                                    explained that.</div>
                                                </div>
                                              </blockquote>
                                              I don't think it explained
                                              it.<br>
                                              <blockquote
                                                style="border-left:2px
                                                solid
rgb(16,16,255);margin-left:5px;padding-left:5px;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt">
                                                <div dir="ltr">
                                                  <div>I too first
                                                    thought about "why
                                                    not at IR?" but the
                                                    reason looks like
                                                    MIR, post-RA has the
                                                    most accurate
                                                    heuristics (best way
                                                    to know looks like
                                                    actually getting
                                                    there).</div>
                                                </div>
                                              </blockquote>
                                              But also, potentially, the
                                              fewest opportunities.
                                              That's why I'm curious
                                              about the motivation - the
                                              trade offs are not obvious
                                              to me.<br>
                                              <br>
                                               -Hal<br>
                                              <blockquote
                                                style="border-left:2px
                                                solid
rgb(16,16,255);margin-left:5px;padding-left:5px;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt">
                                                <div dir="ltr">
                                                  <div><br>
                                                    <br>
                                                  </div>
                                                  <div>Do you know if
                                                    there is any
                                                    experimental pass
                                                    that relies on
                                                    deriving heuristics
                                                    by a feedback loop
                                                    after letting, ie. a
                                                    duplicate
                                                    module/function/block
                                                    continue past?<br>
                                                    <br>
                                                  </div>
                                                  <div>Regards,<br>
                                                  </div>
                                                  <div>Kevin<br>
                                                  </div>
                                                </div>
                                                <div class="gmail_extra"><br>
                                                  <div
                                                    class="gmail_quote">On
                                                    26 August 2016 at
                                                    14:36, Hal Finkel
                                                    via llvm-dev<span> </span><span
                                                      dir="ltr"><<a
                                                        moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.<wbr>org</a>></span><span> </span>wrote:<br>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0pt
                                                      0pt 0pt
                                                      0.8ex;border-left:1px
                                                      solid
                                                      rgb(204,204,204);padding-left:1ex">Hi
                                                      Jessica,<br>
                                                      <br>
                                                      This is quite
                                                      interesting.<br>
                                                      <br>
                                                      Can you comment on
                                                      why you started by
                                                      doing this at the
                                                      MI level, as
                                                      opposed to the IR
                                                      level. And at the
                                                      MI level, why
                                                      after RA instead
                                                      of before RA?<br>
                                                      <br>
                                                      Perhaps the first
                                                      question is
                                                      another way of
                                                      asking about how
                                                      accurately we can
                                                      model call-site
                                                      code size at the
                                                      IR level?<br>
                                                      <br>
                                                      Thanks in advance,<br>
                                                      Hal<br>
                                                      <div>
                                                        <div><br>
                                                          <hr><br>
                                                          > From:
                                                          "Jessica
                                                          Paquette via
                                                          llvm-dev" <<a
moz-do-not-send="true" href="mailto:llvm-dev@lists.llvm.org"
                                                          target="_blank">llvm-dev@lists.llvm.org</a>><br>
                                                          > To:<span> </span><a
moz-do-not-send="true" href="mailto:llvm-dev@lists.llvm.org"
                                                          target="_blank">llvm-dev@lists.llvm.org</a><br>
                                                          > Sent:
                                                          Friday, August
                                                          26, 2016
                                                          4:26:09 PM<br>
                                                          > Subject:
                                                          [llvm-dev]
                                                          [RFC]
                                                          Interprocedural
                                                          MIR-level
                                                          outlining pass<br>
                                                          ><br>
                                                          > Hi
                                                          everyone,<br>
                                                          ><br>
                                                          > Since I
                                                          haven't said
                                                          anything on
                                                          the mailing
                                                          list before, a
                                                          quick<br>
                                                          >
                                                          introduction.
                                                          I'm an intern
                                                          at Apple, and
                                                          over the
                                                          summer I<br>
                                                          >
                                                          implemented a<br>
                                                          > prototype
                                                          for an
                                                          outlining pass
                                                          for code size
                                                          in LLVM. Now
                                                          I'm<br>
                                                          > seeking
                                                          to<br>
                                                          >
                                                          eventually
                                                          upstream it. I
                                                          have the
                                                          preliminary
                                                          code on GitHub
                                                          right<br>
                                                          > now,<br>
                                                          > but it's
                                                          still very
                                                          prototypical
                                                          (see the code
                                                          section).<br>
                                                          ><br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          >
                                                          Motivation<br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > The goal
                                                          of the
                                                          internship was
                                                          to create an
                                                          interprocedural
                                                          pass that<br>
                                                          > would
                                                          reduce code
                                                          size as much
                                                          as possible,
                                                          perhaps at the
                                                          cost of<br>
                                                          > some<br>
                                                          >
                                                          performance.
                                                          This would be
                                                          useful to,
                                                          say, embedded
                                                          programmers
                                                          who<br>
                                                          > only<br>
                                                          > have a
                                                          few kilobytes
                                                          to work with
                                                          and a
                                                          substantial
                                                          amount of code
                                                          to<br>
                                                          > fit<br>
                                                          > in that
                                                          space.<br>
                                                          ><br>
                                                          ><br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > Approach
                                                          and Initial
                                                          Results<br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > To do
                                                          this, we chose
                                                          to implement
                                                          an outliner.
                                                          Outliners find<br>
                                                          > sequences
                                                          of<br>
                                                          >
                                                          instructions
                                                          which would be
                                                          better off as
                                                          a function
                                                          call, by some<br>
                                                          > measure<br>
                                                          > of
                                                          "better". In
                                                          this case, the
                                                          measure of
                                                          "better" is
                                                          "makes code<br>
                                                          > smaller".<br>
                                                          ><br>
                                                          ><br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > Results<br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > These
                                                          results are
                                                          from a fairly
                                                          recent 64-bit
                                                          Intel
                                                          processor,
                                                          using<br>
                                                          > a<br>
                                                          > version
                                                          of Clang
                                                          equipped with
                                                          the outliner
                                                          prototype
                                                          versus an<br>
                                                          >
                                                          equivalent<br>
                                                          > version
                                                          of Clang
                                                          without the
                                                          outliner.<br>
                                                          ><br>
                                                          > CODE SIZE
                                                          REDUCTION<br>
                                                          > For tests
                                                          >=4 Kb in
                                                          non-outlined
                                                          size, the
                                                          outliner
                                                          currently<br>
                                                          > provides
                                                          an<br>
                                                          > average
                                                          of 12.94% code
                                                          size reduction
                                                          on the LLVM
                                                          test suite in<br>
                                                          >
                                                          comparison<br>
                                                          > to a
                                                          default Clang,
                                                          up to 51.4%
                                                          code size
                                                          reduction. In
                                                          comparison to<br>
                                                          > a<br>
                                                          > Clang
                                                          with -Oz, the
                                                          outliner
                                                          provides an
                                                          average of a
                                                          1.29% code
                                                          size<br>
                                                          >
                                                          reduction, up
                                                          to a 37% code
                                                          size
                                                          reduction. I
                                                          believe that
                                                          the -Oz<br>
                                                          > numbers<br>
                                                          > can be
                                                          further
                                                          improved by
                                                          better tuning
                                                          the outlining
                                                          cost model.<br>
                                                          ><br>
                                                          > EXECUTION
                                                          TIME IMPACT<br>
                                                          > On
                                                          average, the
                                                          outliner
                                                          increases
                                                          execution time
                                                          by 2% on the
                                                          LLVM<br>
                                                          > test<br>
                                                          > suite,
                                                          but has been
                                                          also shown to
                                                          improve
                                                          exection time
                                                          by up to 16%.<br>
                                                          > These
                                                          results were
                                                          from a fairly
                                                          recent Intel
                                                          processor, so
                                                          the<br>
                                                          > results<br>
                                                          > may vary.
                                                          Recent Intel
                                                          processors
                                                          have very low
                                                          latency for
                                                          function<br>
                                                          > calls,
                                                          which may
                                                          impact these
                                                          results.
                                                          Execution time
                                                          improvements<br>
                                                          > are<br>
                                                          > likely
                                                          dependent on
                                                          the latency of
                                                          function
                                                          calls,
                                                          instruction<br>
                                                          > caching<br>
                                                          >
                                                          behaviour, and
                                                          the execution
                                                          frequency of
                                                          the code being
                                                          outlined. In<br>
                                                          >
                                                          partucular,
                                                          using a
                                                          processor with
                                                          heavy function
                                                          call latency
                                                          will<br>
                                                          > likely<br>
                                                          > increase
                                                          execution time
                                                          overhead.<br>
                                                          ><br>
                                                          ><br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          >
                                                          Implementation<br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > The
                                                          outliner, in
                                                          its current
                                                          state, is a
                                                          MachineModulePass.
                                                          It finds<br>
                                                          >
                                                          *identical*
                                                          sequences of
                                                          MIR, after
                                                          register
                                                          allocation,
                                                          and pulls<br>
                                                          > them<br>
                                                          > out into
                                                          their own
                                                          functions.
                                                          Thus, it's
                                                          effectively
                                                          assembly-level.<br>
                                                          >
                                                          Ultimately,
                                                          the algorithm
                                                          used is
                                                          general, so it
                                                          can sit
                                                          anywhere,<br>
                                                          > but MIR<br>
                                                          > was very
                                                          convenient for
                                                          the time
                                                          being.<br>
                                                          ><br>
                                                          > It
                                                          requires two
                                                          data
                                                          structures.<br>
                                                          ><br>
                                                          > 1. A
                                                          generalized
                                                          suffix tree<br>
                                                          > 2. A
                                                          "terminated
                                                          string"<br>
                                                          ><br>
                                                          > 1: The
                                                          generalized
                                                          suffix tree is
                                                          constructed
                                                          using
                                                          Ukkonen's
                                                          linear<br>
                                                          > time<br>
                                                          >
                                                          construction
                                                          algorithm [1].
                                                          They require
                                                          linear space
                                                          and support<br>
                                                          >
                                                          linear-time
                                                          substring
                                                          queries. In
                                                          practice, the
                                                          construction
                                                          time for<br>
                                                          > the<br>
                                                          > suffix
                                                          tree is the
                                                          most time
                                                          consuming
                                                          part, but I
                                                          haven't
                                                          noticed a<br>
                                                          >
                                                          difference in
                                                          compilation
                                                          time on, say,
                                                          12 MB .ll
                                                          files.<br>
                                                          ><br>
                                                          > 2: To
                                                          support the
                                                          suffix tree,
                                                          we need a
                                                          "terminated
                                                          string." This
                                                          is<br>
                                                          > a<br>
                                                          >
                                                          generalized
                                                          string with an
                                                          unique
                                                          terminator
                                                          character
                                                          appended to<br>
                                                          > the<br>
                                                          > end.
                                                          TerminatedStrings
                                                          can be built
                                                          from any type.<br>
                                                          ><br>
                                                          > The
                                                          algorithm is
                                                          then roughly
                                                          as follows.<br>
                                                          ><br>
                                                          > 1. For
                                                          each
                                                          MachineBasicBlock
                                                          in the
                                                          program, build
                                                          a<br>
                                                          >
                                                          TerminatedString
                                                          for<br>
                                                          > that
                                                          block.<br>
                                                          > 2. Build
                                                          a suffix tree
                                                          for that
                                                          collection of
                                                          strings.<br>
                                                          > 3. Query
                                                          the suffix
                                                          tree for the
                                                          longest
                                                          repeated
                                                          substring and
                                                          place<br>
                                                          > that<br>
                                                          > string in
                                                          a candidate
                                                          list. Repeat
                                                          until none are
                                                          found.<br>
                                                          > 4. Create
                                                          functions for
                                                          each
                                                          candidate.<br>
                                                          > 5.
                                                          Replace each
                                                          candidate with
                                                          a call to its
                                                          respective
                                                          function.<br>
                                                          ><br>
                                                          >
                                                          Currently, the
                                                          program itself
                                                          isn't stored
                                                          in the suffix
                                                          tree, but<br>
                                                          > rather<br>
                                                          > a "proxy
                                                          string" of
                                                          integers. This
                                                          isn't
                                                          necessary at
                                                          the MIR level,<br>
                                                          > but<br>
                                                          > may be
                                                          for an IR
                                                          level
                                                          extension of
                                                          the algorithm.<br>
                                                          ><br>
                                                          ><br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          >
                                                          Challenges<br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > 1) MEMORY
                                                          CONSUMPTION<br>
                                                          > Given a
                                                          string of
                                                          length n, a
                                                          naive suffix
                                                          tree
                                                          implementation
                                                          can<br>
                                                          > take up<br>
                                                          > to 40n
                                                          bytes of
                                                          memory.
                                                          However, this
                                                          number can be
                                                          reduced to 20n<br>
                                                          > with a<br>
                                                          > bit of
                                                          work [2].
                                                          Currently, the
                                                          suffix tree
                                                          stores the
                                                          entire<br>
                                                          > program,<br>
                                                          > including
                                                          instructions
                                                          which really
                                                          ought not to
                                                          be outlined,
                                                          such as<br>
                                                          > returns.
                                                          These
                                                          instructions
                                                          should not be
                                                          included in
                                                          the final<br>
                                                          >
                                                          implementation,
                                                          but should
                                                          rather act as
                                                          terminators
                                                          for the
                                                          strings.<br>
                                                          > This<br>
                                                          > will
                                                          likely curb
                                                          memory
                                                          consumption.
                                                          Suffix trees
                                                          have been used
                                                          in<br>
                                                          > the<br>
                                                          > past in
                                                          sliding-window-based
                                                          compression
                                                          schemes, which
                                                          may serve as<br>
                                                          > a<br>
                                                          > source of
                                                          inspiration
                                                          for reducing
                                                          memory
                                                          overhead.[3]<br>
                                                          ><br>
                                                          >
                                                          Nonetheless,
                                                          the outliner
                                                          probably
                                                          shouldn't be
                                                          run unless it
                                                          really<br>
                                                          > has<br>
                                                          > to be
                                                          run. It will
                                                          likely be used
                                                          mostly in
                                                          embedded
                                                          spaces, where<br>
                                                          > the<br>
                                                          > programs
                                                          have to fit
                                                          into small
                                                          devices
                                                          anyway. Thus,
                                                          memory
                                                          overhead<br>
                                                          > for<br>
                                                          > the
                                                          compiler
                                                          likely won't
                                                          be a problem.
                                                          The outliner
                                                          should only be<br>
                                                          > used<br>
                                                          > in -Oz
                                                          compilations,
                                                          and possibly
                                                          should have
                                                          its own flag.<br>
                                                          ><br>
                                                          ><br>
                                                          > 2)
                                                          EXECUTION TIME<br>
                                                          >
                                                          Currently, the
                                                          outliner isn't
                                                          tuned for
                                                          preventing
                                                          execution time<br>
                                                          >
                                                          increases.
                                                          There is an
                                                          average of a
                                                          2% execution
                                                          time hit on
                                                          the<br>
                                                          > tests in<br>
                                                          > the LLVM
                                                          test suite,
                                                          with a few
                                                          outliers of up
                                                          to 30%. The
                                                          outliers<br>
                                                          > are<br>
                                                          > tests
                                                          which contain
                                                          hot loops. The
                                                          outliner
                                                          really ought
                                                          to be able<br>
                                                          > to use<br>
                                                          > profiling
                                                          information
                                                          and not
                                                          outline from
                                                          hot areas.
                                                          Another<br>
                                                          >
                                                          suggestion<br>
                                                          > people
                                                          have given me
                                                          is to add a
                                                          "never
                                                          outline"
                                                          directive
                                                          which<br>
                                                          > would<br>
                                                          > allow the
                                                          user to say
                                                          something
                                                          along the
                                                          lines of "this
                                                          is a hot<br>
                                                          > loop,<br>
                                                          > please
                                                          never outline
                                                          from it".<br>
                                                          ><br>
                                                          > It's also
                                                          important to
                                                          note that
                                                          these numbers
                                                          are coming
                                                          from a<br>
                                                          > fairly<br>
                                                          > recent
                                                          Intel
                                                          processor.<br>
                                                          ><br>
                                                          ><br>
                                                          > 3)
                                                          CONSTRAINTS ON
                                                          INSTRUCTIONS<br>
                                                          > The
                                                          outliner
                                                          currently
                                                          won't pull
                                                          anything out
                                                          of functions
                                                          which use<br>
                                                          > a<br>
                                                          > red zone.
                                                          It also won't
                                                          pull anything
                                                          out that uses
                                                          the stack,<br>
                                                          >
                                                          instruction<br>
                                                          > pointer,
                                                          uses constant
                                                          pool indices,
                                                          CFI indices,
                                                          jump table
                                                          indices,<br>
                                                          > or<br>
                                                          > frame
                                                          indices. This
                                                          removes many
                                                          opportunities
                                                          for outlining
                                                          which<br>
                                                          > would<br>
                                                          > likely be
                                                          available at a
                                                          higher level
                                                          (such as IR).
                                                          Thus, there's
                                                          a<br>
                                                          > case<br>
                                                          > for
                                                          moving this up
                                                          to a higher
                                                          level.<br>
                                                          ><br>
                                                          ><br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          >
                                                          Additional
                                                          Applications<br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > The
                                                          suffix tree
                                                          itself could
                                                          be used as a
                                                          tool for
                                                          finding<br>
                                                          >
                                                          opportunities<br>
                                                          > to
                                                          refactor code.
                                                          For example,
                                                          it could
                                                          recognize
                                                          places where
                                                          the<br>
                                                          > user<br>
                                                          > likely
                                                          copied and
                                                          pasted some
                                                          code. This
                                                          could be run
                                                          on codebases
                                                          to<br>
                                                          > find<br>
                                                          > areas
                                                          where people
                                                          could manually
                                                          outline things
                                                          at the source
                                                          level.<br>
                                                          ><br>
                                                          > Using the
                                                          terminated
                                                          string class,
                                                          it would also
                                                          be possible to<br>
                                                          > implement<br>
                                                          > other
                                                          string
                                                          algorithms on
                                                          code. This may
                                                          open the door
                                                          to new ways<br>
                                                          > to<br>
                                                          > analyze
                                                          existing
                                                          codebases.<br>
                                                          ><br>
                                                          ><br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > Roadmap<br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > The
                                                          current
                                                          outliner is
                                                          *very*
                                                          prototypical.
                                                          The version I
                                                          would want<br>
                                                          > to<br>
                                                          > upstream
                                                          would be a new
implementation. Here's what I'd like to<br>
                                                          > address<br>
                                                          > and
                                                          accomplish.<br>
                                                          ><br>
                                                          > 1. Ask
                                                          "what does the
                                                          LLVM community
                                                          want from an
                                                          outliner" and
                                                          use<br>
                                                          > that<br>
                                                          > to drive
                                                          development of
                                                          the algorithm.<br>
                                                          > 2.
                                                          Reimplement
                                                          the outliner,
                                                          perhaps using
                                                          a less
                                                          memory-intensve<br>
                                                          > data<br>
                                                          > structure
                                                          like a suffix
                                                          array.<br>
                                                          > 3. Begin
                                                          adding
                                                          features to
                                                          the algorithm,
                                                          for example:<br>
                                                          >     i. 
                                                           Teaching the
                                                          algorithm
                                                          about hot/cold
                                                          blocks of code
                                                          and<br>
                                                          >   
                                                           taking<br>
                                                          > that into
                                                          account.<br>
                                                          >     ii. 
                                                          Simple
                                                          parameter
                                                          passing.<br>
                                                          >     iii.
                                                          Similar
                                                          function
                                                          outlining--
                                                          eg, noticing
                                                          that two
                                                          outlining<br>
                                                          >
                                                          candidates are
                                                          similar and
                                                          can be merged
                                                          into one
                                                          function with
                                                          some<br>
                                                          > control
                                                          flow.<br>
                                                          ><br>
                                                          ><br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > Code<br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > Note:
                                                          This code
                                                          requires
                                                          MachineModulePasses<br>
                                                          ><br>
                                                          > * Main
                                                          pass:<br>
                                                          ><span> </span><a
moz-do-not-send="true"
href="https://github.com/ornata/llvm/blob/master/lib/CodeGen/MachineOutliner.h"
rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/lib/CodeGen/<wbr>MachineOutliner.h</a><br>
                                                          ><br>
                                                          > * Suffix
                                                          tree:<br>
                                                          ><span> </span><a
moz-do-not-send="true"
href="https://github.com/ornata/llvm/blob/master/include/llvm/ADT/SuffixTree.h"
rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/include/llvm/<wbr>ADT/SuffixTree.h</a><br>
                                                          ><br>
                                                          > *
                                                          TerminatedString
                                                          and
                                                          TerminatedStringList:<br>
                                                          ><span> </span><a
moz-do-not-send="true"
href="https://github.com/ornata/llvm/blob/master/include/llvm/ADT/TerminatedString.h"
rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/include/llvm/<wbr>ADT/TerminatedString.h</a><br>
                                                          ><br>
                                                          > Here are
                                                          a couple unit
                                                          tests for the
                                                          data
                                                          structures.<br>
                                                          ><br>
                                                          > * Suffix
                                                          tree unit
                                                          tests:<br>
                                                          ><span> </span><a
moz-do-not-send="true"
href="https://github.com/ornata/llvm/blob/master/unittests/ADT/SuffixTreeTest.cpp"
rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/unittests/<wbr>ADT/SuffixTreeTest.cpp</a><br>
                                                          ><br>
                                                          > *
                                                          TerminatedString
                                                          unit tests:<br>
                                                          ><span> </span><a
moz-do-not-send="true"
href="https://github.com/ornata/llvm/blob/master/unittests/ADT/TerminatedStringTest.cpp"
rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/unittests/<wbr>ADT/TerminatedStringTest.cpp</a><br>
                                                          ><br>
                                                          > *
                                                          TerminatedStringList
                                                          unit tests:<br>
                                                          ><span> </span><a
moz-do-not-send="true"
href="https://github.com/ornata/llvm/blob/master/unittests/ADT/TerminatedStringListTest.cpp"
rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/unittests/<wbr>ADT/TerminatedStringListTest.<wbr>cpp</a><br>
                                                          ><br>
                                                          ><br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          >
                                                          References<br>
                                                          >
                                                          ==============================<wbr>==<br>
                                                          > [1]
                                                          Ukkonen's
                                                          Algorithm:<br>
                                                          ><span> </span><a
moz-do-not-send="true"
                                                          href="https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf"
rel="noreferrer" target="_blank">https://www.cs.helsinki.fi/<wbr>u/ukkonen/SuffixT1withFigs.pdf</a><br>
                                                          > [2]
                                                          Suffix Trees
                                                          and Suffix
                                                          Arrays:<br>
                                                          ><span> </span><a
moz-do-not-send="true"
                                                          href="http://web.cs.iastate.edu/%7Ecs548/suffix.pdf"
rel="noreferrer" target="_blank">http://web.cs.iastate.edu/~<wbr>cs548/suffix.pdf</a><br>
                                                          > [3]
                                                          Extended
                                                          Application of
                                                          Suffix Trees
                                                          to Data
                                                          Compression:<br>
                                                          ><span> </span><a
moz-do-not-send="true" href="http://www.larsson.dogma.net/dccpaper.pdf"
rel="noreferrer" target="_blank">http://www.larsson.dogma.<wbr>net/dccpaper.pdf</a><br>
                                                          ><br>
                                                          ><br>
                                                          > Thanks
                                                          for reading,<br>
                                                          > Jessica<br>
                                                          ><br>
                                                          >
                                                          ______________________________<wbr>_________________<br>
                                                          > LLVM
                                                          Developers
                                                          mailing list<br>
                                                          ><span> </span><a
moz-do-not-send="true" href="mailto:llvm-dev@lists.llvm.org"
                                                          target="_blank">llvm-dev@lists.llvm.org</a><br>
                                                          ><span> </span><a
moz-do-not-send="true"
                                                          href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-<wbr>bin/mailman/listinfo/llvm-dev</a><br>
                                                          ><br>
                                                          <br>
                                                        </div>
                                                      </div>
                                                      <span><font
                                                          color="#888888">--<br>
                                                          Hal Finkel<br>
                                                          Assistant
                                                          Computational
                                                          Scientist<br>
                                                          Leadership
                                                          Computing
                                                          Facility<br>
                                                          Argonne
                                                          National
                                                          Laboratory<br>
                                                        </font></span>
                                                      <div>
                                                        <div>______________________________<wbr>_________________<br>
                                                          LLVM
                                                          Developers
                                                          mailing list<br>
                                                          <a
                                                          moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
                                                          <a
                                                          moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
                                                          rel="noreferrer"
target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
                                                        </div>
                                                      </div>
                                                    </blockquote>
                                                  </div>
                                                  <br>
                                                </div>
                                              </blockquote>
                                              <br>
                                              <br>
                                              <br>
                                              --<span> </span><br>
                                              <div><span></span>Hal
                                                Finkel<br>
                                                Assistant Computational
                                                Scientist<br>
                                                Leadership Computing
                                                Facility<br>
                                                Argonne National
                                                Laboratory<span></span><br>
                                              </div>
                                            </div>
                                            <span
style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">______________________________<wbr>_________________</span><br
style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
                                            <span
style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">LLVM
                                              Developers mailing list</span><br
style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
                                            <a moz-do-not-send="true"
                                              href="mailto:llvm-dev@lists.llvm.org"
style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"
                                              target="_blank">llvm-dev@lists.llvm.org</a><br
style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
                                            <a moz-do-not-send="true"
                                              href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"
                                              target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a></div>
                                        </blockquote>
                                      </div>
                                      <br>
                                    </div>
                                  </div>
                                </div>
                              </div>
                              <br>
                              ______________________________<wbr>_________________<br>
                              LLVM Developers mailing list<br>
                              <a moz-do-not-send="true"
                                href="mailto:llvm-dev@lists.llvm.org"
                                target="_blank">llvm-dev@lists.llvm.org</a><br>
                              <a moz-do-not-send="true"
                                href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
                                rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
                              <br>
                            </blockquote>
                          </div>
                          <br>
                        </div>
                      </blockquote>
                      <br>
                      <br>
                      <br>
                      -- <br>
                      <div><span name="x"></span>Hal Finkel<br>
                        Assistant Computational Scientist<br>
                        Leadership Computing Facility<br>
                        Argonne National Laboratory<span name="x"></span><br>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>