<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 08/26/2016 11:28 PM, Daniel Berlin

      via llvm-dev wrote:<br>

    </div>

    <blockquote

cite="mid:CAF4BwTU9F56WUU2fQCUJns6aa6kM56ZoUn2rHs4GaoA237i_EA@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Fri, Aug 26, 2016 at 9:44 PM, Hal

            Finkel <span dir="ltr"><<a moz-do-not-send="true"

                href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div>

                <div

style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:rgb(0,0,0)"><br>

                  <hr>

                  <blockquote style="border-left:2px solid

rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From:

                    </b>"Daniel Berlin" <<a moz-do-not-send="true"

                      href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>><br>

                    <b>To: </b>"Quentin Colombet" <<a

                      moz-do-not-send="true"

                      href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>><br>

                    <b>Cc: </b>"Hal Finkel" <<a

                      moz-do-not-send="true"

                      href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>>,

                    "llvm-dev" <<a moz-do-not-send="true"

                      href="mailto:llvm-dev@lists.llvm.org"

                      target="_blank">llvm-dev@lists.llvm.org</a>><br>

                    <b>Sent: </b>Friday, August 26, 2016 11:06:56 PM<span

                      class=""><br>

                      <b>Subject: </b>Re: [llvm-dev] [RFC]

                      Interprocedural MIR-level outlining pass<br>

                      <br>

                    </span><span class="">

                      <div dir="ltr">FWIW: I'm with quentin. Without a

                        good value numbering analysis, this is a hard

                        problem.<br>

                      </div>

                    </span></blockquote>

                  How exactly does value numbering help here? This

                  problem seems to be about finding structurally-similar

                  parts of the computations of different values, not

                  identical values.</div>

              </div>

            </blockquote>

            <div>Note, first, the optimization we are talking about is

              not outlining, at least, as far as i've ever heard it.</div>

          </div>

        </div>

      </div>

    </blockquote>

    Minor terminology point...<br>

    <br>

    The *mechanism* used here is definitely function outlining.  The

    policy applied to select outline candidates is variant of code

    folding techniques.  Many times when we refer to "an optimization"

    we tend to mix policy and mechanism into the same thing.  I'm

    highlighting the difference only because I found Danny's response

    slightly confusing on first read and through others might as well. 

    <br>

    <br>

    Back to your regularly scheduled technical discussion...<br>

    <blockquote

cite="mid:CAF4BwTU9F56WUU2fQCUJns6aa6kM56ZoUn2rHs4GaoA237i_EA@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>function outlining/etc is about removing cold regions

              of functions to separate functions to make the hot

              portions smaller, more inlinable, etc.</div>

            <div><br>

            </div>

            <div>What is being talked about here is a variant of

               identical code folding, e.g., <a moz-do-not-send="true"

                href="http://research.google.com/pubs/pub36912.html">http://research.google.com/pubs/pub36912.html</a>

              and it's variants.</div>

            <div><br>

            </div>

            <div>Where identical varies (and is based on semantic

              equivalence).  Variants where you extract portions

              partially to merge functions are pretty much the  same

              optimization :)</div>

            <div><br>

            </div>

            <div>Here, value numbering helps because it gives you value

              expressions.</div>

            <div><br>

            </div>

            <div>It tells you:<br>

              <br>

            </div>

            <div>This operation is "add V1, V2".</div>

            <div>It does not matter what V1 and V2 are, they just "are

              abstract things". We know if these abstract things are

              equivalent or not.  </div>

            <div><br>

            </div>

            <div>More relevantly, these expressions form a value number

              dag.</div>

            <div><br>

            </div>

            <div>That is, they represent the computations being computed

              only in terms of operators, abstract variables, and other

              parts of the VNDAG</div>

            <div><br>

            </div>

            <div>The *actual values* of those computations, numbers,

              etc, do not matter (IE the abstract value numbers have

              some set of things *in the program* that are in the set of

              things equivalent to that value number).</div>

            <div><br>

            </div>

            <div>It is "are these semantically the same operations in

              the same order", regardless of the value of those

              operations.</div>

            <div>See, for example, the DAGS generated by:</div>

            <div><br>

            </div>

            <div><a moz-do-not-send="true"

href="https://www.researchgate.net/publication/221323142_An_Efficient_SSA-Based_Algorithm_for_Complete_Global_Value_Numbering">https://www.researchgate.net/publication/221323142_An_Efficient_SSA-Based_Algorithm_for_Complete_Global_Value_Numbering</a><br>

            </div>

            <div><br>

            </div>

            <div>Again, this is not something current GVN does, but

              NewGVN (and other GVN's do) can give you.<br>

            </div>

            <div><br>

            </div>

            <div>The question of what portions of a function you can

              merge are variants of common subgraph and graph

              isomorphism problems on a VNDAG.</div>

            <div><br>

            </div>

            <div>(because in the end, it doesn't matter where the

              computations are, it matters what they abstractly compute,

              and what they depend on)</div>

            <div><br>

            </div>

            <div> <br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div>

                <div

style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:rgb(0,0,0)">

                  It seems much closer to the analysis necessary for SLP

                  vectorization than value numbering, as such.<br>

                </div>

              </div>

            </blockquote>

            <div> <br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div>

                <div

style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:rgb(0,0,0)">I

                  think we might be able to use value numbering to help

                  with subset of SLP vectorization, but only because we

                  can define some kind of equivalence relation on

                  consecutive memory accesses (and similar). What you

                  need for this outlining seems more akin to the general

                  SLP-vectorization case. This might just be the maximal

                  common subgraph problem.</div>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>If i have an add in a loop, and it's VN equivalent to

              an add outside a loop in some other function, i can still

              merge the functions :)</div>

            <div><br>

            </div>

            <div>It's how the value is generated and used that matters,

              not the structure of the program.</div>

            <div> </div>

            <div>So you can't just do it on a regular CFG or anything

              like that if you want good results.</div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div>

                <div

style="font-family:arial,helvetica,sans-serif;font-size:10pt;color:rgb(0,0,0)"><span

                    class=""><font color="#888888"><br>

                       -Hal</font></span>

                  <div>

                    <div class="h5"><br>

                      <blockquote style="border-left:2px solid

rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt">

                        <div dir="ltr">

                          <div><br>

                          </div>

                          <div>GVN as it exists now doesn't really

                            provide what you want, and in fact, doesn't

                            value number a lot of instruction types

                            (including, for example, loads, stores,

                            calls, phis, etc).  It also relies on

                            ordering, and more importantly, it relies on

                            *not* being a strong analysis. If you were

                            to stop it from eliminating as it went, it

                            would catch maybe 10% of the redundancies it

                            does now.  So what you want is "i want to

                            know what globally is the same", it doesn't

                            really answer that well.</div>

                          <div><br>

                          </div>

                          <div>Doing the analysis Quentin wants is

                            pretty trivial in NewGVN (analysis and

                            elimination are split, everything is broken

                            into congruence classes explicitly, each

                            congruence class has an id, you could sub in

                            that id as the value for the terminated

                            string), but i'd agree that GVN as it exists

                            now will not do what they want, and would be

                            pretty hard to make work well.<br>

                          </div>

                          <div><br>

                          </div>

                          <div>(FWIW: Davide Italiano has started

                            breaking up newgvn into submittable pieces,

                            and is pretty far along to having a first

                            patch set I believe)</div>

                          <div><br>

                          </div>

                        </div>

                        <div class="gmail_extra"><br>

                          <div class="gmail_quote">On Fri, Aug 26, 2016

                            at 4:54 PM, Quentin Colombet via llvm-dev <span

                              dir="ltr"><<a moz-do-not-send="true"

                                href="mailto:llvm-dev@lists.llvm.org"

                                target="_blank">llvm-dev@lists.llvm.org</a>></span>

                            wrote:<br>

                            <blockquote class="gmail_quote"

                              style="margin:0pt 0pt 0pt

                              0.8ex;border-left:1px solid

                              rgb(204,204,204);padding-left:1ex">

                              <div style="word-wrap:break-word">Hi,

                                <div><br>

                                </div>

                                <div>I let Jessica give more details but

                                  here are some insights.</div>

                                <div><br>

                                </div>

                                <div>MIR offers a straight forward way

                                  to model benefits, because we know

                                  which instructions we remove and which

                                  one we add and there are no overhead

                                  of setting up parameters. Indeed,

                                  since the coloring will be the same

                                  between the different outlining

                                  candidates, the call is just a jump

                                  somewhere else. We do not have to

                                  worry about the impact of parameter

                                  passing and ABI.</div>

                                <div><br>

                                </div>

                                <div>So basically, better cost model.

                                  That's one part of the story.</div>

                                <div><br>

                                </div>

                                <div>The other part is at the LLVM IR

                                  level or before register allocation

                                  identifying similar code sequence is

                                  much harder, at least with a suffix

                                  tree like algorithm. Basically the

                                  problem is how do we name our

                                  instructions such that we can match

                                  them.</div>

                                <div>Let me take an example.</div>

                                <div>foo() {</div>

                                <div>/* bunch of code */</div>

                                <div>a = add b, c;</div>

                                <div>d = add e, f; </div>

                                <div>}</div>

                                <div><br>

                                </div>

                                <div>bar() {</div>

                                <div>d = add e, g;</div>

                                <div>f = add c, w;</div>

                                <div>}</div>

                                <div><br>

                                </div>

                                <div>With proper renaming, we can

                                  outline both adds in one function. The

                                  difficulty is to recognize that they

                                  are semantically equivalent to give

                                  them the same identifier in the suffix

                                  tree. I won’t get into the details but

                                  it gets tricky quickly. We were

                                  thinking of reusing GVN to have such

                                  identifier if we wanted to do the

                                  outlining at IR level but solving this

                                  problem is hard.</div>

                                <div><br>

                                </div>

                                <div>By running after regalloc, we

                                  basically have a heuristic that does

                                  this naming for us.</div>

                                <div><br>

                                </div>

                                <div>Cheers,</div>

                                <div>-Quentin </div>

                                <div>

                                  <div>

                                    <div><br>

                                    </div>

                                    <div><br>

                                    </div>

                                    <div>

                                      <div>

                                        <blockquote>

                                          <div>On Aug 26, 2016, at 3:01

                                            PM, Hal Finkel via llvm-dev

                                            <<a

                                              moz-do-not-send="true"

                                              href="mailto:llvm-dev@lists.llvm.org"

                                              target="_blank">llvm-dev@lists.llvm.org</a>>

                                            wrote:</div>

                                          <br>

                                          <div>

                                            <div

style="font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-family:arial,helvetica,sans-serif;font-size:10pt"><br>

                                              <hr>

                                              <blockquote

                                                style="border-left:2px

                                                solid

rgb(16,16,255);margin-left:5px;padding-left:5px;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From:<span> </span></b>"Kevin

                                                Choi" <<a

                                                  moz-do-not-send="true"

href="mailto:code.kchoi@gmail.com" target="_blank">code.kchoi@gmail.com</a>><br>

                                                <b>To:<span> </span></b>"Hal

                                                Finkel" <<a

                                                  moz-do-not-send="true"

href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br>

                                                <b>Cc:<span> </span></b>"llvm-dev"

                                                <<a

                                                  moz-do-not-send="true"

href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>><br>

                                                <b>Sent:<span> </span></b>Friday,

                                                August 26, 2016 4:55:29

                                                PM<br>

                                                <b>Subject:<span> </span></b>Re:

                                                [llvm-dev] [RFC]

                                                Interprocedural

                                                MIR-level outlining pass<br>

                                                <br>

                                                <div dir="ltr">

                                                  <div>I think the

                                                    "Motivation" section

                                                    explained that.</div>

                                                </div>

                                              </blockquote>

                                              I don't think it explained

                                              it.<br>

                                              <blockquote

                                                style="border-left:2px

                                                solid

rgb(16,16,255);margin-left:5px;padding-left:5px;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt">

                                                <div dir="ltr">

                                                  <div>I too first

                                                    thought about "why

                                                    not at IR?" but the

                                                    reason looks like

                                                    MIR, post-RA has the

                                                    most accurate

                                                    heuristics (best way

                                                    to know looks like

                                                    actually getting

                                                    there).</div>

                                                </div>

                                              </blockquote>

                                              But also, potentially, the

                                              fewest opportunities.

                                              That's why I'm curious

                                              about the motivation - the

                                              trade offs are not obvious

                                              to me.<br>

                                              <br>

                                               -Hal<br>

                                              <blockquote

                                                style="border-left:2px

                                                solid

rgb(16,16,255);margin-left:5px;padding-left:5px;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt">

                                                <div dir="ltr">

                                                  <div><br>

                                                    <br>

                                                  </div>

                                                  <div>Do you know if

                                                    there is any

                                                    experimental pass

                                                    that relies on

                                                    deriving heuristics

                                                    by a feedback loop

                                                    after letting, ie. a

                                                    duplicate

                                                    module/function/block

                                                    continue past?<br>

                                                    <br>

                                                  </div>

                                                  <div>Regards,<br>

                                                  </div>

                                                  <div>Kevin<br>

                                                  </div>

                                                </div>

                                                <div class="gmail_extra"><br>

                                                  <div

                                                    class="gmail_quote">On

                                                    26 August 2016 at

                                                    14:36, Hal Finkel

                                                    via llvm-dev<span> </span><span

                                                      dir="ltr"><<a

                                                        moz-do-not-send="true"

href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.<wbr>org</a>></span><span> </span>wrote:<br>

                                                    <blockquote

                                                      class="gmail_quote"

                                                      style="margin:0pt

                                                      0pt 0pt

                                                      0.8ex;border-left:1px

                                                      solid

                                                      rgb(204,204,204);padding-left:1ex">Hi

                                                      Jessica,<br>

                                                      <br>

                                                      This is quite

                                                      interesting.<br>

                                                      <br>

                                                      Can you comment on

                                                      why you started by

                                                      doing this at the

                                                      MI level, as

                                                      opposed to the IR

                                                      level. And at the

                                                      MI level, why

                                                      after RA instead

                                                      of before RA?<br>

                                                      <br>

                                                      Perhaps the first

                                                      question is

                                                      another way of

                                                      asking about how

                                                      accurately we can

                                                      model call-site

                                                      code size at the

                                                      IR level?<br>

                                                      <br>

                                                      Thanks in advance,<br>

                                                      Hal<br>

                                                      <div>

                                                        <div><br>

                                                          <hr><br>

                                                          > From:

                                                          "Jessica

                                                          Paquette via

                                                          llvm-dev" <<a

moz-do-not-send="true" href="mailto:llvm-dev@lists.llvm.org"

                                                          target="_blank">llvm-dev@lists.llvm.org</a>><br>

                                                          > To:<span> </span><a

moz-do-not-send="true" href="mailto:llvm-dev@lists.llvm.org"

                                                          target="_blank">llvm-dev@lists.llvm.org</a><br>

                                                          > Sent:

                                                          Friday, August

                                                          26, 2016

                                                          4:26:09 PM<br>

                                                          > Subject:

                                                          [llvm-dev]

                                                          [RFC]

                                                          Interprocedural

                                                          MIR-level

                                                          outlining pass<br>

                                                          ><br>

                                                          > Hi

                                                          everyone,<br>

                                                          ><br>

                                                          > Since I

                                                          haven't said

                                                          anything on

                                                          the mailing

                                                          list before, a

                                                          quick<br>

                                                          >

                                                          introduction.

                                                          I'm an intern

                                                          at Apple, and

                                                          over the

                                                          summer I<br>

                                                          >

                                                          implemented a<br>

                                                          > prototype

                                                          for an

                                                          outlining pass

                                                          for code size

                                                          in LLVM. Now

                                                          I'm<br>

                                                          > seeking

                                                          to<br>

                                                          >

                                                          eventually

                                                          upstream it. I

                                                          have the

                                                          preliminary

                                                          code on GitHub

                                                          right<br>

                                                          > now,<br>

                                                          > but it's

                                                          still very

                                                          prototypical

                                                          (see the code

                                                          section).<br>

                                                          ><br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          >

                                                          Motivation<br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > The goal

                                                          of the

                                                          internship was

                                                          to create an

                                                          interprocedural

                                                          pass that<br>

                                                          > would

                                                          reduce code

                                                          size as much

                                                          as possible,

                                                          perhaps at the

                                                          cost of<br>

                                                          > some<br>

                                                          >

                                                          performance.

                                                          This would be

                                                          useful to,

                                                          say, embedded

                                                          programmers

                                                          who<br>

                                                          > only<br>

                                                          > have a

                                                          few kilobytes

                                                          to work with

                                                          and a

                                                          substantial

                                                          amount of code

                                                          to<br>

                                                          > fit<br>

                                                          > in that

                                                          space.<br>

                                                          ><br>

                                                          ><br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > Approach

                                                          and Initial

                                                          Results<br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > To do

                                                          this, we chose

                                                          to implement

                                                          an outliner.

                                                          Outliners find<br>

                                                          > sequences

                                                          of<br>

                                                          >

                                                          instructions

                                                          which would be

                                                          better off as

                                                          a function

                                                          call, by some<br>

                                                          > measure<br>

                                                          > of

                                                          "better". In

                                                          this case, the

                                                          measure of

                                                          "better" is

                                                          "makes code<br>

                                                          > smaller".<br>

                                                          ><br>

                                                          ><br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > Results<br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > These

                                                          results are

                                                          from a fairly

                                                          recent 64-bit

                                                          Intel

                                                          processor,

                                                          using<br>

                                                          > a<br>

                                                          > version

                                                          of Clang

                                                          equipped with

                                                          the outliner

                                                          prototype

                                                          versus an<br>

                                                          >

                                                          equivalent<br>

                                                          > version

                                                          of Clang

                                                          without the

                                                          outliner.<br>

                                                          ><br>

                                                          > CODE SIZE

                                                          REDUCTION<br>

                                                          > For tests

                                                          >=4 Kb in

                                                          non-outlined

                                                          size, the

                                                          outliner

                                                          currently<br>

                                                          > provides

                                                          an<br>

                                                          > average

                                                          of 12.94% code

                                                          size reduction

                                                          on the LLVM

                                                          test suite in<br>

                                                          >

                                                          comparison<br>

                                                          > to a

                                                          default Clang,

                                                          up to 51.4%

                                                          code size

                                                          reduction. In

                                                          comparison to<br>

                                                          > a<br>

                                                          > Clang

                                                          with -Oz, the

                                                          outliner

                                                          provides an

                                                          average of a

                                                          1.29% code

                                                          size<br>

                                                          >

                                                          reduction, up

                                                          to a 37% code

                                                          size

                                                          reduction. I

                                                          believe that

                                                          the -Oz<br>

                                                          > numbers<br>

                                                          > can be

                                                          further

                                                          improved by

                                                          better tuning

                                                          the outlining

                                                          cost model.<br>

                                                          ><br>

                                                          > EXECUTION

                                                          TIME IMPACT<br>

                                                          > On

                                                          average, the

                                                          outliner

                                                          increases

                                                          execution time

                                                          by 2% on the

                                                          LLVM<br>

                                                          > test<br>

                                                          > suite,

                                                          but has been

                                                          also shown to

                                                          improve

                                                          exection time

                                                          by up to 16%.<br>

                                                          > These

                                                          results were

                                                          from a fairly

                                                          recent Intel

                                                          processor, so

                                                          the<br>

                                                          > results<br>

                                                          > may vary.

                                                          Recent Intel

                                                          processors

                                                          have very low

                                                          latency for

                                                          function<br>

                                                          > calls,

                                                          which may

                                                          impact these

                                                          results.

                                                          Execution time

                                                          improvements<br>

                                                          > are<br>

                                                          > likely

                                                          dependent on

                                                          the latency of

                                                          function

                                                          calls,

                                                          instruction<br>

                                                          > caching<br>

                                                          >

                                                          behaviour, and

                                                          the execution

                                                          frequency of

                                                          the code being

                                                          outlined. In<br>

                                                          >

                                                          partucular,

                                                          using a

                                                          processor with

                                                          heavy function

                                                          call latency

                                                          will<br>

                                                          > likely<br>

                                                          > increase

                                                          execution time

                                                          overhead.<br>

                                                          ><br>

                                                          ><br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          >

                                                          Implementation<br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > The

                                                          outliner, in

                                                          its current

                                                          state, is a

                                                          MachineModulePass.

                                                          It finds<br>

                                                          >

                                                          *identical*

                                                          sequences of

                                                          MIR, after

                                                          register

                                                          allocation,

                                                          and pulls<br>

                                                          > them<br>

                                                          > out into

                                                          their own

                                                          functions.

                                                          Thus, it's

                                                          effectively

                                                          assembly-level.<br>

                                                          >

                                                          Ultimately,

                                                          the algorithm

                                                          used is

                                                          general, so it

                                                          can sit

                                                          anywhere,<br>

                                                          > but MIR<br>

                                                          > was very

                                                          convenient for

                                                          the time

                                                          being.<br>

                                                          ><br>

                                                          > It

                                                          requires two

                                                          data

                                                          structures.<br>

                                                          ><br>

                                                          > 1. A

                                                          generalized

                                                          suffix tree<br>

                                                          > 2. A

                                                          "terminated

                                                          string"<br>

                                                          ><br>

                                                          > 1: The

                                                          generalized

                                                          suffix tree is

                                                          constructed

                                                          using

                                                          Ukkonen's

                                                          linear<br>

                                                          > time<br>

                                                          >

                                                          construction

                                                          algorithm [1].

                                                          They require

                                                          linear space

                                                          and support<br>

                                                          >

                                                          linear-time

                                                          substring

                                                          queries. In

                                                          practice, the

                                                          construction

                                                          time for<br>

                                                          > the<br>

                                                          > suffix

                                                          tree is the

                                                          most time

                                                          consuming

                                                          part, but I

                                                          haven't

                                                          noticed a<br>

                                                          >

                                                          difference in

                                                          compilation

                                                          time on, say,

                                                          12 MB .ll

                                                          files.<br>

                                                          ><br>

                                                          > 2: To

                                                          support the

                                                          suffix tree,

                                                          we need a

                                                          "terminated

                                                          string." This

                                                          is<br>

                                                          > a<br>

                                                          >

                                                          generalized

                                                          string with an

                                                          unique

                                                          terminator

                                                          character

                                                          appended to<br>

                                                          > the<br>

                                                          > end.

                                                          TerminatedStrings

                                                          can be built

                                                          from any type.<br>

                                                          ><br>

                                                          > The

                                                          algorithm is

                                                          then roughly

                                                          as follows.<br>

                                                          ><br>

                                                          > 1. For

                                                          each

                                                          MachineBasicBlock

                                                          in the

                                                          program, build

                                                          a<br>

                                                          >

                                                          TerminatedString

                                                          for<br>

                                                          > that

                                                          block.<br>

                                                          > 2. Build

                                                          a suffix tree

                                                          for that

                                                          collection of

                                                          strings.<br>

                                                          > 3. Query

                                                          the suffix

                                                          tree for the

                                                          longest

                                                          repeated

                                                          substring and

                                                          place<br>

                                                          > that<br>

                                                          > string in

                                                          a candidate

                                                          list. Repeat

                                                          until none are

                                                          found.<br>

                                                          > 4. Create

                                                          functions for

                                                          each

                                                          candidate.<br>

                                                          > 5.

                                                          Replace each

                                                          candidate with

                                                          a call to its

                                                          respective

                                                          function.<br>

                                                          ><br>

                                                          >

                                                          Currently, the

                                                          program itself

                                                          isn't stored

                                                          in the suffix

                                                          tree, but<br>

                                                          > rather<br>

                                                          > a "proxy

                                                          string" of

                                                          integers. This

                                                          isn't

                                                          necessary at

                                                          the MIR level,<br>

                                                          > but<br>

                                                          > may be

                                                          for an IR

                                                          level

                                                          extension of

                                                          the algorithm.<br>

                                                          ><br>

                                                          ><br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          >

                                                          Challenges<br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > 1) MEMORY

                                                          CONSUMPTION<br>

                                                          > Given a

                                                          string of

                                                          length n, a

                                                          naive suffix

                                                          tree

                                                          implementation

                                                          can<br>

                                                          > take up<br>

                                                          > to 40n

                                                          bytes of

                                                          memory.

                                                          However, this

                                                          number can be

                                                          reduced to 20n<br>

                                                          > with a<br>

                                                          > bit of

                                                          work [2].

                                                          Currently, the

                                                          suffix tree

                                                          stores the

                                                          entire<br>

                                                          > program,<br>

                                                          > including

                                                          instructions

                                                          which really

                                                          ought not to

                                                          be outlined,

                                                          such as<br>

                                                          > returns.

                                                          These

                                                          instructions

                                                          should not be

                                                          included in

                                                          the final<br>

                                                          >

                                                          implementation,

                                                          but should

                                                          rather act as

                                                          terminators

                                                          for the

                                                          strings.<br>

                                                          > This<br>

                                                          > will

                                                          likely curb

                                                          memory

                                                          consumption.

                                                          Suffix trees

                                                          have been used

                                                          in<br>

                                                          > the<br>

                                                          > past in

                                                          sliding-window-based

                                                          compression

                                                          schemes, which

                                                          may serve as<br>

                                                          > a<br>

                                                          > source of

                                                          inspiration

                                                          for reducing

                                                          memory

                                                          overhead.[3]<br>

                                                          ><br>

                                                          >

                                                          Nonetheless,

                                                          the outliner

                                                          probably

                                                          shouldn't be

                                                          run unless it

                                                          really<br>

                                                          > has<br>

                                                          > to be

                                                          run. It will

                                                          likely be used

                                                          mostly in

                                                          embedded

                                                          spaces, where<br>

                                                          > the<br>

                                                          > programs

                                                          have to fit

                                                          into small

                                                          devices

                                                          anyway. Thus,

                                                          memory

                                                          overhead<br>

                                                          > for<br>

                                                          > the

                                                          compiler

                                                          likely won't

                                                          be a problem.

                                                          The outliner

                                                          should only be<br>

                                                          > used<br>

                                                          > in -Oz

                                                          compilations,

                                                          and possibly

                                                          should have

                                                          its own flag.<br>

                                                          ><br>

                                                          ><br>

                                                          > 2)

                                                          EXECUTION TIME<br>

                                                          >

                                                          Currently, the

                                                          outliner isn't

                                                          tuned for

                                                          preventing

                                                          execution time<br>

                                                          >

                                                          increases.

                                                          There is an

                                                          average of a

                                                          2% execution

                                                          time hit on

                                                          the<br>

                                                          > tests in<br>

                                                          > the LLVM

                                                          test suite,

                                                          with a few

                                                          outliers of up

                                                          to 30%. The

                                                          outliers<br>

                                                          > are<br>

                                                          > tests

                                                          which contain

                                                          hot loops. The

                                                          outliner

                                                          really ought

                                                          to be able<br>

                                                          > to use<br>

                                                          > profiling

                                                          information

                                                          and not

                                                          outline from

                                                          hot areas.

                                                          Another<br>

                                                          >

                                                          suggestion<br>

                                                          > people

                                                          have given me

                                                          is to add a

                                                          "never

                                                          outline"

                                                          directive

                                                          which<br>

                                                          > would<br>

                                                          > allow the

                                                          user to say

                                                          something

                                                          along the

                                                          lines of "this

                                                          is a hot<br>

                                                          > loop,<br>

                                                          > please

                                                          never outline

                                                          from it".<br>

                                                          ><br>

                                                          > It's also

                                                          important to

                                                          note that

                                                          these numbers

                                                          are coming

                                                          from a<br>

                                                          > fairly<br>

                                                          > recent

                                                          Intel

                                                          processor.<br>

                                                          ><br>

                                                          ><br>

                                                          > 3)

                                                          CONSTRAINTS ON

                                                          INSTRUCTIONS<br>

                                                          > The

                                                          outliner

                                                          currently

                                                          won't pull

                                                          anything out

                                                          of functions

                                                          which use<br>

                                                          > a<br>

                                                          > red zone.

                                                          It also won't

                                                          pull anything

                                                          out that uses

                                                          the stack,<br>

                                                          >

                                                          instruction<br>

                                                          > pointer,

                                                          uses constant

                                                          pool indices,

                                                          CFI indices,

                                                          jump table

                                                          indices,<br>

                                                          > or<br>

                                                          > frame

                                                          indices. This

                                                          removes many

                                                          opportunities

                                                          for outlining

                                                          which<br>

                                                          > would<br>

                                                          > likely be

                                                          available at a

                                                          higher level

                                                          (such as IR).

                                                          Thus, there's

                                                          a<br>

                                                          > case<br>

                                                          > for

                                                          moving this up

                                                          to a higher

                                                          level.<br>

                                                          ><br>

                                                          ><br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          >

                                                          Additional

                                                          Applications<br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > The

                                                          suffix tree

                                                          itself could

                                                          be used as a

                                                          tool for

                                                          finding<br>

                                                          >

                                                          opportunities<br>

                                                          > to

                                                          refactor code.

                                                          For example,

                                                          it could

                                                          recognize

                                                          places where

                                                          the<br>

                                                          > user<br>

                                                          > likely

                                                          copied and

                                                          pasted some

                                                          code. This

                                                          could be run

                                                          on codebases

                                                          to<br>

                                                          > find<br>

                                                          > areas

                                                          where people

                                                          could manually

                                                          outline things

                                                          at the source

                                                          level.<br>

                                                          ><br>

                                                          > Using the

                                                          terminated

                                                          string class,

                                                          it would also

                                                          be possible to<br>

                                                          > implement<br>

                                                          > other

                                                          string

                                                          algorithms on

                                                          code. This may

                                                          open the door

                                                          to new ways<br>

                                                          > to<br>

                                                          > analyze

                                                          existing

                                                          codebases.<br>

                                                          ><br>

                                                          ><br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > Roadmap<br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > The

                                                          current

                                                          outliner is

                                                          *very*

                                                          prototypical.

                                                          The version I

                                                          would want<br>

                                                          > to<br>

                                                          > upstream

                                                          would be a new

implementation. Here's what I'd like to<br>

                                                          > address<br>

                                                          > and

                                                          accomplish.<br>

                                                          ><br>

                                                          > 1. Ask

                                                          "what does the

                                                          LLVM community

                                                          want from an

                                                          outliner" and

                                                          use<br>

                                                          > that<br>

                                                          > to drive

                                                          development of

                                                          the algorithm.<br>

                                                          > 2.

                                                          Reimplement

                                                          the outliner,

                                                          perhaps using

                                                          a less

                                                          memory-intensve<br>

                                                          > data<br>

                                                          > structure

                                                          like a suffix

                                                          array.<br>

                                                          > 3. Begin

                                                          adding

                                                          features to

                                                          the algorithm,

                                                          for example:<br>

                                                          >     i. 

                                                           Teaching the

                                                          algorithm

                                                          about hot/cold

                                                          blocks of code

                                                          and<br>

                                                          >   

                                                           taking<br>

                                                          > that into

                                                          account.<br>

                                                          >     ii. 

                                                          Simple

                                                          parameter

                                                          passing.<br>

                                                          >     iii.

                                                          Similar

                                                          function

                                                          outlining--

                                                          eg, noticing

                                                          that two

                                                          outlining<br>

                                                          >

                                                          candidates are

                                                          similar and

                                                          can be merged

                                                          into one

                                                          function with

                                                          some<br>

                                                          > control

                                                          flow.<br>

                                                          ><br>

                                                          ><br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > Code<br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > Note:

                                                          This code

                                                          requires

                                                          MachineModulePasses<br>

                                                          ><br>

                                                          > * Main

                                                          pass:<br>

                                                          ><span> </span><a

moz-do-not-send="true"

href="https://github.com/ornata/llvm/blob/master/lib/CodeGen/MachineOutliner.h"

rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/lib/CodeGen/<wbr>MachineOutliner.h</a><br>

                                                          ><br>

                                                          > * Suffix

                                                          tree:<br>

                                                          ><span> </span><a

moz-do-not-send="true"

href="https://github.com/ornata/llvm/blob/master/include/llvm/ADT/SuffixTree.h"

rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/include/llvm/<wbr>ADT/SuffixTree.h</a><br>

                                                          ><br>

                                                          > *

                                                          TerminatedString

                                                          and

                                                          TerminatedStringList:<br>

                                                          ><span> </span><a

moz-do-not-send="true"

href="https://github.com/ornata/llvm/blob/master/include/llvm/ADT/TerminatedString.h"

rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/include/llvm/<wbr>ADT/TerminatedString.h</a><br>

                                                          ><br>

                                                          > Here are

                                                          a couple unit

                                                          tests for the

                                                          data

                                                          structures.<br>

                                                          ><br>

                                                          > * Suffix

                                                          tree unit

                                                          tests:<br>

                                                          ><span> </span><a

moz-do-not-send="true"

href="https://github.com/ornata/llvm/blob/master/unittests/ADT/SuffixTreeTest.cpp"

rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/unittests/<wbr>ADT/SuffixTreeTest.cpp</a><br>

                                                          ><br>

                                                          > *

                                                          TerminatedString

                                                          unit tests:<br>

                                                          ><span> </span><a

moz-do-not-send="true"

href="https://github.com/ornata/llvm/blob/master/unittests/ADT/TerminatedStringTest.cpp"

rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/unittests/<wbr>ADT/TerminatedStringTest.cpp</a><br>

                                                          ><br>

                                                          > *

                                                          TerminatedStringList

                                                          unit tests:<br>

                                                          ><span> </span><a

moz-do-not-send="true"

href="https://github.com/ornata/llvm/blob/master/unittests/ADT/TerminatedStringListTest.cpp"

rel="noreferrer" target="_blank">https://github.com/ornata/<wbr>llvm/blob/master/unittests/<wbr>ADT/TerminatedStringListTest.<wbr>cpp</a><br>

                                                          ><br>

                                                          ><br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          >

                                                          References<br>

                                                          >

                                                          ==============================<wbr>==<br>

                                                          > [1]

                                                          Ukkonen's

                                                          Algorithm:<br>

                                                          ><span> </span><a

moz-do-not-send="true"

                                                          href="https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf"

rel="noreferrer" target="_blank">https://www.cs.helsinki.fi/<wbr>u/ukkonen/SuffixT1withFigs.pdf</a><br>

                                                          > [2]

                                                          Suffix Trees

                                                          and Suffix

                                                          Arrays:<br>

                                                          ><span> </span><a

moz-do-not-send="true"

                                                          href="http://web.cs.iastate.edu/%7Ecs548/suffix.pdf"

rel="noreferrer" target="_blank">http://web.cs.iastate.edu/~<wbr>cs548/suffix.pdf</a><br>

                                                          > [3]

                                                          Extended

                                                          Application of

                                                          Suffix Trees

                                                          to Data

                                                          Compression:<br>

                                                          ><span> </span><a

moz-do-not-send="true" href="http://www.larsson.dogma.net/dccpaper.pdf"

rel="noreferrer" target="_blank">http://www.larsson.dogma.<wbr>net/dccpaper.pdf</a><br>

                                                          ><br>

                                                          ><br>

                                                          > Thanks

                                                          for reading,<br>

                                                          > Jessica<br>

                                                          ><br>

                                                          >

                                                          ______________________________<wbr>_________________<br>

                                                          > LLVM

                                                          Developers

                                                          mailing list<br>

                                                          ><span> </span><a

moz-do-not-send="true" href="mailto:llvm-dev@lists.llvm.org"

                                                          target="_blank">llvm-dev@lists.llvm.org</a><br>

                                                          ><span> </span><a

moz-do-not-send="true"

                                                          href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-<wbr>bin/mailman/listinfo/llvm-dev</a><br>

                                                          ><br>

                                                          <br>

                                                        </div>

                                                      </div>

                                                      <span><font

                                                          color="#888888">--<br>

                                                          Hal Finkel<br>

                                                          Assistant

                                                          Computational

                                                          Scientist<br>

                                                          Leadership

                                                          Computing

                                                          Facility<br>

                                                          Argonne

                                                          National

                                                          Laboratory<br>

                                                        </font></span>

                                                      <div>

                                                        <div>______________________________<wbr>_________________<br>

                                                          LLVM

                                                          Developers

                                                          mailing list<br>

                                                          <a

                                                          moz-do-not-send="true"

href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

                                                          <a

                                                          moz-do-not-send="true"

href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                                                          rel="noreferrer"

target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

                                                        </div>

                                                      </div>

                                                    </blockquote>

                                                  </div>

                                                  <br>

                                                </div>

                                              </blockquote>

                                              <br>

                                              <br>

                                              <br>

                                              --<span> </span><br>

                                              <div><span></span>Hal

                                                Finkel<br>

                                                Assistant Computational

                                                Scientist<br>

                                                Leadership Computing

                                                Facility<br>

                                                Argonne National

                                                Laboratory<span></span><br>

                                              </div>

                                            </div>

                                            <span

style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">______________________________<wbr>_________________</span><br

style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">

                                            <span

style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">LLVM

                                              Developers mailing list</span><br

style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">

                                            <a moz-do-not-send="true"

                                              href="mailto:llvm-dev@lists.llvm.org"

style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"

                                              target="_blank">llvm-dev@lists.llvm.org</a><br

style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">

                                            <a moz-do-not-send="true"

                                              href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

style="font-family:Helvetica;font-size:12px;font-style:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"

                                              target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a></div>

                                        </blockquote>

                                      </div>

                                      <br>

                                    </div>

                                  </div>

                                </div>

                              </div>

                              <br>

                              ______________________________<wbr>_________________<br>

                              LLVM Developers mailing list<br>

                              <a moz-do-not-send="true"

                                href="mailto:llvm-dev@lists.llvm.org"

                                target="_blank">llvm-dev@lists.llvm.org</a><br>

                              <a moz-do-not-send="true"

                                href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                                rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

                              <br>

                            </blockquote>

                          </div>

                          <br>

                        </div>

                      </blockquote>

                      <br>

                      <br>

                      <br>

                      -- <br>

                      <div><span name="x"></span>Hal Finkel<br>

                        Assistant Computational Scientist<br>

                        Leadership Computing Facility<br>

                        Argonne National Laboratory<span name="x"></span><br>

                      </div>

                    </div>

                  </div>

                </div>

              </div>

            </blockquote>

          </div>

          <br>

        </div>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>

<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>

</pre>

    </blockquote>

    <p><br>

    </p>

  </body>

</html>