<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 11/29/2016 05:01 PM, Bekket McClane

      via llvm-dev wrote:<br>

    </div>

    <blockquote

      cite="mid:8FA1D301-E88D-4041-A796-9EF38E4A8A98@gmail.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <br class="">

      <div>

        <blockquote type="cite" class="">

          <div class="">Mehdi Amini <<a moz-do-not-send="true"

              href="mailto:mehdi.amini@apple.com" class="">mehdi.amini@apple.com</a>>

            於 2016年11月30日 上午5:16 寫道：</div>

          <br class="Apple-interchange-newline">

          <div class="">

            <blockquote type="cite" class="" style="font-family:

              Helvetica; font-size: 12px; font-style: normal;

              font-variant-caps: normal; font-weight: normal;

              letter-spacing: normal; orphans: auto; text-align: start;

              text-indent: 0px; text-transform: none; white-space:

              normal; widows: auto; word-spacing: 0px;

              -webkit-text-size-adjust: auto; -webkit-text-stroke-width:

              0px;">

              <div class=""><br class="Apple-interchange-newline">

                On Nov 29, 2016, at 1:14 PM, Mehdi Amini <<a

                  moz-do-not-send="true"

                  href="mailto:mehdi.amini@apple.com" class="">mehdi.amini@apple.com</a>>

                wrote:</div>

              <br class="Apple-interchange-newline">

              <div class="">

                <div class="" style="word-wrap: break-word;

                  -webkit-nbsp-mode: space; -webkit-line-break:

                  after-white-space;"><br class="">

                  <div class="">

                    <blockquote type="cite" class="">

                      <div class="">On Nov 29, 2016, at 4:02 AM, Bekket

                        McClane via llvm-dev <<a

                          moz-do-not-send="true"

                          href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>>

                        wrote:</div>

                      <br class="Apple-interchange-newline">

                      <div class="">

                        <div dir="ltr" class="">Hi,

                          <div class="">Though there exists lots of

                            researches on parallelizing or scheduling

                            optimization passes, If you open up the time

                            matrices of codegen(llc -time-passes),

                            you'll find that the most time consuming

                            task is actually instruction

                            selection(40~50% of time) instead of

                            optimization passes(10~0%). That's why we're

                            trying to parallelize the

                            (target-independent) instruction selection

                            process in aid of JIT compilation speed.</div>

                        </div>

                      </div>

                    </blockquote>

                    <div class=""><br class="">

                    </div>

                    <div class=""><br class="">

                    </div>

                    <div class="">How much of this 40-50% is spent in

                      the matcher table? I though most of the overhead

                      was inherent to SelectionDAG?</div>

                    <div class="">Also why having such a fine grain

                      approach instead of trying to perform instruction

                      selection in parallel across basic blocks or

                      functions?</div>

                    <div class=""><br class="">

                    </div>

                    <div class="">I suspect you won’t gain much for too

                      much added complexity with this approach.</div>

                  </div>

                </div>

              </div>

            </blockquote>

            <div style="font-family: Helvetica; font-size: 12px;

              font-style: normal; font-variant-caps: normal;

              font-weight: normal; letter-spacing: normal; orphans:

              auto; text-align: start; text-indent: 0px; text-transform:

              none; white-space: normal; widows: auto; word-spacing:

              0px; -webkit-text-stroke-width: 0px;" class=""><br

                class="">

            </div>

            <div style="font-family: Helvetica; font-size: 12px;

              font-style: normal; font-variant-caps: normal;

              font-weight: normal; letter-spacing: normal; orphans:

              auto; text-align: start; text-indent: 0px; text-transform:

              none; white-space: normal; widows: auto; word-spacing:

              0px; -webkit-text-stroke-width: 0px;" class="">I forgot to

              add: did you try to enable fast-isel instead? In the

              context of a JIT this is a quite common approach.</div>

          </div>

        </blockquote>

        <div><br class="">

        </div>

        <div>Well, as I mentioned at the bottom of my first letter, one

          of our goal is to boost the compilation speed while keeping

          the quality of generated code as much as possible. And

          fast-isel doesn't perform really well on quality of generated

          code.</div>

        <div>Perhaps users of fast-isel would use multi-tier compilation

          model and take fast-isel as baseline compiler I guess.</div>

      </div>

    </blockquote>

    JFYI, if you haven't tested the quality of FastIsel code on your

    platform, do.  FastIsel frequently generates quite decent code for

    some of the architectures we support.  I've heard of folks using it

    for high tier JITs.  (I don't personally do so, but it's on my list

    of things to re-evaluate at some point.)<br>

    <blockquote

      cite="mid:8FA1D301-E88D-4041-A796-9EF38E4A8A98@gmail.com"

      type="cite">

      <div>

        <div><br class="">

        </div>

        B.R.</div>

      <div>McClane<br class="">

        <blockquote type="cite" class="">

          <div class="">

            <div style="font-family: Helvetica; font-size: 12px;

              font-style: normal; font-variant-caps: normal;

              font-weight: normal; letter-spacing: normal; orphans:

              auto; text-align: start; text-indent: 0px; text-transform:

              none; white-space: normal; widows: auto; word-spacing:

              0px; -webkit-text-stroke-width: 0px;" class=""><br

                class="">

            </div>

            <div style="font-family: Helvetica; font-size: 12px;

              font-style: normal; font-variant-caps: normal;

              font-weight: normal; letter-spacing: normal; orphans:

              auto; text-align: start; text-indent: 0px; text-transform:

              none; white-space: normal; widows: auto; word-spacing:

              0px; -webkit-text-stroke-width: 0px;" class="">— </div>

            <div style="font-family: Helvetica; font-size: 12px;

              font-style: normal; font-variant-caps: normal;

              font-weight: normal; letter-spacing: normal; orphans:

              auto; text-align: start; text-indent: 0px; text-transform:

              none; white-space: normal; widows: auto; word-spacing:

              0px; -webkit-text-stroke-width: 0px;" class="">Mehdi</div>

            <div style="font-family: Helvetica; font-size: 12px;

              font-style: normal; font-variant-caps: normal;

              font-weight: normal; letter-spacing: normal; orphans:

              auto; text-align: start; text-indent: 0px; text-transform:

              none; white-space: normal; widows: auto; word-spacing:

              0px; -webkit-text-stroke-width: 0px;" class=""><br

                class="">

            </div>

            <blockquote type="cite" class="" style="font-family:

              Helvetica; font-size: 12px; font-style: normal;

              font-variant-caps: normal; font-weight: normal;

              letter-spacing: normal; orphans: auto; text-align: start;

              text-indent: 0px; text-transform: none; white-space:

              normal; widows: auto; word-spacing: 0px;

              -webkit-text-size-adjust: auto; -webkit-text-stroke-width:

              0px;">

              <div class="" style="word-wrap: break-word;

                -webkit-nbsp-mode: space; -webkit-line-break:

                after-white-space;">

                <div class="">

                  <div class=""><br class="">

                  </div>

                  <div class=""><br class="">

                  </div>

                  <div class=""><br class="">

                  </div>

                  <div class=""><br class="">

                  </div>

                  <br class="">

                  <blockquote type="cite" class="">

                    <div class="">

                      <div dir="ltr" class="">

                        <div class=""><br class="">

                        </div>

                        <div class="">The instruction selector of LLVM

                          is an interpreter that interpret the

                          MatcherTable which consists of bytecodes

                          generated by TableGen. I'm surprised to find

                          that the structure of MatcherTable and the

                          interpreter seems to be suitable for

                          parallelization. So we propose a prototype

                          that parallelizes the interpreting of

                          OPC_Scope children that are possibly

                          time-consuming. Here is some quick overview:</div>

                        <div class=""><br class="">

                        </div>

                        <div class="">We add two new opcodes: OPC_Fork

                          and OPC_Merge. During DAG optimization

                          process(utils/TableGen/<wbr class="">DAGISelMatcherOpt.cpp).

                          OPC_Fork would be added to the front of

                          scope(OPC_Scope) children which fulfill

                          following conditions:</div>

                        <div class="">1.  Amount of opcodes within the

                          child exceed certain threshold(5 in current

                          prototype).</div>

                        <div class="">2. The child is reside in a

                          sequence of continuous scope children which

                          length also exceed certain threshold(7 in

                          current prototype).</div>

                        <div class="">For each valid sequence of scope

                          children, an extra scope child, where

                          OPC_Merge is the only opcode, would be

                          appended to it(the sequence). </div>

                        <div class=""><br class="">

                        </div>

                        <div class="">In the interpreter, when an

                          OPC_Fork is encountered inside a scope child,

                          the main thread would dispatch the scope child

                          as a task to a central thread pool, then jump

                          to the next child. At the end of a valid

                          "parallel sequence(of scope children)" an

                          OPC_Merge must exist and the main thread would

                          stop there and wait other threads to finish. <br

                            class="" clear="all">

                          <div class=""><br class="">

                          </div>

                          <div class="">About the synchronization,

                            read-write lock is mainly used: In each

                            checking-style opcode(e.g. OPC_CheckSame,

                            OPC_CheckType, except OPC_CheckComplexPat)

                            handlers, a read lock is used, otherwise, a

                            write lock is used. </div>

                          <div class=""><br class="">

                          </div>

                          <div class="">Finally, although the generated

                            code is correct, total consuming time barely

                            break even with the original one. Possible

                            reasons may be:</div>

                          <div class="">1. The original interpreter is

                            pretty fast actually. The thread pool

                            dispatching time for each selection task may

                            be too long in comparison with the original

                            approach.</div>

                          <div class="">2. X86 is the only architecture

                            which contains OPC_CheckComplexPat that

                            would modify DAG. This constraint force us

                            to add write lock on it which would block

                            other threads at the same time.

                            Unfortunately, OPC_CheckComplexPat is

                            probably the most time-consuming opcodes in

                            X86 and perhaps in other architectures, too.</div>

                          <div class="">3. Too many threads. We're now

                            working on another approach that use larger

                            region, consist of multiple scope children,

                            for each parallel task for the sake of

                            reducing thread amount. </div>

                          <div class="">4. Popular instructions, like

                            add or sub, contain lots of scope children

                            so one or several parallel regions exist.

                            However, most of the common instruction

                            variants(e.g. add %reg1, %reg2) is on "top"

                            among scope children which would be

                            encountered pretty early. So sometimes

                            threads are fired, but the correct

                            instruction is actually immediately selected

                            after that. Thus lots of time is wasted on

                            joining threads. </div>

                          <div class=""><br class="">

                          </div>

                          <div class="">Here is our working repository

                            and diff with 3.9 release: <a

                              moz-do-not-send="true"

href="https://bitbucket.org/mshockwave/hydra-llvm/branches/compare/master%0D3.9-origin#diff"

                              target="_blank" class="">https://bitbucket.<wbr

                                class="">org/mshockwave/hydra-llvm/<wbr

                                class="">branches/compare/master%0D3.9-<wbr

                                class="">origin#diff </a></div>

                          <div class="">I don't think the current state

                            is ready for code reviewing since there is

                            no significant speedup. But it's very

                            welcome for folks to discuss about this idea

                            and also, whether current instruction

                            selection approach had reached its upper

                            bound of speed.(I ignore fast-isel by mean

                            since it sacrifices too much on quality of

                            generated code. One of our goals is to boost

                            the compilation speed while keeping the code

                            quality as much as possible)</div>

                          <div class=""><br class="">

                          </div>

                          <div class="">Feel free to comment directly on

                            the repo diff above.</div>

                          <div class=""><br class="">

                          </div>

                          <div class="">About the "region approach"

                            mentioned in the third item of possible

                            reasons above. It's actually the

                            "dev-region-parallel" branch, but it still

                            has some bugs on correctness of generated

                            code. I would put more detail about it if

                            the feedback is sound.</div>

                          <div class=""><br class="">

                          </div>

                          <div class="">NOTE: There seems to be some

                            serious bugs in concurrent and

                            synchronization library of old gcc/standard

                            libraries. So it's strongly recommended to

                            use the latest version of clang to build our

                            work.</div>

                          <div class=""><br class="">

                          </div>

                          <div class="">B.R</div>

                          --<span class="Apple-converted-space"> </span><br

                            class="">

                          <div

                            class="m_-289937573853035298gmail-m_-4541007648684868004gmail_signature">

                            <div dir="ltr" class="">Bekket McClane

                              <div class="">Department of Computer

                                Science,</div>

                              <div class="">National Tsing Hua

                                University</div>

                            </div>

                          </div>

                        </div>

                      </div>

                      _______________________________________________<br

                        class="">

                      LLVM Developers mailing list<br class="">

                      <a moz-do-not-send="true"

                        href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br

                        class="">

                      <a moz-do-not-send="true"

                        href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                        class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></div>

                  </blockquote>

                </div>

              </div>

            </blockquote>

          </div>

        </blockquote>

      </div>

      <br class="">

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>

<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>

</pre>

    </blockquote>

    <p><br>

    </p>

  </body>

</html>