<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 11/29/2016 05:01 PM, Bekket McClane
      via llvm-dev wrote:<br>
    </div>
    <blockquote
      cite="mid:8FA1D301-E88D-4041-A796-9EF38E4A8A98@gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <br class="">
      <div>
        <blockquote type="cite" class="">
          <div class="">Mehdi Amini <<a moz-do-not-send="true"
              href="mailto:mehdi.amini@apple.com" class="">mehdi.amini@apple.com</a>>
            於 2016年11月30日 上午5:16 寫道:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <blockquote type="cite" class="" style="font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; orphans: auto; text-align: start;
              text-indent: 0px; text-transform: none; white-space:
              normal; widows: auto; word-spacing: 0px;
              -webkit-text-size-adjust: auto; -webkit-text-stroke-width:
              0px;">
              <div class=""><br class="Apple-interchange-newline">
                On Nov 29, 2016, at 1:14 PM, Mehdi Amini <<a
                  moz-do-not-send="true"
                  href="mailto:mehdi.amini@apple.com" class="">mehdi.amini@apple.com</a>>
                wrote:</div>
              <br class="Apple-interchange-newline">
              <div class="">
                <div class="" style="word-wrap: break-word;
                  -webkit-nbsp-mode: space; -webkit-line-break:
                  after-white-space;"><br class="">
                  <div class="">
                    <blockquote type="cite" class="">
                      <div class="">On Nov 29, 2016, at 4:02 AM, Bekket
                        McClane via llvm-dev <<a
                          moz-do-not-send="true"
                          href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>>
                        wrote:</div>
                      <br class="Apple-interchange-newline">
                      <div class="">
                        <div dir="ltr" class="">Hi,
                          <div class="">Though there exists lots of
                            researches on parallelizing or scheduling
                            optimization passes, If you open up the time
                            matrices of codegen(llc -time-passes),
                            you'll find that the most time consuming
                            task is actually instruction
                            selection(40~50% of time) instead of
                            optimization passes(10~0%). That's why we're
                            trying to parallelize the
                            (target-independent) instruction selection
                            process in aid of JIT compilation speed.</div>
                        </div>
                      </div>
                    </blockquote>
                    <div class=""><br class="">
                    </div>
                    <div class=""><br class="">
                    </div>
                    <div class="">How much of this 40-50% is spent in
                      the matcher table? I though most of the overhead
                      was inherent to SelectionDAG?</div>
                    <div class="">Also why having such a fine grain
                      approach instead of trying to perform instruction
                      selection in parallel across basic blocks or
                      functions?</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">I suspect you won’t gain much for too
                      much added complexity with this approach.</div>
                  </div>
                </div>
              </div>
            </blockquote>
            <div style="font-family: Helvetica; font-size: 12px;
              font-style: normal; font-variant-caps: normal;
              font-weight: normal; letter-spacing: normal; orphans:
              auto; text-align: start; text-indent: 0px; text-transform:
              none; white-space: normal; widows: auto; word-spacing:
              0px; -webkit-text-stroke-width: 0px;" class=""><br
                class="">
            </div>
            <div style="font-family: Helvetica; font-size: 12px;
              font-style: normal; font-variant-caps: normal;
              font-weight: normal; letter-spacing: normal; orphans:
              auto; text-align: start; text-indent: 0px; text-transform:
              none; white-space: normal; widows: auto; word-spacing:
              0px; -webkit-text-stroke-width: 0px;" class="">I forgot to
              add: did you try to enable fast-isel instead? In the
              context of a JIT this is a quite common approach.</div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>Well, as I mentioned at the bottom of my first letter, one
          of our goal is to boost the compilation speed while keeping
          the quality of generated code as much as possible. And
          fast-isel doesn't perform really well on quality of generated
          code.</div>
        <div>Perhaps users of fast-isel would use multi-tier compilation
          model and take fast-isel as baseline compiler I guess.</div>
      </div>
    </blockquote>
    JFYI, if you haven't tested the quality of FastIsel code on your
    platform, do.  FastIsel frequently generates quite decent code for
    some of the architectures we support.  I've heard of folks using it
    for high tier JITs.  (I don't personally do so, but it's on my list
    of things to re-evaluate at some point.)<br>
    <blockquote
      cite="mid:8FA1D301-E88D-4041-A796-9EF38E4A8A98@gmail.com"
      type="cite">
      <div>
        <div><br class="">
        </div>
        B.R.</div>
      <div>McClane<br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div style="font-family: Helvetica; font-size: 12px;
              font-style: normal; font-variant-caps: normal;
              font-weight: normal; letter-spacing: normal; orphans:
              auto; text-align: start; text-indent: 0px; text-transform:
              none; white-space: normal; widows: auto; word-spacing:
              0px; -webkit-text-stroke-width: 0px;" class=""><br
                class="">
            </div>
            <div style="font-family: Helvetica; font-size: 12px;
              font-style: normal; font-variant-caps: normal;
              font-weight: normal; letter-spacing: normal; orphans:
              auto; text-align: start; text-indent: 0px; text-transform:
              none; white-space: normal; widows: auto; word-spacing:
              0px; -webkit-text-stroke-width: 0px;" class="">— </div>
            <div style="font-family: Helvetica; font-size: 12px;
              font-style: normal; font-variant-caps: normal;
              font-weight: normal; letter-spacing: normal; orphans:
              auto; text-align: start; text-indent: 0px; text-transform:
              none; white-space: normal; widows: auto; word-spacing:
              0px; -webkit-text-stroke-width: 0px;" class="">Mehdi</div>
            <div style="font-family: Helvetica; font-size: 12px;
              font-style: normal; font-variant-caps: normal;
              font-weight: normal; letter-spacing: normal; orphans:
              auto; text-align: start; text-indent: 0px; text-transform:
              none; white-space: normal; widows: auto; word-spacing:
              0px; -webkit-text-stroke-width: 0px;" class=""><br
                class="">
            </div>
            <blockquote type="cite" class="" style="font-family:
              Helvetica; font-size: 12px; font-style: normal;
              font-variant-caps: normal; font-weight: normal;
              letter-spacing: normal; orphans: auto; text-align: start;
              text-indent: 0px; text-transform: none; white-space:
              normal; widows: auto; word-spacing: 0px;
              -webkit-text-size-adjust: auto; -webkit-text-stroke-width:
              0px;">
              <div class="" style="word-wrap: break-word;
                -webkit-nbsp-mode: space; -webkit-line-break:
                after-white-space;">
                <div class="">
                  <div class=""><br class="">
                  </div>
                  <div class=""><br class="">
                  </div>
                  <div class=""><br class="">
                  </div>
                  <div class=""><br class="">
                  </div>
                  <br class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div dir="ltr" class="">
                        <div class=""><br class="">
                        </div>
                        <div class="">The instruction selector of LLVM
                          is an interpreter that interpret the
                          MatcherTable which consists of bytecodes
                          generated by TableGen. I'm surprised to find
                          that the structure of MatcherTable and the
                          interpreter seems to be suitable for
                          parallelization. So we propose a prototype
                          that parallelizes the interpreting of
                          OPC_Scope children that are possibly
                          time-consuming. Here is some quick overview:</div>
                        <div class=""><br class="">
                        </div>
                        <div class="">We add two new opcodes: OPC_Fork
                          and OPC_Merge. During DAG optimization
                          process(utils/TableGen/<wbr class="">DAGISelMatcherOpt.cpp).
                          OPC_Fork would be added to the front of
                          scope(OPC_Scope) children which fulfill
                          following conditions:</div>
                        <div class="">1.  Amount of opcodes within the
                          child exceed certain threshold(5 in current
                          prototype).</div>
                        <div class="">2. The child is reside in a
                          sequence of continuous scope children which
                          length also exceed certain threshold(7 in
                          current prototype).</div>
                        <div class="">For each valid sequence of scope
                          children, an extra scope child, where
                          OPC_Merge is the only opcode, would be
                          appended to it(the sequence). </div>
                        <div class=""><br class="">
                        </div>
                        <div class="">In the interpreter, when an
                          OPC_Fork is encountered inside a scope child,
                          the main thread would dispatch the scope child
                          as a task to a central thread pool, then jump
                          to the next child. At the end of a valid
                          "parallel sequence(of scope children)" an
                          OPC_Merge must exist and the main thread would
                          stop there and wait other threads to finish. <br
                            class="" clear="all">
                          <div class=""><br class="">
                          </div>
                          <div class="">About the synchronization,
                            read-write lock is mainly used: In each
                            checking-style opcode(e.g. OPC_CheckSame,
                            OPC_CheckType, except OPC_CheckComplexPat)
                            handlers, a read lock is used, otherwise, a
                            write lock is used. </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">Finally, although the generated
                            code is correct, total consuming time barely
                            break even with the original one. Possible
                            reasons may be:</div>
                          <div class="">1. The original interpreter is
                            pretty fast actually. The thread pool
                            dispatching time for each selection task may
                            be too long in comparison with the original
                            approach.</div>
                          <div class="">2. X86 is the only architecture
                            which contains OPC_CheckComplexPat that
                            would modify DAG. This constraint force us
                            to add write lock on it which would block
                            other threads at the same time.
                            Unfortunately, OPC_CheckComplexPat is
                            probably the most time-consuming opcodes in
                            X86 and perhaps in other architectures, too.</div>
                          <div class="">3. Too many threads. We're now
                            working on another approach that use larger
                            region, consist of multiple scope children,
                            for each parallel task for the sake of
                            reducing thread amount. </div>
                          <div class="">4. Popular instructions, like
                            add or sub, contain lots of scope children
                            so one or several parallel regions exist.
                            However, most of the common instruction
                            variants(e.g. add %reg1, %reg2) is on "top"
                            among scope children which would be
                            encountered pretty early. So sometimes
                            threads are fired, but the correct
                            instruction is actually immediately selected
                            after that. Thus lots of time is wasted on
                            joining threads. </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">Here is our working repository
                            and diff with 3.9 release: <a
                              moz-do-not-send="true"
href="https://bitbucket.org/mshockwave/hydra-llvm/branches/compare/master%0D3.9-origin#diff"
                              target="_blank" class="">https://bitbucket.<wbr
                                class="">org/mshockwave/hydra-llvm/<wbr
                                class="">branches/compare/master%0D3.9-<wbr
                                class="">origin#diff </a></div>
                          <div class="">I don't think the current state
                            is ready for code reviewing since there is
                            no significant speedup. But it's very
                            welcome for folks to discuss about this idea
                            and also, whether current instruction
                            selection approach had reached its upper
                            bound of speed.(I ignore fast-isel by mean
                            since it sacrifices too much on quality of
                            generated code. One of our goals is to boost
                            the compilation speed while keeping the code
                            quality as much as possible)</div>
                          <div class=""><br class="">
                          </div>
                          <div class="">Feel free to comment directly on
                            the repo diff above.</div>
                          <div class=""><br class="">
                          </div>
                          <div class="">About the "region approach"
                            mentioned in the third item of possible
                            reasons above. It's actually the
                            "dev-region-parallel" branch, but it still
                            has some bugs on correctness of generated
                            code. I would put more detail about it if
                            the feedback is sound.</div>
                          <div class=""><br class="">
                          </div>
                          <div class="">NOTE: There seems to be some
                            serious bugs in concurrent and
                            synchronization library of old gcc/standard
                            libraries. So it's strongly recommended to
                            use the latest version of clang to build our
                            work.</div>
                          <div class=""><br class="">
                          </div>
                          <div class="">B.R</div>
                          --<span class="Apple-converted-space"> </span><br
                            class="">
                          <div
                            class="m_-289937573853035298gmail-m_-4541007648684868004gmail_signature">
                            <div dir="ltr" class="">Bekket McClane
                              <div class="">Department of Computer
                                Science,</div>
                              <div class="">National Tsing Hua
                                University</div>
                            </div>
                          </div>
                        </div>
                      </div>
                      _______________________________________________<br
                        class="">
                      LLVM Developers mailing list<br class="">
                      <a moz-do-not-send="true"
                        href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br
                        class="">
                      <a moz-do-not-send="true"
                        href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
                        class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></div>
                  </blockquote>
                </div>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
      <br class="">
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>