<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    I did a few compile-and-run benchmarks with lli et al recently. I

    didn't see overall performance improvements from parallelization

    either.<br>

    Using a release build on my MacBook Pro, typical runtimes for

    403.gcc from bitcode precompiled with clang -O1 -g0 look like this

    (real, user, sys):<br>

    <br>

    <pre>Configuration                                                    real     user     sys

Static build and run:
  clang -o main -O0 &lt;bitcode&gt; &amp;&amp; ./main &lt;input&gt;                4.652s   4.471s  0.172s

Eager JIT:
  lli -jit-kind=mcjit                                         18.086s  17.975s  0.088s
  lli -jit-kind=orc-eager (local hack)                        15.334s  11.534s  0.264s

Per-function lazy JIT:
  lli -jit-kind=orc-lazy                                      13.939s  13.779s  0.146s
  lli -jit-kind=orc-lazy -compile-threads=8                   15.171s  15.590s  0.245s
  SpeculativeJIT -num-threads=8                               10.292s  17.306s  0.380s

Per-module lazy JIT:
  lli -jit-kind=orc-lazy -per-module-lazy                      4.655s   4.580s  0.069s
  lli -jit-kind=orc-lazy -per-module-lazy -compile-threads=8   4.695s   6.184s  0.173s
</pre>

    <br>
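    As a rough reading aid for the table: CPU time (user + sys) divided by
    wall-clock time (real) approximates the average number of busy cores.
    A minimal sketch of that arithmetic; busyCores is just an illustrative
    helper name, not part of any tool used here:<br>

```cpp
#include <cassert>

// Approximate average number of busy cores during a run:
// CPU time (user + sys) divided by wall-clock time (real).
// All arguments are in seconds. Illustrative helper only.
double busyCores(double real, double user, double sys) {
  assert(real > 0.0 && "wall-clock time must be positive");
  return (user + sys) / real;
}
```

    Fed with the rows above, this gives about 1.72 for SpeculativeJIT
    -num-threads=8, about 1.04 for orc-lazy -compile-threads=8 and about
    1.35 for the per-module variant with 8 threads, so none of the
    multi-threaded runs comes close to keeping 8 cores busy.<br>
    <br>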

    Invocations with the -compile-threads parameter dispatch compilation

    to parallel threads. My guess is that, so far, the synchronization

    overhead eats up any speedup, but I haven't investigated enough to

    back this up with evidence. It would be nice to see the difference for

    -jit-kind=orc-eager, but with my local hack I am currently running

    into an internal error in the JITed code that I don't understand

    yet.<br>

    <br>
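    To get actual evidence for the synchronization guess above, one option
    is to measure how long threads wait for and hold a lock, along the
    lines of the runSessionLocked timers Lang suggests below. A minimal
    standalone sketch that assumes nothing about ORC internals: TimedLock,
    runLocked and the accessors are hypothetical names, not LLVM API.<br>

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical helper, not part of ORC: runs a callable under a mutex and
// accumulates how long callers waited for the lock and how long they held it.
class TimedLock {
public:
  using Clock = std::chrono::steady_clock;

  template <typename Func>
  auto runLocked(Func &&F) -> decltype(F()) {
    Clock::time_point Requested = Clock::now();
    std::lock_guard<std::mutex> Guard(M);
    Clock::time_point Acquired = Clock::now();
    Calls += 1;
    Waited += Acquired - Requested;
    // Destroyed before Guard, so the hold time is recorded while the lock
    // is still held, even if F() throws.
    struct HoldTimer {
      Clock::duration &Total;
      Clock::time_point Start;
      ~HoldTimer() { Total += Clock::now() - Start; }
    } Timer{Held, Acquired};
    return F();
  }

  // Total time callers spent blocked waiting for the lock.
  double waitedSeconds() {
    std::lock_guard<std::mutex> Guard(M);
    return std::chrono::duration<double>(Waited).count();
  }

  // Total time spent inside the critical section.
  double heldSeconds() {
    std::lock_guard<std::mutex> Guard(M);
    return std::chrono::duration<double>(Held).count();
  }

  // Mean time per call spent inside the critical section.
  double meanHeldSeconds() {
    std::lock_guard<std::mutex> Guard(M);
    assert(Calls > 0 && "no calls recorded yet");
    return std::chrono::duration<double>(Held).count() / Calls;
  }

  unsigned long calls() {
    std::lock_guard<std::mutex> Guard(M);
    return Calls;
  }

private:
  std::mutex M;
  Clock::duration Waited = Clock::duration::zero();
  Clock::duration Held = Clock::duration::zero();
  unsigned long Calls = 0;
};
```

    With one such wrapper around each suspected lock, comparing
    waitedSeconds() against the total compile time would show whether
    contention is really where the time goes.<br>
    <br>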

    Cheers,<br>

    Stefan<br>

    <br>

    <div class="moz-cite-prefix">On 30/01/2020 03:03, Lang Hames wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CALLttgr7EmLO_CAX=b8PTY7K15yvyDb-HC7wkr0ULKz2EWqMxQ@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">Hi Chris,

        <div><br>

        </div>

        <div>I can think of a couple of things to check up front:</div>

        <div><br>

        </div>

        <div>(1) Are you timing this with a release build or a debug

          build? ORC uses asserts liberally, including in code that is

          run under the session lock, and this may decrease parallelism

          in debug builds.</div>

        <div><br>

        </div>

        <div>(2) Are you using a fixed-size thread pool with an

          appropriate limit? Compiling too many things in parallel can

          negatively impact performance if it leads to memory

          exhaustion.</div>

        <div><br>

        </div>

        <div>(3) Are you loading each module on a different LLVMContext?

          Modules sharing an LLVMContext cannot be compiled

          concurrently, as contexts cannot be shared between threads.</div>

        <div><br>

        </div>

        <div>And some follow-up questions: What platform are you running

          on? Are you using LLJIT or LLLazyJIT? What kind of slow-down

          do you see relative to single-threaded compilation?</div>

        <div><br>

        </div>

        <div>Finally, some thoughts: The performance of concurrent

          compilation has not received any attention at all yet, as I

          have been busy with other feature work. I definitely want to

          get this working though. There are no stats or timings

          collected at the moment, but I can think of a few that I think

          would be useful and relatively easy to implement: (1) Track

          time spent under the session lock by adding timers to

          runSessionLocked, (2) Track time spent waiting on LLVMContexts

          in ThreadSafeContext, (3) Add a runAs&lt;FunctionType&gt;

          utility with timers to time execution of JIT functions.</div>

        <div><br>

        </div>

        <div>What are your thoughts? Are there any other tools you would

          like to see added?</div>

        <div><br>

        </div>

        <div>Cheers,</div>

        <div>Lang.</div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Wed, Jan 29, 2020 at 2:12

          PM chris boese via llvm-dev &lt;<a

            href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

          <div dir="ltr">Hi,

            <div><br>

            </div>

            <div>We are using the new LLJIT class in our compiler. We

              have not been successful using the parallel JIT feature.

              When we tried it previously on multiple modules, our

              compile-time increased significantly. I don't know if we

              are using it incorrectly, or that we miss out on

              optimizations we get when running on a single merged

              module, but it hasn't worked for us yet. We are pretty far

              behind HEAD atm, but will try it again soon. </div>

            <div><br>

            </div>

            <div>In the meantime, we are trying to find ways to gauge

              the compilation time of a module. We pass a single module

              to the LLJIT instance. Is there any information we can

              get during the JIT construction to let us compare against

              other modules we run through JIT? We're trying to find hot

              spots or performance issues in our modules. Timers or

              statistical data would be helpful if they exist during the

              execution of the JIT engine.</div>

            <div><br>

            </div>

            <div>I imagine parallelizing the JIT will be our best bet

              for increasing performance, but we have not been able to

              use that yet.</div>

            <div><br>

            </div>

            <div>Any help/ideas would be appreciated.</div>

            <div><br>

            </div>

            <div>Thanks,</div>

            <div>Chris</div>

          </div>

          _______________________________________________<br>

          LLVM Developers mailing list<br>

          <a href="mailto:llvm-dev@lists.llvm.org" target="_blank"

            moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>

          <a

            href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

            rel="noreferrer" target="_blank" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

        </blockquote>

      </div>

    </blockquote>

    <pre class="moz-signature" cols="72">-- 

<a class="moz-txt-link-freetext" href="https://flowcrypt.com/pub/stefan.graenitz@gmail.com">https://flowcrypt.com/pub/stefan.graenitz@gmail.com</a></pre>

  </body>

</html>