<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
I did a few compile-and-run benchmarks with lli et al. recently and
didn't see overall performance improvements from parallelization
either.<br>
Using a release build on my MacBook Pro, typical runtimes for
403.gcc, from bitcode precompiled with clang -O1 -g0, look like this
(real, user, sys):<br>
<br>
<pre>Static build and run:
  clang -o main -O0 &lt;bitcode&gt; &amp;&amp; ./main &lt;input&gt;                4.652s   4.471s  0.172s

Eager JIT:
  lli -jit-kind=mcjit                                         18.086s  17.975s  0.088s
  lli -jit-kind=orc-eager (local hack)                        15.334s  11.534s  0.264s

Per-function lazy JIT:
  lli -jit-kind=orc-lazy                                      13.939s  13.779s  0.146s
  lli -jit-kind=orc-lazy -compile-threads=8                   15.171s  15.590s  0.245s
  SpeculativeJIT -num-threads=8                               10.292s  17.306s  0.380s

Per-module lazy JIT:
  lli -jit-kind=orc-lazy -per-module-lazy                      4.655s   4.580s  0.069s
  lli -jit-kind=orc-lazy -per-module-lazy -compile-threads=8   4.695s   6.184s  0.173s</pre>
<br>
Invocations with the -compile-threads parameter dispatch compilation
to parallel threads. My guess is that, so far, synchronization
overhead eats up all of the speedup, but I haven't investigated
enough to back this up with evidence. It would be nice to see the
difference for -jit-kind=orc-eager as well, but with my local hack I
am currently running into an internal error in the JITed code that I
don't understand yet.<br>
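<br>
For anyone driving this through the C++ API instead of lli: the
equivalent knob should be LLJITBuilder's setNumCompileThreads(). A
minimal sketch (the thread count is arbitrary, and error handling at
the call site is elided):<br>
<pre>#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/Error.h"

using namespace llvm;
using namespace llvm::orc;

// Roughly what lli -compile-threads=8 does: materialization work is
// dispatched to a pool of 8 threads instead of running inline.
Expected&lt;std::unique_ptr&lt;LLJIT&gt;&gt; makeParallelJIT() {
  return LLJITBuilder()
      .setNumCompileThreads(8)
      .create();
}</pre>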
<br>
Cheers,<br>
Stefan<br>
<br>
<div class="moz-cite-prefix">On 30/01/2020 03:03, Lang Hames wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CALLttgr7EmLO_CAX=b8PTY7K15yvyDb-HC7wkr0ULKz2EWqMxQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Hi Chris,
<div><br>
</div>
<div>I can think of a couple of things to check up front:</div>
<div><br>
</div>
<div>(1) Are you timing this with a release build or a debug
build? ORC uses asserts liberally, including in code that is
run under the session lock, and this may decrease parallelism
in debug builds.</div>
<div><br>
</div>
<div>(2) Are you using a fixed-size thread pool with an
appropriate limit? Compiling too many things in parallel can
negatively impact performance if it leads to memory
exhaustion.</div>
<div><br>
</div>
<div>(3) Are you loading each module into a different LLVMContext?
Modules that share an LLVMContext cannot be compiled
concurrently, as contexts cannot be shared between threads.</div>
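<div><br>
</div>
<div>A minimal sketch of (3), assuming each module is parsed from a
bitcode file (error handling elided):</div>
<pre>#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"

using namespace llvm;
using namespace llvm::orc;

// Give each module its own LLVMContext so that modules can be
// compiled concurrently on different threads.
ThreadSafeModule loadModuleOnNewContext(StringRef Path) {
  auto Ctx = std::make_unique&lt;LLVMContext&gt;();
  SMDiagnostic Err;
  std::unique_ptr&lt;Module&gt; M = parseIRFile(Path, Err, *Ctx);
  return ThreadSafeModule(std::move(M), std::move(Ctx));
}</pre>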
<div><br>
</div>
<div>And some follow up questions: What platform are you running
on? Are you using LLJIT or LLLazyJIT? What kind of slow-down
do you see relative to single-threaded compilation?</div>
<div><br>
</div>
<div>Finally, some thoughts: The performance of concurrent
compilation has not received any attention at all yet, as I
have been busy with other feature work. I definitely want to
get this working, though. There are no stats or timings
collected at the moment, but I can think of a few that would
be useful and relatively easy to implement: (1) track time
spent under the session lock by adding timers to
runSessionLocked, (2) track time spent waiting on LLVMContexts
in ThreadSafeContext, and (3) add a runAs&lt;FunctionType&gt;
utility with timers to time execution of JIT functions.</div>
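<div><br>
</div>
<div>For (3), I am picturing something along these lines
(hypothetical, not an existing ORC utility; it assumes the JITed
function returns int):</div>
<pre>#include &lt;chrono&gt;
#include &lt;cstdio&gt;
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/Error.h"

// Hypothetical runAs-style helper: look up a JITed symbol, call it,
// and report its wall-clock execution time. Note that in LLJIT the
// first lookup of a symbol also triggers compilation of its module.
template &lt;typename FnT, typename... ArgTs&gt;
llvm::Expected&lt;int&gt; runTimed(llvm::orc::LLJIT &amp;J,
                             llvm::StringRef Name, ArgTs... Args) {
  auto Sym = J.lookup(Name);
  if (!Sym)
    return Sym.takeError();
  auto *Fn = (FnT *)Sym-&gt;getAddress();
  auto Start = std::chrono::steady_clock::now();
  int Result = Fn(Args...);
  std::chrono::duration&lt;double&gt; Elapsed =
      std::chrono::steady_clock::now() - Start;
  fprintf(stderr, "%s: %.6fs\n", Name.str().c_str(), Elapsed.count());
  return Result;
}</pre>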
<div><br>
</div>
<div>What are your thoughts? Are there any other tools you would
like to see added?</div>
<div><br>
</div>
<div>Cheers,</div>
<div>Lang.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Jan 29, 2020 at 2:12
PM chris boese via llvm-dev <<a
href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Hi,
<div><br>
</div>
<div>We are using the new LLJIT class in our compiler. We
have not been successful using the parallel JIT feature.
When we tried it previously on multiple modules, our
compile time increased significantly. I don't know whether we
are using it incorrectly, or whether we are missing out on
optimizations we get when running on a single merged
module, but it hasn't worked for us yet. We are pretty far
behind HEAD at the moment, but will try it again soon.</div>
<div><br>
</div>
<div>In the meantime, we are trying to find ways to gauge
the compilation time of a module. We pass a single module
to the LLJIT instance. Is there any information we can
get during JIT construction that would let us compare against
other modules we run through the JIT? We're trying to find hot
spots or performance issues in our modules. Timers or
statistical data would be helpful, if they exist, during the
execution of the JIT engine.</div>
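<div><br>
</div>
<div>For context, the kind of measurement we are after is roughly the
following (a rough sketch, under the assumption that with an eager
LLJIT instance the first lookup into a module triggers its
compilation):</div>
<pre>#include &lt;chrono&gt;
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/Error.h"

using namespace llvm;
using namespace llvm::orc;

// Approximate a module's compile time by timing the lookup that
// forces it to be compiled. Only a stopgap until real JIT timers exist.
Expected&lt;double&gt; timeModuleCompile(LLJIT &amp;J, ThreadSafeModule TSM,
                                   StringRef AnySymbolInModule) {
  if (auto Err = J.addIRModule(std::move(TSM)))
    return std::move(Err);
  auto Start = std::chrono::steady_clock::now();
  auto Sym = J.lookup(AnySymbolInModule); // compilation happens here
  auto End = std::chrono::steady_clock::now();
  if (!Sym)
    return Sym.takeError();
  return std::chrono::duration&lt;double&gt;(End - Start).count();
}</pre>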
<div><br>
</div>
<div>I imagine parallelizing the JIT will be our best bet
for increasing performance, but we have not been able to
use that yet.</div>
<div><br>
</div>
<div>Any help/ideas would be appreciated.</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Chris</div>
</div>
</blockquote>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
<a class="moz-txt-link-freetext" href="https://flowcrypt.com/pub/stefan.graenitz@gmail.com">https://flowcrypt.com/pub/stefan.graenitz@gmail.com</a></pre>
</body>
</html>