<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 11/29/2016 05:01 PM, Bekket McClane
via llvm-dev wrote:<br>
</div>
<blockquote
cite="mid:8FA1D301-E88D-4041-A796-9EF38E4A8A98@gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">Mehdi Amini <<a moz-do-not-send="true"
href="mailto:mehdi.amini@apple.com" class="">mehdi.amini@apple.com</a>>
於 2016年11月30日 上午5:16 寫道:</div>
<br class="Apple-interchange-newline">
<div class="">
<blockquote type="cite" class="" style="font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; orphans: auto; text-align: start;
text-indent: 0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto; -webkit-text-stroke-width:
0px;">
<div class=""><br class="Apple-interchange-newline">
On Nov 29, 2016, at 1:14 PM, Mehdi Amini <<a
moz-do-not-send="true"
href="mailto:mehdi.amini@apple.com" class="">mehdi.amini@apple.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="word-wrap: break-word;
-webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;"><br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Nov 29, 2016, at 4:02 AM, Bekket
McClane via llvm-dev <<a
moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">Hi,
<div class="">Though there exists lots of
researches on parallelizing or scheduling
optimization passes, If you open up the time
matrices of codegen(llc -time-passes),
you'll find that the most time consuming
task is actually instruction
selection(40~50% of time) instead of
optimization passes(10~0%). That's why we're
trying to parallelize the
(target-independent) instruction selection
process in aid of JIT compilation speed.</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">How much of this 40-50% is spent in
the matcher table? I though most of the overhead
was inherent to SelectionDAG?</div>
<div class="">Also why having such a fine grain
approach instead of trying to perform instruction
selection in parallel across basic blocks or
functions?</div>
<div class=""><br class="">
</div>
<div class="">I suspect you won’t gain much for too
much added complexity with this approach.</div>
</div>
</div>
</div>
</blockquote>
<div style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans:
auto; text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width: 0px;" class=""><br
class="">
</div>
<div style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans:
auto; text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width: 0px;" class="">I forgot to
add: did you try to enable fast-isel instead? In the
context of a JIT this is a quite common approach.</div>
</div>
</blockquote>
<div><br class="">
</div>
<div>Well, as I mentioned at the bottom of my first letter, one
of our goal is to boost the compilation speed while keeping
the quality of generated code as much as possible. And
fast-isel doesn't perform really well on quality of generated
code.</div>
<div>Perhaps users of fast-isel would use multi-tier compilation
model and take fast-isel as baseline compiler I guess.</div>
</div>
</blockquote>
JFYI, if you haven't tested the quality of FastIsel code on your
platform, do. FastIsel frequently generates quite decent code for
some of the architectures we support. I've heard of folks using it
for high tier JITs. (I don't personally do so, but it's on my list
of things to re-evaluate at some point.)<br>
<blockquote
cite="mid:8FA1D301-E88D-4041-A796-9EF38E4A8A98@gmail.com"
type="cite">
<div>
<div><br class="">
</div>
B.R.</div>
<div>McClane<br class="">
<blockquote type="cite" class="">
<div class="">
<div style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans:
auto; text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width: 0px;" class=""><br
class="">
</div>
<div style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans:
auto; text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width: 0px;" class="">— </div>
<div style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans:
auto; text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width: 0px;" class="">Mehdi</div>
<div style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans:
auto; text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width: 0px;" class=""><br
class="">
</div>
<blockquote type="cite" class="" style="font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; orphans: auto; text-align: start;
text-indent: 0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-size-adjust: auto; -webkit-text-stroke-width:
0px;">
<div class="" style="word-wrap: break-word;
-webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;">
<div class="">
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class=""><br class="">
</div>
<div class="">The instruction selector of LLVM
is an interpreter that interpret the
MatcherTable which consists of bytecodes
generated by TableGen. I'm surprised to find
that the structure of MatcherTable and the
interpreter seems to be suitable for
parallelization. So we propose a prototype
that parallelizes the interpreting of
OPC_Scope children that are possibly
time-consuming. Here is some quick overview:</div>
<div class=""><br class="">
</div>
<div class="">We add two new opcodes: OPC_Fork
and OPC_Merge. During DAG optimization
process(utils/TableGen/<wbr class="">DAGISelMatcherOpt.cpp).
OPC_Fork would be added to the front of
scope(OPC_Scope) children which fulfill
following conditions:</div>
<div class="">1. Amount of opcodes within the
child exceed certain threshold(5 in current
prototype).</div>
<div class="">2. The child is reside in a
sequence of continuous scope children which
length also exceed certain threshold(7 in
current prototype).</div>
<div class="">For each valid sequence of scope
children, an extra scope child, where
OPC_Merge is the only opcode, would be
appended to it(the sequence). </div>
<div class=""><br class="">
</div>
<div class="">In the interpreter, when an
OPC_Fork is encountered inside a scope child,
the main thread would dispatch the scope child
as a task to a central thread pool, then jump
to the next child. At the end of a valid
"parallel sequence(of scope children)" an
OPC_Merge must exist and the main thread would
stop there and wait other threads to finish. <br
class="" clear="all">
<div class=""><br class="">
</div>
<div class="">About the synchronization,
read-write lock is mainly used: In each
checking-style opcode(e.g. OPC_CheckSame,
OPC_CheckType, except OPC_CheckComplexPat)
handlers, a read lock is used, otherwise, a
write lock is used. </div>
<div class=""><br class="">
</div>
<div class="">Finally, although the generated
code is correct, total consuming time barely
break even with the original one. Possible
reasons may be:</div>
<div class="">1. The original interpreter is
pretty fast actually. The thread pool
dispatching time for each selection task may
be too long in comparison with the original
approach.</div>
<div class="">2. X86 is the only architecture
which contains OPC_CheckComplexPat that
would modify DAG. This constraint force us
to add write lock on it which would block
other threads at the same time.
Unfortunately, OPC_CheckComplexPat is
probably the most time-consuming opcodes in
X86 and perhaps in other architectures, too.</div>
<div class="">3. Too many threads. We're now
working on another approach that use larger
region, consist of multiple scope children,
for each parallel task for the sake of
reducing thread amount. </div>
<div class="">4. Popular instructions, like
add or sub, contain lots of scope children
so one or several parallel regions exist.
However, most of the common instruction
variants(e.g. add %reg1, %reg2) is on "top"
among scope children which would be
encountered pretty early. So sometimes
threads are fired, but the correct
instruction is actually immediately selected
after that. Thus lots of time is wasted on
joining threads. </div>
<div class=""><br class="">
</div>
<div class="">Here is our working repository
and diff with 3.9 release: <a
moz-do-not-send="true"
href="https://bitbucket.org/mshockwave/hydra-llvm/branches/compare/master%0D3.9-origin#diff"
target="_blank" class="">https://bitbucket.<wbr
class="">org/mshockwave/hydra-llvm/<wbr
class="">branches/compare/master%0D3.9-<wbr
class="">origin#diff </a></div>
<div class="">I don't think the current state
is ready for code reviewing since there is
no significant speedup. But it's very
welcome for folks to discuss about this idea
and also, whether current instruction
selection approach had reached its upper
bound of speed.(I ignore fast-isel by mean
since it sacrifices too much on quality of
generated code. One of our goals is to boost
the compilation speed while keeping the code
quality as much as possible)</div>
<div class=""><br class="">
</div>
<div class="">Feel free to comment directly on
the repo diff above.</div>
<div class=""><br class="">
</div>
<div class="">About the "region approach"
mentioned in the third item of possible
reasons above. It's actually the
"dev-region-parallel" branch, but it still
has some bugs on correctness of generated
code. I would put more detail about it if
the feedback is sound.</div>
<div class=""><br class="">
</div>
<div class="">NOTE: There seems to be some
serious bugs in concurrent and
synchronization library of old gcc/standard
libraries. So it's strongly recommended to
use the latest version of clang to build our
work.</div>
<div class=""><br class="">
</div>
<div class="">B.R</div>
--<span class="Apple-converted-space"> </span><br
class="">
<div
class="m_-289937573853035298gmail-m_-4541007648684868004gmail_signature">
<div dir="ltr" class="">Bekket McClane
<div class="">Department of Computer
Science,</div>
<div class="">National Tsing Hua
University</div>
</div>
</div>
</div>
</div>
_______________________________________________<br
class="">
LLVM Developers mailing list<br class="">
<a moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br
class="">
<a moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br class="">
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
<p><br>
</p>
</body>
</html>