<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">You can put them all in one review, but
different people may need to approve the different parts.<br>
<br>
On 04/20/2015 01:28 PM, Pete Cooper wrote:<br>
</div>
<blockquote
cite="mid:E5347F8F-78BA-47DD-A17B-DA75EC0F8951@apple.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
Thanks Reed. I can do that. I guess it’ll be 3 reviews, but that
should be fine to start with.
<div class=""><br class="">
</div>
<div class="">Pete<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Apr 20, 2015, at 1:27 PM, Reed Kotler &lt;<a
moz-do-not-send="true" href="mailto:rkotler@mips.com"
class="">rkotler@mips.com</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class=""><br style="font-family: Helvetica; font-size:
12px; font-style: normal; font-variant: normal;
font-weight: normal; letter-spacing: normal;
line-height: 14px; orphans: auto; text-align: start;
text-indent: 0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-stroke-width: 0px;" class="">
<br style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant: normal; font-weight:
normal; letter-spacing: normal; line-height: 14px;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-stroke-width: 0px;"
class="">
<span style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant: normal; font-weight:
normal; letter-spacing: normal; line-height: 14px;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
float: none; display: inline !important;" class="">Consider
putting this in Phabricator. It will make it much easier
for people to comment on it.</span><br
style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant: normal; font-weight:
normal; letter-spacing: normal; line-height: 14px;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-stroke-width: 0px;"
class="">
<br style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant: normal; font-weight:
normal; letter-spacing: normal; line-height: 14px;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-stroke-width: 0px;"
class="">
<span style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant: normal; font-weight:
normal; letter-spacing: normal; line-height: 14px;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-stroke-width: 0px;
float: none; display: inline !important;" class="">On
04/17/2015 09:49 AM, Pete Cooper wrote:</span><br
style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant: normal; font-weight:
normal; letter-spacing: normal; line-height: 14px;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-stroke-width: 0px;"
class="">
<blockquote type="cite" style="font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant:
normal; font-weight: normal; letter-spacing: normal;
line-height: 14px; orphans: auto; text-align: start;
text-indent: 0px; text-transform: none; white-space:
normal; widows: auto; word-spacing: 0px;
-webkit-text-stroke-width: 0px;" class="">Hi all<br
class="">
<br class="">
I’ve been working on improving fast-isel coverage. Our
current fast-isel model involves auto-generating a bunch
of C++ code from tablegen, but then hand-writing a
significant proportion in C++ to get coverage and
performance.<br class="">
<br class="">
I’ve ported the state machine used by selection DAG to
fast-isel. This is able to walk IR in much the same way
that the SD machine walks nodes, and produces MIs. This
is able to handle predicates, transforms, and complex
patterns, all of which are not handled by the current
fast-isel tablegen emitter.<br class="">
<br class="">
There are a few different pieces of this work:<br
class="">
(1) Extend tablegen SDNode to take the IR ValueID of the
thing we are matching<br class="">
(2) Extend tablegen PatFrag to take fast-isel versions
of the predicate code and the transform code<br class="">
(3) Teach the tablegen DAG emitter to use these IR
constructs where available, and when emitting for
fast-isel<br class="">
(4) The state machine itself, which is just a port of
the SD one, but with SDValue replaced by Value* and a bunch of
other changes like handling register class constraining.<br
class="">
(5) Porting the complex patterns, predicates, and
transforms from SD to fast-isel. This is mostly target
specific code in the target’s own FastISel.cpp file and
td files.<br class="">
<br class="">
As my test case, I took a bitcode file which contains llc
itself compiled for AArch64. All the target specific
work I’ve done here is for AArch64. It involves writing
about 600 LOC for the complex patterns, and about 300
LOC to handle predicates/transforms. This is vs the
5100 LOC AArch64FastISel.cpp currently takes.<br
class="">
<br class="">
To measure performance, I tried to see what it would
take to get from SD all the way to the currently
extremely good AArch64 fast-isel implementation, and see
how this new code could help us either get there
quicker, or even improve what we have.<br class="">
<br class="">
The metrics are:<br class="">
- Time to run ISel<br class="">
- Number of machine instrs printed by asm-printer<br
class="">
- BBs selected entirely by fast-isel<br class="">
- Number of instrs fast-isel selected<br class="">
<br class="">
And the runs I considered were:<br class="">
(a) Stock SelectionDAG. This is prior to anyone trying
to write or run fast-isel<br class="">
(b) Basic fast-isel (i.e., calls selectOperator) with
no hand-written fast-isel code<br class="">
(c) Basic fast-isel + hand-written code for return and
branch (this is 300 LOC on AArch64)<br class="">
(d) The new state machine, then the above code (this is
about 900 LOC in addition)<br class="">
(e) Current fast-isel without the new state machine
(this is 5100 LOC in tree currently)<br class="">
(f) Current fast-isel falling back to the state machine
when it fails<br class="">
<br class="">
Time to run ISel:<br class="">
(a) 27.6<br class="">
(b) 25.0<br class="">
(c) 20.0<br class="">
(d) 14.1<br class="">
(e) 7.3<br class="">
(f) 7.7<br class="">
<br class="">
Number of machine instrs printed by asm-printer:<br
class="">
(a) 1912570<br class="">
(b) 4685108<br class="">
(c) 4321009<br class="">
(d) 4598457<br class="">
(e) 4230855<br class="">
(f) 4231056<br class="">
<br class="">
BBs selected entirely by fast-isel:<br class="">
(a) N/A<br class="">
(b) 63794<br class="">
(c) 122225<br class="">
(d) 266476<br class="">
(e) 329909<br class="">
(f) 330010<br class="">
<br class="">
Number of instrs fast-isel selected:<br class="">
(a) N/A<br class="">
(b) 292623<br class="">
(c) 638551<br class="">
(d) 1389476<br class="">
(e) 1471200<br class="">
(f) 1474131<br class="">
<br class="">
Apologies if there’s a better way to present that. I
don’t want to be presumptuous and put a spreadsheet not
everyone can open in an email.<br class="">
<br class="">
The interesting points to take away are that going from
(c) to (d), we move from a backend with basic fast-isel
support to the new one. This results in compile time
in ISel dropping about 30%, an increase in the number of
instructions generated (I’m investigating this), over 2x the
number of BBs entirely handled in fast-isel, and over 2x the
number of instructions selected by fast-isel.<br
class="">
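<br class="">
As a rough check on those ratios, here is a minimal Python sketch that
recomputes them from the figures listed above for runs (c) and (d). The
dictionary layout and the helper names are made up for illustration and
are not part of the patch.<br class="">
<pre class="">
```python
# Figures copied from the lists above for runs (c) and (d).
# The dict layout and helper names are illustrative assumptions.
RUNS = {
    "c": {"isel_time": 20.0, "asm_instrs": 4321009,
          "full_bbs": 122225, "fast_instrs": 638551},
    "d": {"isel_time": 14.1, "asm_instrs": 4598457,
          "full_bbs": 266476, "fast_instrs": 1389476},
}


def percent_change(old, new):
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


def ratio(old, new):
    """How many times larger new is than old."""
    return new / old


c, d = RUNS["c"], RUNS["d"]
print(f"ISel time change:      {percent_change(c['isel_time'], d['isel_time']):.1f}%")
print(f"Asm instrs change:     {percent_change(c['asm_instrs'], d['asm_instrs']):.1f}%")
print(f"Fully selected BBs:    {ratio(c['full_bbs'], d['full_bbs']):.2f}x")
print(f"Fast-isel instrs:      {ratio(c['fast_instrs'], d['fast_instrs']):.2f}x")
```
</pre>
With these inputs it reports roughly a 29.5% drop in ISel time, about
6.4% more emitted instructions, and about 2.18x for both fast-isel
counts.<br class="">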
<br class="">
The number of BBs selected entirely is where almost all
the compile time improvement comes from. Fast-ISel gets
the biggest wins in compile time when we never fall back
to SelectionDAG. Given that this patch improves fully
selected BBs by 2x, it’s not surprising to see about 2x
from compile time as a side-effect.<br class="">
<br class="">
(e) to (f) is also interesting. This is what happens if
AArch64 uses the current path, but then adds the state
machine as a fall-back. We select about 100 more BBs in
fast-isel, and about 3000 more instructions, but compile
time actually regresses a little. I haven’t yet spent
much time tuning the state machine so I think I can
recover this loss. More importantly though, the state
machine is optional and the backend doesn’t have to call
it if it doesn’t want to. So the code owner can make
the call as to whether it’s worth it or not. With a less
tuned implementation than AArch64’s, it’s likely still a
win to use the new code as a fallback.<br class="">
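<br class="">
For reference, a minimal Python sketch of the (e) to (f) comparison,
using only the figures listed above. The names RUN_E, RUN_F, delta and
relative_change_percent are hypothetical and exist only for this
illustration.<br class="">
<pre class="">
```python
# Figures copied from the lists above for runs (e) and (f).
# The dict layout and names are illustrative assumptions.
RUN_E = {"isel_time": 7.3, "asm_instrs": 4230855,
         "full_bbs": 329909, "fast_instrs": 1471200}
RUN_F = {"isel_time": 7.7, "asm_instrs": 4231056,
         "full_bbs": 330010, "fast_instrs": 1474131}


def delta(metric):
    """Absolute change in a metric when the fallback is added."""
    return RUN_F[metric] - RUN_E[metric]


def relative_change_percent(metric):
    """Change in a metric relative to run (e), in percent."""
    return delta(metric) / RUN_E[metric] * 100.0


print("Extra BBs fully selected:", delta("full_bbs"))
print("Extra instrs selected:   ", delta("fast_instrs"))
print(f"ISel time change:         {relative_change_percent('isel_time'):+.1f}%")
```
</pre>
With these inputs it reports 101 more fully selected BBs, 2931 more
instructions selected, and an ISel time increase of roughly 5.5%.<br class="">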
<br class="">
So, from here I’d like to see if I can get this code in
tree. The code is entirely optional. No backends have
to change. If no one calls the code then it’ll be dead
stripped, although we need at least one user at some
point or it’s just dead code.<br class="">
<br class="">
The AArch64 changes are a demonstration and it’s up to
the code owners there if they want this or not. The
changes to TargetSelectionDAG.td and tablegen itself are
necessary for this to work on any other targets. I’m
happy to discuss what the changes are in more detail,
and to land whatever pieces people are happy with
(see the changes to ‘def fma’ for some of the more
controversial tablegen fixes to get this to work).<br
class="">
<br class="">
Comments welcome.<br class="">
<br class="">
Cheers,<br class="">
Pete<br class="">
<br class="">
<br class="">
<br class="">
</blockquote>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
<br>
</body>
</html>