[PATCH] Fast-ISel state machine
rkotler at mips.com
Mon Apr 20 15:31:44 PDT 2015
You can put them all in one review but there may be separate people that
to approve the different parts.
On 04/20/2015 01:28 PM, Pete Cooper wrote:
> Thanks Reed. I can do that. I guess it’ll be 3 reviews, but that
> should be fine to start with.
>> On Apr 20, 2015, at 1:27 PM, Reed Kotler <rkotler at mips.com
>> <mailto:rkotler at mips.com>> wrote:
>> Consider putting this in Phabricator. It will make it much easier for
>> people to comment on it.
>> On 04/17/2015 09:49 AM, Pete Cooper wrote:
>>> Hi all
>>> I’ve been working on improving fast-isel coverage. Our current
>>> fast-isel model involves auto-generating a bunch of C++ code from
>>> tablegen, but then hand writing a significant proportion in C++ to
>>> get coverage and performance.
>>> I’ve ported the state machine used by selection DAG to fast-isel.
>>> This is able to walk IR in much the same way that the SD machine
>>> walks nodes, and produces MIs. This is able to handle predicates,
>>> transforms, and complex patterns, all of which are not handled by
>>> the current fast-isel tablegen emitter.
>>> There are a few different pieces of this work:
>>> (1) Extend tablegen SDNode to take the IR ValueID of the thing we
>>> are matching
>>> (2) Extend tablegen PatFrag to take fast-isel versions of the
>>> predicate code and the transform code
>>> (3) Teach the tablegen DAG emitter to use these IR constructs where
>>> available, and when emitting for fast-isel
>>> (4) The state machine itself, which is just a port of the SD one,
>>> but with SDValue->Value* and a bunch of other changes like handling
>>> register class constraining.
>>> (5) Porting the complex patterns, predicates, and transforms from SD
>>> to fast-isel. This is mostly target specific code in the targets
>>> own FastISel.cpp file, and td files.
>>> As my test case, i took a bitcode which contains llc itself compiled
>>> for AArch64. All the target specific work i’ve done here is for
>>> AArch64. It involves writing about 600 LOC for the complex
>>> patterns, and about 300 LOC to handle predicates/transforms. This
>>> is vs the 5100 LOC AArch64FastISel.cpp currently takes.
>>> To measure performance, i tried to see what it would take to get
>>> from SD, all the way to the currently extremely good AArch64
>>> fast-isel implementation, and see how this new code could help us
>>> either get there quicker, or even improve what we have.
>>> The metrics are:
>>> - Time to run ISel
>>> - Number of machine instrs printed by asm-printer
>>> - BBs selected entirely by fast-isel
>>> - Number of instrs fast-isel selected
>>> And the runs I considered were:
>>> (a) Stock SelectionDAG. This is prior to anyone trying to write or
>>> run fast-isel
>>> (b) Basic fast-isel (i.e., calls selectOperator), and has no
>>> hand-written fast-isel code
>>> (c) Basic fast-isel + hand written code for return and branch (this
>>> is 300 LOC on AArch64)
>>> (d) The new stack machine then the above code (this is about 900 LOC
>>> in addition)
>>> (e) Current fast-isel without the new stack machine (this is 5100
>>> LOC in tree currently)
>>> (f) Current fast-isel falling back to the stack machine when it fails
>>> Time to run ISel:
>>> (a) 27.6
>>> (b) 25.0
>>> (c) 20.0
>>> (d) 14.1
>>> (e) 7.3
>>> (f) 7.7
>>> Number of machine instrs printed by asm-printer:
>>> (a) 1912570
>>> (b) 4685108
>>> (c) 4321009
>>> (d) 4598457
>>> (e) 4230855
>>> (f) 4231056
>>> BBs selected entirely by fast-isel:
>>> (a) N/A
>>> (b) 63794
>>> (c) 122225
>>> (d) 266476
>>> (e) 329909
>>> (f) 330010
>>> Number of instrs fast-isel selected:
>>> (a) N/A
>>> (b) 292623
>>> (c) 638551
>>> (d) 1389476
>>> (e) 1471200
>>> (f) 1474131
>>> Apologies if there’s a better way to present that. I don’t want to
>>> be presumptuous and put a spreadsheet not everyone can open in an email.
>>> The interesting points to take away are that going from (c) to (d),
>>> we move from a backend with basic fast-isel support, to the new one.
>>> This results in compile time in ISel dropping 30%, an increase in #
>>> instructions generated (i’m investigating this), over 2x the number
>>> of BBs entirely handled in fast-isel, and over 2x the number of
>>> instructions generated by fast-isel.
>>> The number of BBs selected entirely is where almost all the compile
>>> time improvement comes from. Fast-ISel gets the biggest wins in
>>> compile time when we never fall back to SelectionDAG. Given that
>>> this patch improves fully selected BBs by 2x, its not surprising to
>>> see about 2x from compile time as a side-effect.
>>> (e) to (f) is also interesting. This is what happens if AArch64
>>> uses the current path, but then adds the state machine as a
>>> fall-back. We select about 100 more BBs in fast-isel, and about
>>> 3000 more instructions, but compile time actually regresses a
>>> little. I haven’t yet spent much time tuning the state machine so i
>>> think i can recover this loss. More importantly though, the state
>>> machine is optional and the backend doesn’t have to call it if it
>>> doesn’t want to. So the code owner can make the call as to whether
>>> its worth it or not. With a less tuned implementation that AArch64,
>>> its likely still a win to use the new code as a fallback.
>>> So, from here i’d like to see if I can get this code in tree. The
>>> code is entirely optional. No backends have to change. If no-one
>>> calls the code then it’ll be dead stripped, although we need at
>>> least one user at some point or its just dead code.
>>> The AArch64 changes are a demonstration and its up to the code
>>> owners there if they want this or not. The changes to
>>> TargetSelectionDAG.td and tablegen itself are necessary for this to
>>> work on any other targets. I’m happy to discuss what the changes
>>> are in more detail, and whatever pieces people are happy with being
>>> landed (see the changes to ‘def fma’ for some of the more
>>> controversial tablegen fixes to get this to work).
>>> Comments welcome.
>>> llvm-commits mailing list
>>> llvm-commits-Tmj1lob9twqVc3sceRu5cw at public.gmane.org
>>> <mailto:llvm-commits-Tmj1lob9twqVc3sceRu5cw at public.gmane.org>
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-commits