[PATCH] Fast-ISel state machine

Mon Apr 20 15:36:24 PDT 2015

> On Apr 20, 2015, at 3:15 PM, Michael Ilseman <milseman at apple.com> wrote:
> 
> Some quick comments, while you’re transitioning to Fabricator:
Thanks.  I appreciate the initial comments.
> 
> You have a lot of commented out code. Is that on purpose, or is there more work to be done to these patches before they can apply?
If you’re talking about FastISel.h and FastISelStateMachine.cpp, the commented out code shows what hasn’t been implemented yet in the copy from SelectionDAG.  I’ve actually now implemented everything I need to so i can remove that.  Its still there as a reference if anyone who knows the SelectionDAG state machine wants to easily see the differences.
> 
> You have a lot of llvm_unreachable(“TableGen should of generated code here”). Would it be possible to instead have these be purely virtual methods, so that it’s a compilation time error rather than a run-time error (that we may miss in testing)?
Those are all copied from SelectionDAGISel.h.  I believe they are like this so that targets can opt in to things like ComplexPatterns, although as this points I suspect that all targets should just use them, and tablegen could just create the stub with llvm_unreachable(“You have no complex patterns”).
> 
> All of these functions and methods violate LLVM naming convention, is there a specific reason for them to be proud-camel-case, or can you use humble-camel-case?
Similarly, copied from SelectionDAGISel.h.  I’m happy to fix the code in this patch to be correctly cased.  I’m not 100% sure on the standpoint for fixing exists violations.  I think it might be to fix them whenever making another fix in the same piece of code.
> 
> There’s a massive amount of code here. Is it all original, or is there a possibility to unify with code elsewhere in the DAGs or other Tablegen definitions?
I tried to maximize the amount of code reuse in tablegen itself by using the DAGISel* files in there.  It results in quite a few conditionals about whether we are generating for SD or Fast-ISel, but at least I didn’t have to copy and paste the whole file(s).

For the state machine itself, thats an interesting question.  In SD, everything is an SDValue.  The state machine maps from SDValue -> SDValue.  However, the fast-isel one maps from Value* -> MachineInstr.   It might be possible with some templates to share the majority of the state machine, but i’m genuinely unsure whether it would be more or less readable and maintainable.  It would be an interesting challenge though :)  

More difficult for code sharing though, is that there are subtle behavior differences.  SD-ISel needs to maintain chains and glue, fast-isel does not.  Much of the commented out code in the fast-isel state machine is to do with these unused operands.  I guess a templated state machine could have methods which operate on chains and glue, and the fast-isel methods could just be empty.

Cheers,
Pete
> 
> 
> 
>> On Apr 17, 2015, at 9:49 AM, Pete Cooper <peter_cooper at apple.com> wrote:
>> 
>> Hi all
>> 
>> I’ve been working on improving fast-isel coverage.  Our current fast-isel model involves auto-generating a bunch of C++ code from tablegen, but then hand writing a significant proportion in C++ to get coverage and performance.
>> 
>> I’ve ported the state machine used by selection DAG to fast-isel.  This is able to walk IR in much the same way that the SD machine walks nodes, and produces MIs.  This is able to handle predicates, transforms, and complex patterns, all of which are not handled by the current fast-isel tablegen emitter.
>> 
>> There are a few different pieces of this work:
>> (1) Extend tablegen SDNode to take the IR ValueID of the thing we are matching
>> (2) Extend tablegen PatFrag to take fast-isel versions of the predicate code and the transform code
>> (3) Teach the tablegen DAG emitter to use these IR constructs where available, and when emitting for fast-isel
>> (4) The state machine itself, which is just a port of the SD one, but with SDValue->Value* and a bunch of other changes like handling register class constraining.
>> (5) Porting the complex patterns, predicates, and transforms from SD to fast-isel.  This is mostly target specific code in the targets own FastISel.cpp file, and td files.
>> 
>> As my test case, i took a bitcode which contains llc itself compiled for AArch64.  All the target specific work i’ve done here is for AArch64.  It involves writing about 600 LOC for the complex patterns, and about 300 LOC to handle predicates/transforms.  This is vs the 5100 LOC AArch64FastISel.cpp currently takes.
>> 
>> To measure performance, i tried to see what it would take to get from SD, all the way to the currently extremely good AArch64 fast-isel implementation, and see how this new code could help us either get there quicker, or even improve what we have.
>> 
>> The metrics are:
>> - Time to run ISel
>> - Number of machine instrs printed by asm-printer
>> - BBs selected entirely by fast-isel
>> - Number of instrs fast-isel selected
>> 
>> And the runs I considered were:
>> (a) Stock SelectionDAG.  This is prior to anyone trying to write or run fast-isel
>> (b) Basic fast-isel (i.e., calls selectOperator), and has no hand-written fast-isel code
>> (c) Basic fast-isel + hand written code for return and branch (this is 300 LOC on AArch64)
>> (d) The new stack machine then the above code (this is about 900 LOC in addition)
>> (e) Current fast-isel without the new stack machine (this is 5100 LOC in tree currently)
>> (f) Current fast-isel falling back to the stack machine when it fails
>> 
>> Time to run ISel:
>> (a) 27.6
>> (b) 25.0
>> (c) 20.0
>> (d) 14.1
>> (e) 7.3
>> (f) 7.7
>> 
>> Number of machine instrs printed by asm-printer:
>> (a) 1912570
>> (b) 4685108
>> (c) 4321009
>> (d) 4598457
>> (e) 4230855
>> (f) 4231056
>> 
>> BBs selected entirely by fast-isel:
>> (a) N/A
>> (b) 63794
>> (c) 122225
>> (d) 266476
>> (e) 329909
>> (f) 330010
>> 
>> Number of instrs fast-isel selected:
>> (a) N/A
>> (b) 292623
>> (c) 638551
>> (d) 1389476
>> (e) 1471200
>> (f) 1474131
>> 
>> Apologies if there’s a better way to present that.  I don’t want to be presumptuous and put a spreadsheet not everyone can open in an email.
>> 
>> The interesting points to take away are that going from (c) to (d), we move from a backend with basic fast-isel support, to the new one.  This results in compile time in ISel dropping 30%, an increase in # instructions generated (i’m investigating this), over 2x the number of BBs entirely handled in fast-isel, and over 2x the number of instructions generated by fast-isel.
>> 
>> The number of BBs selected entirely is where almost all the compile time improvement comes from.  Fast-ISel gets the biggest wins in compile time when we never fall back to SelectionDAG.  Given that this patch improves fully selected BBs by 2x, its not surprising to see about 2x from compile time as a side-effect.
>> 
>> (e) to (f) is also interesting.  This is what happens if AArch64 uses the current path, but then adds the state machine as a fall-back.  We select about 100 more BBs in fast-isel, and about 3000 more instructions, but compile time actually regresses a little.  I haven’t yet spent much time tuning the state machine so i think i can recover this loss.  More importantly though, the state machine is optional and the backend doesn’t have to call it if it doesn’t want to.  So the code owner can make the call as to whether its worth it or not.  With a less tuned implementation that AArch64, its likely still a win to use the new code as a fallback.
>> 
>> So, from here i’d like to see if I can get this code in tree.  The code is entirely optional.  No backends have to change.  If no-one calls the code then it’ll be dead stripped, although we need at least one user at some point or its just dead code.
>> 
>> The AArch64 changes are a demonstration and its up to the code owners there if they want this or not.  The changes to TargetSelectionDAG.td and tablegen itself are necessary for this to work on any other targets.  I’m happy to discuss what the changes are in more detail, and whatever pieces people are happy with being landed (see the changes to ‘def fma’ for some of the more controversial tablegen fixes to get this to work).
>> 
>> Comments welcome.
>> 
>> Cheers,
>> Pete
>> 
>> <fast-isel-state-machine.diff><fast-isel-aarch64.diff><fast-isel-tablegen.diff>_______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>