[PATCH] Fast-ISel state machine

Mon Apr 20 16:26:46 PDT 2015

Hi Pete,

I'm definitely interested in this. I'd like to see a backend ported fully to
use this in practice so that we can get a better idea of what the tradeoffs
are. (Possibly just all of the fast-isel backends to maximize code removal).

Also I agree that something like Phab might help with the visualization of
the changes. It's not a very incremental patch by nature and touches a lot
of code, so anything that can help would be good.

Thanks.

-eric

On Fri, Apr 17, 2015 at 9:54 AM Pete Cooper <peter_cooper at apple.com> wrote:

> Hi all
>
> I’ve been working on improving fast-isel coverage.  Our current fast-isel
> model involves auto-generating a bunch of C++ code from tablegen, but then
> hand writing a significant proportion in C++ to get coverage and
> performance.
>
> I’ve ported the state machine used by selection DAG to fast-isel.  This is
> able to walk IR in much the same way that the SD machine walks nodes, and
> produces MIs.  This is able to handle predicates, transforms, and complex
> patterns, all of which are not handled by the current fast-isel tablegen
> emitter.
>
> There are a few different pieces of this work:
> (1) Extend tablegen SDNode to take the IR ValueID of the thing we are
> matching
> (2) Extend tablegen PatFrag to take fast-isel versions of the predicate
> code and the transform code
> (3) Teach the tablegen DAG emitter to use these IR constructs where
> available, and when emitting for fast-isel
> (4) The state machine itself, which is just a port of the SD one, but with
> SDValue->Value* and a bunch of other changes like handling register class
> constraining.
> (5) Porting the complex patterns, predicates, and transforms from SD to
> fast-isel.  This is mostly target specific code in the target’s own
> FastISel.cpp file, and td files.
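
A minimal sketch of the table-driven matching idea from (4), for readers who have not seen the SD matcher. Everything here is hypothetical and is not the patch's code: `Node` stands in for an IR value, the `V*` IDs stand in for the IR ValueID from (1), and the opcode set, the single predicate, and the instruction name `ADD_imm` are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR value; a real matcher would walk llvm::Value.
struct Node {
  int valueID;                         // plays the role of the IR ValueID
  std::vector<const Node *> operands;  // operand edges
  int64_t imm;                         // payload, only meaningful for constants
};

// Hypothetical value IDs, for this sketch only.
enum : int { VAdd = 1, VMul = 2, VConst = 3, VArg = 4 };

// Hypothetical matcher opcodes, loosely modelled on a table-driven matcher.
enum Op : int {
  CheckValueID,    // arg: expected ID; fail if the current node differs
  MoveChild,       // arg: operand index; descend into that operand
  MoveParent,      // arg unused; return to the parent node
  CheckPredicate,  // arg: index of a predicate to run on the current node
  EmitInstr        // arg: index into the instruction-name table; match done
};

struct Step { Op op; int arg; };

// Hypothetical predicate table: predicate 0 is "constant fits in 12 bits".
inline bool runPredicate(int id, const Node &n) {
  assert(id == 0 && "only one predicate exists in this sketch");
  (void)id;
  return n.valueID == VConst && n.imm >= 0 && n.imm < 4096;
}

// Interprets one pattern table against `root`. Returns the name of the
// instruction to emit on a match, or an empty string if any check fails.
inline std::string runMatcher(const std::vector<Step> &table, const Node &root,
                              const std::vector<std::string> &instrNames) {
  std::vector<const Node *> stack{&root};  // path from root to current node
  for (const Step &s : table) {
    const Node *cur = stack.back();
    switch (s.op) {
    case CheckValueID:
      if (cur->valueID != s.arg) return "";
      break;
    case MoveChild:
      if (s.arg < 0 || s.arg >= static_cast<int>(cur->operands.size()))
        return "";
      stack.push_back(cur->operands[s.arg]);
      break;
    case MoveParent:
      if (stack.size() < 2) return "";
      stack.pop_back();
      break;
    case CheckPredicate:
      if (!runPredicate(s.arg, *cur)) return "";
      break;
    case EmitInstr:
      assert(s.arg >= 0 && s.arg < static_cast<int>(instrNames.size()));
      return instrNames[s.arg];
    }
  }
  return "";  // table ended without emitting anything
}
```

A pattern for "add of anything and a small constant" would then be the table `{CheckValueID, VAdd}, {MoveChild, 1}, {CheckValueID, VConst}, {CheckPredicate, 0}, {MoveParent, 0}, {EmitInstr, 0}`. The sketch leaves out everything that makes the real thing hard, such as transforms, complex patterns, and register class constraining.
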
>
> As my test case, I took a bitcode file which contains llc itself compiled for
> AArch64.  All the target specific work I’ve done here is for AArch64.  It
> involves writing about 600 LOC for the complex patterns, and about 300 LOC
> to handle predicates/transforms.  This compares with the 5100 LOC that
> AArch64FastISel.cpp currently takes.
>
> To measure performance, I tried to see what it would take to get from SD,
> all the way to the currently extremely good AArch64 fast-isel
> implementation, and see how this new code could help us either get there
> quicker, or even improve what we have.
>
> The metrics are:
> - Time to run ISel
> - Number of machine instrs printed by asm-printer
> - BBs selected entirely by fast-isel
> - Number of instrs fast-isel selected
>
> And the runs I considered were:
> (a) Stock SelectionDAG.  This is prior to anyone trying to write or run
> fast-isel
> (b) Basic fast-isel (i.e., it calls selectOperator) with no hand-written
> fast-isel code
> (c) Basic fast-isel + hand written code for return and branch (this is 300
> LOC on AArch64)
> (d) The new state machine plus the above code (this is about 900 LOC in
> addition)
> (e) Current fast-isel without the new state machine (this is 5100 LOC in
> tree currently)
> (f) Current fast-isel falling back to the state machine when it fails
>
> Time to run ISel:
> (a) 27.6
> (b) 25.0
> (c) 20.0
> (d) 14.1
> (e) 7.3
> (f) 7.7
>
> Number of machine instrs printed by asm-printer:
> (a) 1912570
> (b) 4685108
> (c) 4321009
> (d) 4598457
> (e) 4230855
> (f) 4231056
>
> BBs selected entirely by fast-isel:
> (a) N/A
> (b) 63794
> (c) 122225
> (d) 266476
> (e) 329909
> (f) 330010
>
> Number of instrs fast-isel selected:
> (a) N/A
> (b) 292623
> (c) 638551
> (d) 1389476
> (e) 1471200
> (f) 1474131
>
> Apologies if there’s a better way to present that.  I don’t want to be
> presumptuous and put a spreadsheet not everyone can open in an email.
>
> The interesting points to take away are that going from (c) to (d), we
> move from a backend with basic fast-isel support, to the new one.  This
> results in compile time in ISel dropping 30%, an increase in # instructions
> generated (I’m investigating this), over 2x the number of BBs entirely
> handled in fast-isel, and over 2x the number of instructions generated by
> fast-isel.
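
For anyone wanting to check the (c) to (d) comparison against the figures listed above, here is a small sketch of the arithmetic. The struct and function names are made up for this example; only the numbers come from the lists. It works out to roughly a 29.5% drop in ISel time, about 2.18x for both fully selected BBs and fast-isel selected instrs, and about 6.4% more machine instrs printed.

```cpp
#include <cstdio>

// One row of the measurements quoted above (field names are illustrative).
struct Run {
  double iselTime;     // "Time to run ISel"
  long machineInstrs;  // "Number of machine instrs printed by asm-printer"
  long fullBBs;        // "BBs selected entirely by fast-isel"
  long fastInstrs;     // "Number of instrs fast-isel selected"
};

constexpr Run RunC{20.0, 4321009, 122225, 638551};   // run (c)
constexpr Run RunD{14.1, 4598457, 266476, 1389476};  // run (d)

// Fractional reduction from `before` to `after`; 0.30 means 30% lower.
inline double fractionalDrop(double before, double after) {
  return (before - after) / before;
}

// Growth factor from `before` to `after`; 2.0 means twice as many.
inline double growth(double before, double after) { return after / before; }

// Prints the four comparisons discussed in the paragraph above.
inline void printSummary() {
  std::printf("ISel time drop:        %.1f%%\n",
              100.0 * fractionalDrop(RunC.iselTime, RunD.iselTime));
  std::printf("Machine instrs growth: %.3fx\n",
              growth(RunC.machineInstrs, RunD.machineInstrs));
  std::printf("Full BBs growth:       %.3fx\n",
              growth(RunC.fullBBs, RunD.fullBBs));
  std::printf("Fast instrs growth:    %.3fx\n",
              growth(RunC.fastInstrs, RunD.fastInstrs));
}
```
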
>
> The number of BBs selected entirely is where almost all the compile time
> improvement comes from.  Fast-ISel gets the biggest wins in compile time
> when we never fall back to SelectionDAG.  Given that this patch improves
> fully selected BBs by 2x, it’s not surprising to see about 2x from compile
> time as a side-effect.
>
> (e) to (f) is also interesting.  This is what happens if AArch64 uses the
> current path, but then adds the state machine as a fall-back.  We select
> about 100 more BBs in fast-isel, and about 3000 more instructions, but
> compile time actually regresses a little.  I haven’t yet spent much time
> tuning the state machine so I think I can recover this loss.  More
> importantly though, the state machine is optional and the backend doesn’t
> have to call it if it doesn’t want to.  So the code owner can make the call
> as to whether it’s worth it or not.  With a less tuned implementation than
> AArch64, it’s likely still a win to use the new code as a fallback.
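
The opt-in fallback arrangement in (f) can be pictured roughly as below. This is only a sketch under assumptions: `Inst`, `Selector`, and `selectWithFallback` are hypothetical names rather than anything from the patch or from LLVM, and the granularity of the real fallback to SelectionDAG is not modelled.

```cpp
#include <functional>

// Hypothetical stand-in for an IR instruction.
struct Inst { int opcode; };

// Which stage ended up handling the instruction.
enum class SelectedBy { HandWritten, StateMachine, SelectionDAG };

// A selector returns true if it managed to select the instruction.
using Selector = std::function<bool(const Inst &)>;

// Tries the backend's hand-written fast-isel code first, then the optional
// state machine, and otherwise reports that SelectionDAG must take over.
// Passing an empty Selector for `stateMachine` models a backend that has
// chosen not to call the new code at all.
inline SelectedBy selectWithFallback(const Inst &I,
                                     const Selector &handWritten,
                                     const Selector &stateMachine) {
  if (handWritten && handWritten(I))
    return SelectedBy::HandWritten;
  if (stateMachine && stateMachine(I))
    return SelectedBy::StateMachine;
  return SelectedBy::SelectionDAG;
}
```

The ordering reflects the point above: the tuned hand-written path stays first, so the state machine only costs time on instructions that would otherwise have fallen back anyway.
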
>
> So, from here I’d like to see if I can get this code in tree.  The code is
> entirely optional.  No backends have to change.  If no-one calls the code
> then it’ll be dead stripped, although we need at least one user at some
> point or it’s just dead code.
>
> The AArch64 changes are a demonstration and it’s up to the code owners
> there if they want this or not.  The changes to TargetSelectionDAG.td and
> tablegen itself are necessary for this to work on any other targets.  I’m
> happy to discuss what the changes are in more detail, and whatever pieces
> people are happy with being landed (see the changes to ‘def fma’ for some
> of the more controversial tablegen fixes to get this to work).
>
> Comments welcome.
>
> Cheers,
> Pete