[PATCH] Fast-ISel state machine

Mon Apr 20 15:31:44 PDT 2015

You can put them all in one review but there may be separate people that need
to approve the different parts.

On 04/20/2015 01:28 PM, Pete Cooper wrote:
> Thanks Reed.  I can do that.  I guess it’ll be 3 reviews, but that 
> should be fine to start with.
>
> Pete
>> On Apr 20, 2015, at 1:27 PM, Reed Kotler <rkotler at mips.com> wrote:
>>
>> Consider putting this in Phabricator. It will make it much easier for 
>> people to comment on it.
>>
>> On 04/17/2015 09:49 AM, Pete Cooper wrote:
>>> Hi all
>>>
>>> I’ve been working on improving fast-isel coverage.  Our current 
>>> fast-isel model involves auto-generating a bunch of C++ code from 
>>> tablegen, but then hand writing a significant proportion in C++ to 
>>> get coverage and performance.
>>>
>>> I’ve ported the state machine used by selection DAG to fast-isel.
>>> It walks IR in much the same way that the SD machine walks nodes,
>>> and produces MIs.  It handles predicates, transforms, and complex
>>> patterns, none of which are handled by the current fast-isel
>>> tablegen emitter.
>>>
>>> There are a few different pieces of this work:
>>> (1) Extend tablegen SDNode to take the IR ValueID of the thing we 
>>> are matching
>>> (2) Extend tablegen PatFrag to take fast-isel versions of the 
>>> predicate code and the transform code
>>> (3) Teach the tablegen DAG emitter to use these IR constructs where 
>>> available, and when emitting for fast-isel
>>> (4) The state machine itself, which is just a port of the SD one, 
>>> but with SDValue->Value* and a bunch of other changes like handling 
>>> register class constraining.
>>> (5) Porting the complex patterns, predicates, and transforms from SD
>>> to fast-isel.  This is mostly target-specific code in the target’s
>>> own FastISel.cpp file, and td files.
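
A toy sketch of how a table-driven matcher like the one in (4) might walk IR-like values may help make the idea concrete. Everything in it is an assumption made for illustration: ToyValue, the MatcherOp names, the table encoding, and the instruction names "ADDri" and "ADDrr" are hypothetical and are not taken from the patch or from LLVM's real matcher tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for an IR value. This is NOT llvm::Value.
enum class IROp { Add, Mul, Const, Arg };

struct ToyValue {
  IROp Op;
  std::vector<const ToyValue *> Operands;
  int64_t Imm;
};

// Hypothetical matcher opcodes; the names are illustrative only.
enum MatcherOp : int {
  CheckOpcode,  // operand: expected IROp
  MoveChild,    // operand: child index to descend into
  MoveParent,   // no operand: return to the parent value
  CheckImmFits, // operand: unsigned bit width the constant must fit in
  Emit,         // operand: index into Names; ends one pattern
  EndOfTable    // no operand: ends the table
};

// Runs each pattern in Table against Root and returns the name of the first
// one that matches, or "" if none does. Assumes a well-formed table: every
// pattern ends with Emit and the table ends with EndOfTable.
std::string selectToy(const ToyValue *Root, const std::vector<int> &Table,
                      const std::vector<std::string> &Names) {
  std::size_t PC = 0;
  while (PC < Table.size() && Table[PC] != EndOfTable) {
    std::vector<const ToyValue *> Stack{Root};
    bool Failed = false;
    while (true) {
      int Op = Table[PC];
      if (Op == Emit) {
        if (!Failed)
          return Names[Table[PC + 1]];
        PC += 2; // pattern failed: continue with the next pattern
        break;
      }
      if (Op == MoveParent) {
        if (!Failed) {
          Stack.pop_back();
          assert(!Stack.empty() && "MoveParent without matching MoveChild");
        }
        PC += 1;
        continue;
      }
      int Arg = Table[PC + 1];
      PC += 2;
      if (Failed)
        continue; // skip the rest of a failed pattern up to its Emit
      const ToyValue *Cur = Stack.back();
      switch (Op) {
      case CheckOpcode:
        Failed = Cur->Op != static_cast<IROp>(Arg);
        break;
      case MoveChild:
        if (static_cast<std::size_t>(Arg) < Cur->Operands.size())
          Stack.push_back(Cur->Operands[Arg]);
        else
          Failed = true;
        break;
      case CheckImmFits:
        Failed = Cur->Imm < 0 || Cur->Imm >= (int64_t(1) << Arg);
        break;
      default:
        Failed = true; // unknown opcode: treat as a mismatch
        break;
      }
    }
  }
  return "";
}

// Two hypothetical patterns: add with a 12-bit constant, then a plain add.
std::vector<int> toyTable() {
  const int Add = static_cast<int>(IROp::Add);
  const int Const = static_cast<int>(IROp::Const);
  return {CheckOpcode, Add, MoveChild, 1, CheckOpcode, Const,
          CheckImmFits, 12, MoveParent, Emit, 0,
          CheckOpcode, Add, Emit, 1,
          EndOfTable};
}

std::vector<std::string> toyNames() { return {"ADDri", "ADDrr"}; }
```

With this table, an add whose second operand is a small constant takes the first pattern, any other add falls through to the second, and a value that is not an add matches nothing, which in a real selector would be the point to fall back to another path.
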
>>>
>>> As my test case, I took a bitcode file which contains llc itself
>>> compiled for AArch64.  All the target-specific work I’ve done here is
>>> for AArch64.  It involves writing about 600 LOC for the complex
>>> patterns, and about 300 LOC to handle predicates/transforms.  This
>>> is vs the 5100 LOC AArch64FastISel.cpp currently takes.
>>>
>>> To measure performance, I tried to see what it would take to get
>>> from SD all the way to the current, extremely good AArch64
>>> fast-isel implementation, and to see how this new code could help us
>>> either get there quicker, or even improve what we have.
>>>
>>> The metrics are:
>>> - Time to run ISel
>>> - Number of machine instrs printed by asm-printer
>>> - BBs selected entirely by fast-isel
>>> - Number of instrs fast-isel selected
>>>
>>> And the runs I considered were:
>>> (a) Stock SelectionDAG.  This is prior to anyone trying to write or 
>>> run fast-isel
>>> (b) Basic fast-isel (i.e., it calls selectOperator), with no
>>> hand-written fast-isel code
>>> (c) Basic fast-isel + hand-written code for return and branch (this
>>> is 300 LOC on AArch64)
>>> (d) The new state machine, then the above code (this is about 900 LOC
>>> in addition)
>>> (e) Current fast-isel without the new state machine (this is 5100
>>> LOC in tree currently)
>>> (f) Current fast-isel falling back to the state machine when it fails
>>> Run   Time to run   Machine instrs   BBs selected entirely   Instrs fast-isel
>>>       ISel          printed          by fast-isel            selected
>>> (a)   27.6          1912570          N/A                     N/A
>>> (b)   25.0          4685108          63794                   292623
>>> (c)   20.0          4321009          122225                  638551
>>> (d)   14.1          4598457          266476                  1389476
>>> (e)   7.3           4230855          329909                  1471200
>>> (f)   7.7           4231056          330010                  1474131
>>>
>>> Apologies if there’s a better way to present that.  I don’t want to 
>>> be presumptuous and put a spreadsheet not everyone can open in an email.
>>>
>>> The interesting points to take away are that going from (c) to (d),
>>> we move from a backend with basic fast-isel support to the new one.
>>> This results in compile time in ISel dropping about 30%, an increase
>>> in the number of instructions generated (I’m investigating this),
>>> over 2x the number of BBs entirely handled in fast-isel, and over 2x
>>> the number of instructions generated by fast-isel.
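
For reference, the (c) to (d) changes can be recomputed from the measurements above. The small sketch below is only arithmetic on the posted numbers; the helper and struct names are made up for illustration. It gives roughly a 29.5% drop in ISel time, about 6.4% more machine instructions printed, and about 2.18x for both the fully selected BBs and the fast-isel selected instructions.

```cpp
#include <cassert>

// Relative change from Before to After in percent; negative means a reduction.
double percentChange(double Before, double After) {
  assert(Before != 0.0 && "baseline must be non-zero");
  return (After - Before) / Before * 100.0;
}

// How many times larger After is than Before.
double growthFactor(double Before, double After) {
  assert(Before != 0.0 && "baseline must be non-zero");
  return After / Before;
}

// Figures for runs (c) and (d), copied from the measurements above.
struct RunStats {
  double IselTime;   // time to run ISel (unit not stated in the mail)
  double MIsPrinted; // machine instrs printed by asm-printer
  double FullBBs;    // BBs selected entirely by fast-isel
  double FastInstrs; // instrs fast-isel selected
};

const RunStats RunC = {20.0, 4321009, 122225, 638551};
const RunStats RunD = {14.1, 4598457, 266476, 1389476};
```

For example, percentChange(RunC.IselTime, RunD.IselTime) gives the ISel time change and growthFactor(RunC.FullBBs, RunD.FullBBs) gives the BB ratio.
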
>>>
>>> The number of BBs selected entirely is where almost all the compile
>>> time improvement comes from.  Fast-ISel gets the biggest wins in
>>> compile time when we never fall back to SelectionDAG.  Given that
>>> this patch improves fully selected BBs by 2x, it’s not surprising to
>>> see about 2x from compile time as a side effect.
>>>
>>> (e) to (f) is also interesting.  This is what happens if AArch64
>>> uses the current path, but then adds the state machine as a
>>> fall-back.  We select about 100 more BBs in fast-isel, and about
>>> 3000 more instructions, but compile time actually regresses a
>>> little.  I haven’t yet spent much time tuning the state machine, so I
>>> think I can recover this loss.  More importantly though, the state
>>> machine is optional and the backend doesn’t have to call it if it
>>> doesn’t want to.  So the code owner can make the call as to whether
>>> it’s worth it or not.  With a less tuned implementation than
>>> AArch64’s, it’s likely still a win to use the new code as a fallback.
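
The ordering described for run (f) can be pictured as a chain of selectors tried in turn. This is a minimal sketch under stated assumptions, not the patch's code: the Selector type, selectWithFallback, and the use of strings to stand in for IR instructions are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical selector: returns true if it managed to select the instruction.
using Selector = std::function<bool(const std::string &IRInst)>;

// Tries each stage in order and returns the index of the first stage that
// selects Inst, or -1 if none does. In the scheme described above, stage 0
// would be the hand-written fast-isel code, stage 1 the state machine, and
// -1 would mean falling back to SelectionDAG.
int selectWithFallback(const std::string &Inst,
                       const std::vector<Selector> &Stages) {
  for (std::size_t I = 0; I < Stages.size(); ++I) {
    assert(Stages[I] && "stage must hold a callable");
    if (Stages[I](Inst))
      return static_cast<int>(I);
  }
  return -1;
}
```

Because the stages are just a list, a backend that does not want the extra stage simply leaves it out, which matches the point that the state machine is optional.
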
>>>
>>> So, from here I’d like to see if I can get this code in tree.  The
>>> code is entirely optional.  No backends have to change.  If no-one
>>> calls the code then it’ll be dead stripped, although we need at
>>> least one user at some point or it’s just dead code.
>>>
>>> The AArch64 changes are a demonstration and it’s up to the code
>>> owners there if they want this or not.  The changes to
>>> TargetSelectionDAG.td and tablegen itself are necessary for this to
>>> work on any other targets.  I’m happy to discuss what the changes
>>> are in more detail, and to land whatever pieces people are happy
>>> with (see the changes to ‘def fma’ for some of the more
>>> controversial tablegen fixes to get this to work).
>>>
>>> Comments welcome.
>>>
>>> Cheers,
>>> Pete
>>>
>>>
>>>
>
