[LLVMdev] [global-isel] Random comments on Proposal for a global instruction selector

Fri Aug 9 02:16:41 PDT 2013

On 9 Aug 2013, at 00:18, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:

> I am hoping that this proposal will generate a lot of feedback, and there are many different topics to discuss. When replying to this email, please change the subject header to something more specific, but keep the [global-isel] tag.

Subject changed, but I'm not sure if helps...

Overall, I really like this design (and would have had a much easier 18 months if we'd had it two years ago).  Comments inline.

> Thanks,
> /jakob
> 
> Proposal for a Global Instruction Selector
> 
> It is becoming evident that we need a replacement for the SelectionDAG-based instruction selector. This proposal describes the architecture of a global instruction selector based on the MachineInstr intermediate representation.
> 
> A new instruction selector is a large project that will probably take a couple of years to complete. This proposal only describes the desired final design, it does not describe the path of incremental development we should follow to get there.
> 
> This is a strawman design which shouldn't be seen as a final proposal. I am hoping for a lot of community feedback, and I don't expect we'll have a final design before we actually begin implementing it.
> 
> Why not legalize LLVM IR?
> 
> It's been proposed to legalize LLVM IR before instruction selection, see http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-April/061567.html. I recommend reading the mailing list thread debating the proposal, but let me summarize the reasons for legalizing MI instead of LLVM IR:
> 
> 	• ABI boundaries have a too high-level representation in LLVM IR. By ABI boundaries, I mean function call and return, landing pads, and inline assembly.
> 
> LLVM IR doesn't make it clear which function arguments go in registers and which go on the stack. Stack frame and call frame maintenance instructions are not represented in LLVM IR. Byval arguments are a big mystery, and some arguments are actually passed in multiple registers, depending on their type.
> 
> While LLVM IR could be extended to represent all these things, it would pollute the clean design of LLVM IR, and it is not the right level of abstraction for this IR.

There is a lot of black magic required for ABI support.  I can read the SysV ABI supplement for a given architecture and understand how, from my source language, types should be represented and where function arguments should go.  There is then almost no documentation on how this maps to the IR, so the only way when writing a new front end for an existing back end is to see what clang generates.  The 'clean' design of the IR here is not very clean, as the IR contains something that is neither what the back end nor the front end wants and has a non-obvious mapping to either.  

> 	• The getelementpointer instruction is too abstract. We want pointer arithmetic to be exposed as normal integer arithmetic in a type system without pointers. This would require a lot of awkward pointer-to-integer conversions in LLVM IR.

For our back end, we absolutely do not want a type system without pointers - not only do we have separate registers for pointers, we also have separate instructions for manipulating them and representing this in SelectionDAG requires some very ugly hacks.  This is also the case for several DSPs, which have separate address and data registers, and for upcoming Intel chips with MPX that have 4 hardware bounds registers that need keeping in sync with pointers.  Something like GEP is actually a very good model for these, because it allows the same bounds register to be used for a set of pointer operations, however it would be fine to have a single pointer type that was independent of its pointee type (or small set of different-sized ones, which will be useful for several GPUs) and a lower-level GEP that was just a byte offset.

For the architectures that don't distinguish between pointers and integers, there should be some option of lowering to integer types, however the information that a particular register contains an 

> Motivation
> 
> Everybody loves to hate SelectionDAG, but it is still useful to make its shortcomings explicit. These are some of the goals for a new instruction selector architecture:

I would add:

It is very difficult to extend SelectionDAG / TableGen with new value types that are not very similar to existing ones.  For example, adding a new kind of vector is easy, but adding a fat pointer representation is not really possible.  

> Overall design
> 
> The global instruction selector is implemented as a series of machine function passes. Targets can inject their own passes or replace standard passes, using the existing mechanism for configuring code generation pipelines.
> 
> The MI intermediate representation is extended with a set of abstract target-independent opcodes very similar to the ISD opcodes used by SelectionDAG. Virtual registers can be typed so they have an EVT instead of a register class.
> 
> The standard passes are:
> 
> 	• MI builder.
> 	• Switch lowering.
> 	• Iterative legalization and cleanup.
> 	• Register bank selection.
> 	• Common subexpression elimination.
> 	• Instruction selection.
> I'll describe these passes in more detail below.
> 
> MI builder
> 
> The MI builder pass translates LLVM IR to MI IR, much like the current SelectionDAGBuilder phase. Like SelectionDAGBuilder, this pass will:
> 
> 	• Expand LLVM IR first-class aggregate types into their constituent parts. This also includes splitting load, store, phi, and select instructions.
> 	• Replace pointer types with target-specific integers.
> 	• Expand getelementptr instructions with integer adds and multiplies.
> 	• Map LLVM instructions to target-independent abstract MI opcodes.
> 	• Lower ABI boundaries with help from target hooks.
> Unlike SelectionDAGBuilder, this pass will:
> 
> 	• Create a 1-1 mapping of the LLVM IR CFG. No new basic blocks are created.
> 	• Preserve switch instructions.
> 	• Preserve i1 logic instructions.
> 	• Not use MERGE_VALUES instructions. We'll use a value map that can handle aggregates.
> The aggregate type expansion creates value types that can all be represented by EVTs, and MachineRegisterInfo will be extended to allow virtual registers to have an EVT instead of a register class. EVTs are all the integer, floating point, and vector types from LLVM IR.

That all sounds very nice and clean.  I'd also like to represent pointers (including their address space, but not their pointee type) in this level and then have an optional pass (or a parameter for this one) that lowers pointer operations to integers.

> The ABI boundary lowering requires types to be broken down further into 'legal types' that can be mapped to registers. The secondary breakdown is currently handled by TargetLowering::LowerCallTo() calling getRegisterType() and getNumRegisters(). Most ABIs are defined in terms of C types, not LLVM IR types, so there is a close connection between the C frontend and the ABI lowering code in the instruction selector. It would be a good idea to have the ABI lowering code work independently of the type system used during instruction selection. This is made possible by having ABI lowering be part of the LLVM IR to MI translation process.

This seems cleaner.  I wonder if there is a way of making the C ABI more explicit to front-end authors at this point.  

> The global instruction selector will assign virtual registers to register banks explicitly, not by using the type system. The set of register banks is typically small (2-3) and defined by the target. Modelling register banks explicitly makes it possible to move operations between register banks to minimize cross-bank copies which are often expensive. SPARC even requires cross-bank copies to go through memory, as does x86 in some cases.

And on some ARM chips (e.g. Cortex A8), the via-memory path is faster than the register-register path in some cases.

> The interaction between legalization and bank selection is not clear yet. For example, i64 would be a legal type on ARM with and/or/xor/add/sub operations supported in a D-register. Some i64 operations are not supported in D-registers, and it would be necessary to expand those i64 operations to i32 operations in the GPR bank. When that happens, it is most likely preferable to 'legalize' more connected i64 operations so they can be moved to the GPR bank as well.

How would you model the microMIPS/Thumb-1/RISC-V 16-bit (and, in some cases, x86) idea that a denser encoding can be used if you restrict register accesses to the bottom half of the register set?  The registers are still the same bank, and the set of operations is the same, but in some cases much more expensive (having to switch in and out of microMIPS mode, for example, is very expensive).  This problem hits multiple layers in the lowering process.  I don't have a good solution for it (nor a strong requirement for it, but if there's a design that will make it easier to solve in the future then that would be nice, as we and the people at Berkeley are going to be doing a reasonable amount of work with variable-length encodings over the coming years).

> Instruction selection
> 
> The instruction selection pass replaces most target-independent instructions with real target opcodes, and it ensures that all virtual registers have register classes instead of EVTs. Some target-independent instructions, like COPY, are not lowered until after register allocation.

What is a target-independent instruction?  Does a register-register add count?  It would be nice if target-specific machine instructions could have enough metadata that some MI passes could become target independent.  For example, I'm not sure how many people have implemented a delay slot filler so far (I know of 4, but there could be more) and it would be great if we could have one generic one and enough metadata added to the instruction definitions that it could detect branches with nops in delay slots and determine which instructions were safe to move there.

> 	• Full EVTs. Finally, arithmetic instructions actually depend on the full type of their operands. It is worth noting, though, that the int/float distinction is already redundantly encoded in the operations. LLVM IR has separate add and fadd instructions, for example.

And a separate GEP for pointer arithmetic...

David