[llvm-dev] [RFC] Implementing a general purpose 64-bit target (RISC-V 64-bit) with i64 as the only legal integer type

Wed Oct 3 02:27:21 PDT 2018

# Purpose of this RFC
This RFC describes the challenges of modelling the 64-bit RISC-V target (RV64)
and details the two most obvious implementation choices:
1) Having i64 as the only legal integer type
2) Introducing i32 subregisters

I've worked on implementing both approaches and fleshed out a pretty complete
implementation of 1), which is my preferred option. With this RFC, I would
welcome further feedback and insight, as well as suggestions or comments on
the target-independent modifications (e.g. TargetInstrInfo hooks) I suggest as
worthwhile.

# Background: RV64
The RISC-V instruction set is structured as a set of bases (RV32I, RV32E,
RV64I, RV128I) with a series of optional extensions (e.g. M for
multiply/divide, A for atomics, F+D for single+double precision floating
point). It's important to note that RV64I is not just RV32I with some
additional instructions, it's a completely different base where operations
work on 64-bit rather than 32-bit values. RV64I also introduces 10 new
instructions: ld/sd (64-bit load/store), addiw, slliw, srliw, sraiw, addw,
subw, sllw, srlw, sraw. The `*W` instructions all produce a sign-extended
result and take the lower 32-bits of their operands as inputs. Unlike MIPS64,
there is no requirement that inputs to these `*W` are sign-extended in order
to avoid unpredictable behaviour.

# Background: RISC-V backend implementation.
Other backends aiming to support both 32-bit and 64-bit architecture variants
handle this by defining two versions of each instruction with overlapping
encodings, with one marked as isCodeGenOnly.  This leads to unwanted
duplication, both in terms of tablegen descriptions and throughout the C++
implementation of the backend (e.g. any code checking for RISCV::ADD would
also want to check for RISCV::ADD64). Fortunately we can avoid this thanks to
the work Krzysztof Parzyszek contributed to support variable-sized register
classes <http://lists.llvm.org/pipermail/llvm-dev/2016-September/105027.html>.
The in-tree RISC-V backend exploits this, parameterising the base instruction
definitions by XLEN (the size of the general purpose registers).

# Option 1: Have i64 as the only legal type
## Approach
Every register class in RISCVRegisterInfo.td is parameterised by XLenVT, which
is i32 for RV32 and i64 for RV64. No subregisters are defined, meaning i32 is
not a legal type. Patterns for the `*W` instructions tend to look something
like:

    def : Pat<(sext_inreg (add GPR:$rs1, GPR:$rs2), i32),
              (ADDW GPR:$rs1, GPR:$rs2)>;

Essentially all patterns for RV32I are also valid for RV64I.

## Changes needed
* Introduction of new patterns, RV64I-specific immediate materialisation

* A number of SelectionDAG nodes generated from LLVM intrinsics take i32
arguments and the DAG legalizer doesn't currently know how to legalize them.
Promoting these arguments is trivial but requires additions to
LegalizeIntegerTypes.cpp. So far I've had to do this for
frameaddr/returnaddr/prefetch, but there are likely more.

* The shift amount type is i64. If the shift amount operand is smaller than
this, SelectionDAGBuilder will zero-extend it (changed from any-extend in
rL125457). i32->i64 zero-extension is more expensive than sign-extension, but
it's unnecessary anyway as only the lower 6 bits are used. Introduce
TargetLowering::getExtendForShiftAmount which is called during
SelectionDAGBuilder::visitShift.

* When promoting setcc operands, DAGTypeLegalizer::PromoteSetCCOperands makes
the arbitrary choice to zero-extend. It is cheaper to sign-extend from i32 to
i64, so introduce TargetLowering::isSExtCheaperThanZExt(FromTY, ToTy). For now
this is only used through PromoteSetCCOperands, but perhaps there are other
cases where it would be useful?

* When 32-bit srl is legalized, the dag combiner will try to reduce the bits
in the mask in: (srl (and val, 0xffffffff), imm) based on the knowledge of the
lower bits that will be shifted out. This means a tablegen pattern matching
0xffffff won't work. Custom selection code in RISCVDAGToDAGISel can recognize
when this has happened and produce SRLIW.

* New i64 versions of the target-specific intrinsics added to aid the lowering
of part-word atomicrmw must be defined.

* RV64F (single-precision floating point) requires a little extra work due to
the fact i32 is not a legal type. When call lowering happens post-legalisation
(e.g. when an intrinsic was inserted during legalisation). A bitcast from f32
to i32 can't be introduced. There's a similar challenge for RV32D. Introduce
target-specific DAG nodes that perform bitcast+sext for f32->i64 and
trunc+bitcast for i64->f32. Custom-lower ISD::BITCAST to ensure these nodes
are selected.

## Questions
Does anyone have any reservations about this approach of having i64 as the
only legal type?

Some of the target hooks could perhaps be replaced with more heroics in the
backend. What are people's feelings here?

# Option 2: Model 32-bit subregs
## Approach
Define 32-bit subregisters for the GPRs that can be used in patterns and
instruction definitions. The following node types are potentially useful:
* `EXTRACT_SUBREG`: Supports getting the lower 32-bits of a 64-bit register
* `INSERT_SUBREG`: Assumes only the lower bits are modified. Can be used with
`IMPLICIT_DEF` to indicate that the upper bits are undefined. You can't
directly represent sign-extension, but you can do what Mips64 does and define
extra patterns to catch redundant sign-extension after one of the `*W`
instructions.
* `SUBREG_TO_REG`: a constant argument asserts the value of the bits left in
the upper portion of the register. This is perfect for zero-extension, and not
much good for the sign-extension RISC-V performs.

You end up with patterns like:

    def : Pat<(anyext GPR32:$reg),
              (SUBREG_TO_REG (i64 0), GPR32:$reg, sub_32)>;
def : Pat<(trunc GPR:$reg), (EXTRACT_SUBREG GPR:$reg, sub_32)>;

def : Pat<(add GPR32:$src, GPR32:$src2),
(ADDW GPR32:$src, GPR32:$src2)>;

def : Pat<(add GPR32:$rs1, simm12_i32:$imm12),
(ADDIW GPR32:$rs1, simm12_i32:$imm12)>;

## Changes needed
* 32-bit subregisters must be defined. Some register classes need GPR32
versions, e.g. GPR, GPRNoX0, GPRC.

* The RISCVAsmParser and RISCVDisassembler must be modified to support the new
register classes used for the 32-bit subregs.

* The calling convention implementation must handle promotion of i32
arguments/returns to i64.

* The `*W` instructions must be defined using GPR32.

* New `Operand<i32>` types must be defined and used in the `*W` instructions.

* When defining a variable-sized register class you specify a DefaultMode.
This must be set to i64 to avoid breaking RV32 compilation.

* This gives enough to define working support for the `*W` operations, but to
enable codegen for the other integer instructions requires either duplication
or smarts. To write patterns using i32 you need to define a new variant of the
instruction. TableGen changes might remove the need for this. Even with such
support, it's not particularly desirable to write a bunch of new patterns for
instructions other than the `*W` ones.

I'm sure solutions are possible, but given that the i64-only approach
seems to work very well, I'm not sure it's worth pushing further.

# Conclusion
Taking full advantage of support for variable-sized register classes and
sticking with i64 as the only legal integer type seems very workable and is
definitely my preference based on the work I've done. I'd be really interested
if anyone has any particular concerns or advice, or feedback on the suggested
new target hooks.

Best,

Alex Bradbury, lowRISC CIC