[llvm-dev] [RFC] Implementing a general purpose 64-bit target (RISC-V 64-bit) with i64 as the only legal integer type

Thu Oct 4 00:03:01 PDT 2018

Now rebased to ToT, as of now.

All that mess in divu is the same as is generated from:

long foo(){
    return 0x00000000ffffffffl;
}

0000000000000000 <foo>:
   0: 00000537          lui a0,0x0
   4: 0005059b          sext.w a1,a0
   8: 1582                slli a1,a1,0x20
   a: 357d                addiw a0,a0,-1
   c: 1502                slli a0,a0,0x20
   e: 9101                srli a0,a0,0x20
  10: 8d4d                or a0,a0,a1
  12: 8082                ret

For sure that's not the best way to generate that constant!

On Wed, Oct 3, 2018 at 6:48 PM, Bruce Hoult <brucehoult at sifive.com> wrote:

> Only having i64 seems cleaner to me. Of course you can still have i32 in
> the code up until legalisation.
>
> I think the only real downside is you can end up with 64 bit arithmetic on
> things that are actually 32 bit, followed by a sext? That can be cleaned up
> to a *w instruction in most cases, and already is.
>
> Example:
>
> ----------- ops.c
> int add(int a, int b){return a+b;}
> int sub(int a, int b){return a-b;}
> int mul(int a, int b){return a*b;}
> int div(int a, int b){return a/b;}
>
> unsigned addu(unsigned a, unsigned b){return a+b;}
> unsigned subu(unsigned a, unsigned b){return a-b;}
> unsigned mulu(unsigned a, unsigned b){return a*b;}
> unsigned divu(unsigned a, unsigned b){return a/b;}
> -----------
> bruce at nuc:~/riscv/tests$ clang -O -c ops.c --target=riscv64 -march=rv64gc
> bruce at nuc:~/riscv/tests$ riscv64-unknown-elf-objdump -d ops.o
>
> ops.o:     file format elf64-littleriscv
>
>
> Disassembly of section .text:
>
> 0000000000000000 <add>:
>    0: 9d2d                addw a0,a0,a1
>    2: 8082                ret
>
> 0000000000000004 <sub>:
>    4: 9d0d                subw a0,a0,a1
>    6: 8082                ret
>
> 0000000000000008 <mul>:
>    8: 02a58533          mul a0,a1,a0
>    c: 2501                sext.w a0,a0
>    e: 8082                ret
>
> 0000000000000010 <div>:
>   10: 02b54533          div a0,a0,a1
>   14: 2501                sext.w a0,a0
>   16: 8082                ret
>
> 0000000000000018 <addu>:
>   18: 9d2d                addw a0,a0,a1
>   1a: 8082                ret
>
> 000000000000001c <subu>:
>   1c: 9d0d                subw a0,a0,a1
>   1e: 8082                ret
>
> 0000000000000020 <mulu>:
>   20: 02a58533          mul a0,a1,a0
>   24: 2501                sext.w a0,a0
>   26: 8082                ret
>
> 0000000000000028 <divu>:
>   28: 00000637          lui a2,0x0
>   2c: 0006069b          sext.w a3,a2
>   30: 1682                slli a3,a3,0x20
>   32: 367d                addiw a2,a2,-1
>   34: 1602                slli a2,a2,0x20
>   36: 9201                srli a2,a2,0x20
>   38: 8e55                or a2,a2,a3
>   3a: 8df1                and a1,a1,a2
>   3c: 8d71                and a0,a0,a2
>   3e: 02b55533          divu a0,a0,a1
>   42: 2501                sext.w a0,a0
>   44: 8082                ret
>
> The divu is pretty bad. The add/sub/addu/subu are perfect.
>
> The mul/mulu and div could all be cleaned up to use a *w instruction and
> drop the sext.w. Is this not happening because the information has been
> lost that the inputs are restricted to i32?
>
> I did this test using branch "experimental" at https://github.com/
> brucehoult/llvm-project-20170507 which contains recent (Sep 21) LLVM ToT
> with lowRISC patches applied.
>
> On Wed, Oct 3, 2018 at 2:27 AM, Alex Bradbury via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> # Purpose of this RFC
>> This RFC describes the challenges of modelling the 64-bit RISC-V target
>> (RV64)
>> and details the two most obvious implementation choices:
>> 1) Having i64 as the only legal integer type
>> 2) Introducing i32 subregisters
>>
>> I've worked on implementing both approaches and fleshed out a pretty
>> complete
>> implementation of 1), which is my preferred option. With this RFC, I would
>> welcome further feedback and insight, as well as suggestions or comments
>> on
>> the target-independent modifications (e.g. TargetInstrInfo hooks) I
>> suggest as
>> worthwhile.
>>
>> # Background: RV64
>> The RISC-V instruction set is structured as a set of bases (RV32I, RV32E,
>> RV64I, RV128I) with a series of optional extensions (e.g. M for
>> multiply/divide, A for atomics, F+D for single+double precision floating
>> point). It's important to note that RV64I is not just RV32I with some
>> additional instructions, it's a completely different base where operations
>> work on 64-bit rather than 32-bit values. RV64I also introduces 10 new
>> instructions: ld/sd (64-bit load/store), addiw, slliw, srliw, sraiw, addw,
>> subw, sllw, srlw, sraw. The `*W` instructions all produce a sign-extended
>> result and take the lower 32-bits of their operands as inputs. Unlike
>> MIPS64,
>> there is no requirement that inputs to these `*W` are sign-extended in
>> order
>> to avoid unpredictable behaviour.
>>
>> # Background: RISC-V backend implementation.
>> Other backends aiming to support both 32-bit and 64-bit architecture
>> variants
>> handle this by defining two versions of each instruction with overlapping
>> encodings, with one marked as isCodeGenOnly.  This leads to unwanted
>> duplication, both in terms of tablegen descriptions and throughout the C++
>> implementation of the backend (e.g. any code checking for RISCV::ADD would
>> also want to check for RISCV::ADD64). Fortunately we can avoid this
>> thanks to
>> the work Krzysztof Parzyszek contributed to support variable-sized
>> register
>> classes <http://lists.llvm.org/pipermail/llvm-dev/2016-September/
>> 105027.html>.
>> The in-tree RISC-V backend exploits this, parameterising the base
>> instruction
>> definitions by XLEN (the size of the general purpose registers).
>>
>> # Option 1: Have i64 as the only legal type
>> ## Approach
>> Every register class in RISCVRegisterInfo.td is parameterised by XLenVT,
>> which
>> is i32 for RV32 and i64 for RV64. No subregisters are defined, meaning
>> i32 is
>> not a legal type. Patterns for the `*W` instructions tend to look
>> something
>> like:
>>
>>     def : Pat<(sext_inreg (add GPR:$rs1, GPR:$rs2), i32),
>>               (ADDW GPR:$rs1, GPR:$rs2)>;
>>
>> Essentially all patterns for RV32I are also valid for RV64I.
>>
>> ## Changes needed
>> * Introduction of new patterns, RV64I-specific immediate materialisation
>>
>> * A number of SelectionDAG nodes generated from LLVM intrinsics take i32
>> arguments and the DAG legalizer doesn't currently know how to legalize
>> them.
>> Promoting these arguments is trivial but requires additions to
>> LegalizeIntegerTypes.cpp. So far I've had to do this for
>> frameaddr/returnaddr/prefetch, but there are likely more.
>>
>> * The shift amount type is i64. If the shift amount operand is smaller
>> than
>> this, SelectionDAGBuilder will zero-extend it (changed from any-extend in
>> rL125457). i32->i64 zero-extension is more expensive than sign-extension,
>> but
>> it's unnecessary anyway as only the lower 6 bits are used. Introduce
>> TargetLowering::getExtendForShiftAmount which is called during
>> SelectionDAGBuilder::visitShift.
>>
>> * When promoting setcc operands, DAGTypeLegalizer::PromoteSetCCOperands
>> makes
>> the arbitrary choice to zero-extend. It is cheaper to sign-extend from
>> i32 to
>> i64, so introduce TargetLowering::isSExtCheaperThanZExt(FromTY, ToTy).
>> For now
>> this is only used through PromoteSetCCOperands, but perhaps there are
>> other
>> cases where it would be useful?
>>
>> * When 32-bit srl is legalized, the dag combiner will try to reduce the
>> bits
>> in the mask in: (srl (and val, 0xffffffff), imm) based on the knowledge
>> of the
>> lower bits that will be shifted out. This means a tablegen pattern
>> matching
>> 0xffffff won't work. Custom selection code in RISCVDAGToDAGISel can
>> recognize
>> when this has happened and produce SRLIW.
>>
>> * New i64 versions of the target-specific intrinsics added to aid the
>> lowering
>> of part-word atomicrmw must be defined.
>>
>> * RV64F (single-precision floating point) requires a little extra work
>> due to
>> the fact i32 is not a legal type. When call lowering happens
>> post-legalisation
>> (e.g. when an intrinsic was inserted during legalisation). A bitcast from
>> f32
>> to i32 can't be introduced. There's a similar challenge for RV32D.
>> Introduce
>> target-specific DAG nodes that perform bitcast+sext for f32->i64 and
>> trunc+bitcast for i64->f32. Custom-lower ISD::BITCAST to ensure these
>> nodes
>> are selected.
>>
>> ## Questions
>> Does anyone have any reservations about this approach of having i64 as the
>> only legal type?
>>
>> Some of the target hooks could perhaps be replaced with more heroics in
>> the
>> backend. What are people's feelings here?
>>
>> # Option 2: Model 32-bit subregs
>> ## Approach
>> Define 32-bit subregisters for the GPRs that can be used in patterns and
>> instruction definitions. The following node types are potentially useful:
>> * `EXTRACT_SUBREG`: Supports getting the lower 32-bits of a 64-bit
>> register
>> * `INSERT_SUBREG`: Assumes only the lower bits are modified. Can be used
>> with
>> `IMPLICIT_DEF` to indicate that the upper bits are undefined. You can't
>> directly represent sign-extension, but you can do what Mips64 does and
>> define
>> extra patterns to catch redundant sign-extension after one of the `*W`
>> instructions.
>> * `SUBREG_TO_REG`: a constant argument asserts the value of the bits left
>> in
>> the upper portion of the register. This is perfect for zero-extension,
>> and not
>> much good for the sign-extension RISC-V performs.
>>
>> You end up with patterns like:
>>
>>     def : Pat<(anyext GPR32:$reg),
>>               (SUBREG_TO_REG (i64 0), GPR32:$reg, sub_32)>;
>> def : Pat<(trunc GPR:$reg), (EXTRACT_SUBREG GPR:$reg, sub_32)>;
>>
>> def : Pat<(add GPR32:$src, GPR32:$src2),
>> (ADDW GPR32:$src, GPR32:$src2)>;
>>
>> def : Pat<(add GPR32:$rs1, simm12_i32:$imm12),
>> (ADDIW GPR32:$rs1, simm12_i32:$imm12)>;
>>
>> ## Changes needed
>> * 32-bit subregisters must be defined. Some register classes need GPR32
>> versions, e.g. GPR, GPRNoX0, GPRC.
>>
>> * The RISCVAsmParser and RISCVDisassembler must be modified to support
>> the new
>> register classes used for the 32-bit subregs.
>>
>> * The calling convention implementation must handle promotion of i32
>> arguments/returns to i64.
>>
>> * The `*W` instructions must be defined using GPR32.
>>
>> * New `Operand<i32>` types must be defined and used in the `*W`
>> instructions.
>>
>> * When defining a variable-sized register class you specify a DefaultMode.
>> This must be set to i64 to avoid breaking RV32 compilation.
>>
>> * This gives enough to define working support for the `*W` operations,
>> but to
>> enable codegen for the other integer instructions requires either
>> duplication
>> or smarts. To write patterns using i32 you need to define a new variant
>> of the
>> instruction. TableGen changes might remove the need for this. Even with
>> such
>> support, it's not particularly desirable to write a bunch of new patterns
>> for
>> instructions other than the `*W` ones.
>>
>> I'm sure solutions are possible, but given that the i64-only approach
>> seems to work very well, I'm not sure it's worth pushing further.
>>
>> # Conclusion
>> Taking full advantage of support for variable-sized register classes and
>> sticking with i64 as the only legal integer type seems very workable and
>> is
>> definitely my preference based on the work I've done. I'd be really
>> interested
>> if anyone has any particular concerns or advice, or feedback on the
>> suggested
>> new target hooks.
>>
>> Best,
>>
>> Alex Bradbury, lowRISC CIC
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181004/fab524b1/attachment.html>