<div dir="ltr"><div dir="ltr"><div dir="ltr">Now rebased to ToT, as of now.<div><br></div><div>All that mess in divu is the same as is generated from:</div><div><br></div><div><div>long foo(){</div><div>    return 0x00000000ffffffffl;</div><div>}</div></div><div><br></div><div><div>0000000000000000 <foo>:</div><div>   0:<span style="white-space:pre">    </span>00000537          <span style="white-space:pre">      </span>lui<span style="white-space:pre">  </span>a0,0x0</div><div>   4:<span style="white-space:pre"> </span>0005059b          <span style="white-space:pre">      </span>sext.w<span style="white-space:pre">       </span>a1,a0</div><div>   8:<span style="white-space:pre">  </span>1582                <span style="white-space:pre"> </span>slli<span style="white-space:pre"> </span>a1,a1,0x20</div><div>   a:<span style="white-space:pre">     </span>357d                <span style="white-space:pre"> </span>addiw<span style="white-space:pre">        </span>a0,a0,-1</div><div>   c:<span style="white-space:pre">       </span>1502                <span style="white-space:pre"> </span>slli<span style="white-space:pre"> </span>a0,a0,0x20</div><div>   e:<span style="white-space:pre">     </span>9101                <span style="white-space:pre"> </span>srli<span style="white-space:pre"> </span>a0,a0,0x20</div><div>  10:<span style="white-space:pre">      </span>8d4d                <span style="white-space:pre"> </span>or<span style="white-space:pre">   </span>a0,a0,a1</div><div>  12:<span style="white-space:pre">        </span>8082                <span style="white-space:pre"> </span>ret</div><div></div></div><div><br></div><div>For sure that's not the best way to generate that constant!<br></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 3, 2018 at 6:48 PM, Bruce Hoult <span dir="ltr"><<a href="mailto:brucehoult@sifive.com" target="_blank">brucehoult@sifive.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 
0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Only having i64 seems cleaner to me. Of course you can still have i32 in the code up until legalisation.<div><br></div><div>I think the only real downside is you can end up with 64 bit arithmetic on things that are actually 32 bit, followed by a sext? That can be cleaned up to a *w instruction in most cases, and already is.</div><div><br></div><div>Example:</div><div><br></div><div>----------- ops.c</div><div><div>int add(int a, int b){return a+b;}</div><div>int sub(int a, int b){return a-b;}</div><div>int mul(int a, int b){return a*b;}</div><div>int div(int a, int b){return a/b;}</div><div><br></div><div>unsigned addu(unsigned a, unsigned b){return a+b;}</div><div>unsigned subu(unsigned a, unsigned b){return a-b;}</div><div>unsigned mulu(unsigned a, unsigned b){return a*b;}</div><div>unsigned divu(unsigned a, unsigned b){return a/b;}</div></div><div><div style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">-----------</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><div>bruce@nuc:~/riscv/tests$ clang -O -c ops.c --target=riscv64 -march=rv64gc</div><div>bruce@nuc:~/riscv/tests$ riscv64-unknown-elf-objdump -d ops.o</div><div><br></div><div>ops.o:     file format elf64-littleriscv</div><div><br></div><div><br></div><div>Disassembly of section .text:</div><div><br></div><div>0000000000000000 <add>:</div><div>   0:<span style="white-space:pre-wrap">  </span>9d2d                <span style="white-space:pre-wrap">    </span>addw<span style="white-space:pre-wrap">    </span>a0,a0,a1</div><div>   2:<span style="white-space:pre-wrap">  </span>8082                <span style="white-space:pre-wrap">    </span>ret</div><div><br></div><div>0000000000000004 <sub>:</div><div>   4:<span style="white-space:pre-wrap">  
</span>9d0d                <span style="white-space:pre-wrap">    </span>subw<span style="white-space:pre-wrap">    </span>a0,a0,a1</div><div>   6:<span style="white-space:pre-wrap">  </span>8082                <span style="white-space:pre-wrap">    </span>ret</div><div><br></div><div>0000000000000008 <mul>:</div><div>   8:<span style="white-space:pre-wrap">  </span>02a58533          <span style="white-space:pre-wrap"> </span>mul<span style="white-space:pre-wrap">     </span>a0,a1,a0</div><div>   c:<span style="white-space:pre-wrap">  </span>2501                <span style="white-space:pre-wrap">    </span>sext.w<span style="white-space:pre-wrap">  </span>a0,a0</div><div>   e:<span style="white-space:pre-wrap">     </span>8082                <span style="white-space:pre-wrap">    </span>ret</div><div><br></div><div>0000000000000010 <div>:</div><div>  10:<span style="white-space:pre-wrap">   </span>02b54533          <span style="white-space:pre-wrap"> </span>div<span style="white-space:pre-wrap">     </span>a0,a0,a1</div><div>  14:<span style="white-space:pre-wrap">   </span>2501                <span style="white-space:pre-wrap">    </span>sext.w<span style="white-space:pre-wrap">  </span>a0,a0</div><div>  16:<span style="white-space:pre-wrap">      </span>8082                <span style="white-space:pre-wrap">    </span>ret</div><div><br></div><div>0000000000000018 <addu>:</div><div>  18:<span style="white-space:pre-wrap">  </span>9d2d                <span style="white-space:pre-wrap">    </span>addw<span style="white-space:pre-wrap">    </span>a0,a0,a1</div><div>  1a:<span style="white-space:pre-wrap">   </span>8082                <span style="white-space:pre-wrap">    </span>ret</div><div><br></div><div>000000000000001c <subu>:</div><div>  1c:<span style="white-space:pre-wrap">  </span>9d0d                <span style="white-space:pre-wrap">    </span>subw<span style="white-space:pre-wrap">    </span>a0,a0,a1</div><div>  1e:<span style="white-space:pre-wrap">   
</span>8082                <span style="white-space:pre-wrap">    </span>ret</div><div><br></div><div>0000000000000020 <mulu>:</div><div>  20:<span style="white-space:pre-wrap">  </span>02a58533          <span style="white-space:pre-wrap"> </span>mul<span style="white-space:pre-wrap">     </span>a0,a1,a0</div><div>  24:<span style="white-space:pre-wrap">   </span>2501                <span style="white-space:pre-wrap">    </span>sext.w<span style="white-space:pre-wrap">  </span>a0,a0</div><div>  26:<span style="white-space:pre-wrap">      </span>8082                <span style="white-space:pre-wrap">    </span>ret</div><div><br></div><div>0000000000000028 <divu>:</div><div>  28:<span style="white-space:pre-wrap">  </span>00000637          <span style="white-space:pre-wrap"> </span>lui<span style="white-space:pre-wrap">     </span>a2,0x0</div><div>  2c:<span style="white-space:pre-wrap">     </span>0006069b          <span style="white-space:pre-wrap"> </span>sext.w<span style="white-space:pre-wrap">  </span>a3,a2</div><div>  30:<span style="white-space:pre-wrap">      </span>1682                <span style="white-space:pre-wrap">    </span>slli<span style="white-space:pre-wrap">    </span>a3,a3,0x20</div><div>  32:<span style="white-space:pre-wrap"> </span>367d                <span style="white-space:pre-wrap">    </span>addiw<span style="white-space:pre-wrap">   </span>a2,a2,-1</div><div>  34:<span style="white-space:pre-wrap">   </span>1602                <span style="white-space:pre-wrap">    </span>slli<span style="white-space:pre-wrap">    </span>a2,a2,0x20</div><div>  36:<span style="white-space:pre-wrap"> </span>9201                <span style="white-space:pre-wrap">    </span>srli<span style="white-space:pre-wrap">    </span>a2,a2,0x20</div><div>  38:<span style="white-space:pre-wrap"> </span>8e55                <span style="white-space:pre-wrap">    </span>or<span style="white-space:pre-wrap">      </span>a2,a2,a3</div><div>  3a:<span 
style="white-space:pre-wrap">   </span>8df1                <span style="white-space:pre-wrap">    </span>and<span style="white-space:pre-wrap">     </span>a1,a1,a2</div><div>  3c:<span style="white-space:pre-wrap">   </span>8d71                <span style="white-space:pre-wrap">    </span>and<span style="white-space:pre-wrap">     </span>a0,a0,a2</div><div>  3e:<span style="white-space:pre-wrap">   </span>02b55533          <span style="white-space:pre-wrap"> </span>divu<span style="white-space:pre-wrap">    </span>a0,a0,a1</div><div>  42:<span style="white-space:pre-wrap">   </span>2501                <span style="white-space:pre-wrap">    </span>sext.w<span style="white-space:pre-wrap">  </span>a0,a0</div><div>  44:<span style="white-space:pre-wrap">      </span>8082                <span style="white-space:pre-wrap">    </span>ret</div><div><br></div><div>The divu is pretty bad. The add/sub/addu/subu are perfect.</div><div><br></div><div>The mul/mulu and div could all be cleaned up to use a *w instruction and drop the sext.w. Is this not happening because the information has been lost that the inputs are restricted to i32?</div><div><br></div><div>I did this test using branch "experimental" at <a href="https://github.com/brucehoult/llvm-project-20170507" target="_blank">https://github.com/<wbr>brucehoult/llvm-project-<wbr>20170507</a> which contains recent (Sep 21) LLVM ToT with lowRISC patches applied.</div></div></div></div></div></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 3, 2018 at 2:27 AM, Alex Bradbury via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"># Purpose of this RFC<br>

This RFC describes the challenges of modelling the 64-bit RISC-V target (RV64)<br>

and details the two most obvious implementation choices:<br>

1) Having i64 as the only legal integer type<br>

2) Introducing i32 subregisters<br>

<br>

I've worked on implementing both approaches and fleshed out a pretty complete<br>

implementation of 1), which is my preferred option. With this RFC, I would<br>

welcome further feedback and insight, as well as suggestions or comments on<br>

the target-independent modifications (e.g. TargetLowering hooks) I suggest as<br>

worthwhile.<br>

<br>

# Background: RV64<br>

The RISC-V instruction set is structured as a set of bases (RV32I, RV32E,<br>

RV64I, RV128I) with a series of optional extensions (e.g. M for<br>

multiply/divide, A for atomics, F+D for single+double precision floating<br>

point). It's important to note that RV64I is not just RV32I with some<br>

additional instructions; it's a completely different base where operations<br>

work on 64-bit rather than 32-bit values. RV64I also introduces 12 new<br>

instructions: lwu (zero-extending 32-bit load), ld/sd (64-bit load/store),<br>

addiw, slliw, srliw, sraiw, addw, subw, sllw, srlw, sraw. The `*W`<br>

instructions all produce a sign-extended result and take the lower 32 bits of<br>

their operands as inputs. Unlike MIPS64, there is no requirement that inputs<br>

to these `*W` instructions are sign-extended in order<br>

to avoid unpredictable behaviour.<br>
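<br>

As a rough illustration of the `*W` semantics described above, here is a<br>

minimal C sketch of ADDW. The names sext32, model_addw and check_addw_model<br>

are hypothetical helpers invented for this example; they are not part of LLVM<br>

or of the RISC-V specification.<br>

<br>

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of x to 64 bits. Written with unsigned
   arithmetic only, so it does not rely on implementation-defined
   conversions to signed types. */
uint64_t sext32(uint64_t x) {
    uint64_t low = x & 0xffffffffull;
    return (low ^ 0x80000000ull) - 0x80000000ull;
}

/* Hypothetical model of ADDW: only the low 32 bits of each operand
   affect the result, and the 32-bit sum is sign-extended to 64 bits. */
uint64_t model_addw(uint64_t rs1, uint64_t rs2) {
    return sext32(rs1 + rs2);
}

/* A few spot checks of the model. */
void check_addw_model(void) {
    /* Ordinary small values. */
    assert(model_addw(1, 2) == 3);
    /* 0x7fffffff + 1 wraps to 0x80000000, which is then sign-extended. */
    assert(model_addw(0x7fffffffull, 1) == 0xffffffff80000000ull);
    /* Garbage in the upper 32 bits of the inputs is ignored. */
    assert(model_addw(0xdeadbeef00000001ull, 0x1234567800000002ull) == 3);
}
```

<br>

The last check is the point of contrast with MIPS64: the inputs are not<br>

sign-extended, yet the result is still well defined.<br>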

<br>

# Background: RISC-V backend implementation<br>

Other backends aiming to support both 32-bit and 64-bit architecture variants<br>

handle this by defining two versions of each instruction with overlapping<br>

encodings, with one marked as isCodeGenOnly.  This leads to unwanted<br>

duplication, both in terms of tablegen descriptions and throughout the C++<br>

implementation of the backend (e.g. any code checking for RISCV::ADD would<br>

also want to check for RISCV::ADD64). Fortunately we can avoid this thanks to<br>

the work Krzysztof Parzyszek contributed to support variable-sized register<br>

classes <<a href="http://lists.llvm.org/pipermail/llvm-dev/2016-September/105027.html" rel="noreferrer" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2016-September/105027.html</a>>.<br>

The in-tree RISC-V backend exploits this, parameterising the base instruction<br>

definitions by XLEN (the size of the general purpose registers).<br>

<br>

# Option 1: Have i64 as the only legal type<br>

## Approach<br>

Every register class in RISCVRegisterInfo.td is parameterised by XLenVT, which<br>

is i32 for RV32 and i64 for RV64. No subregisters are defined, meaning i32 is<br>

not a legal type. Patterns for the `*W` instructions tend to look something<br>

like:<br>

<br>

    def : Pat<(sext_inreg (add GPR:$rs1, GPR:$rs2), i32),<br>

              (ADDW GPR:$rs1, GPR:$rs2)>;<br>

<br>

Essentially all patterns for RV32I are also valid for RV64I.<br>

<br>

## Changes needed<br>

* Introduction of new patterns and RV64I-specific immediate materialisation<br>

<br>

* A number of SelectionDAG nodes generated from LLVM intrinsics take i32<br>

arguments and the DAG legalizer doesn't currently know how to legalize them.<br>

Promoting these arguments is trivial but requires additions to<br>

LegalizeIntegerTypes.cpp. So far I've had to do this for<br>

frameaddr/returnaddr/prefetch, but there are likely more.<br>

<br>

* The shift amount type is i64. If the shift amount operand is smaller than<br>

this, SelectionDAGBuilder will zero-extend it (changed from any-extend in<br>

rL125457). i32->i64 zero-extension is more expensive than sign-extension, but<br>

it's unnecessary anyway as only the lower 6 bits are used. Introduce<br>

TargetLowering::getExtendForShiftAmount, which is called during<br>

SelectionDAGBuilder::visitShift.<br>

<br>

* When promoting setcc operands, DAGTypeLegalizer::PromoteSetCCOperands makes<br>

the arbitrary choice to zero-extend. It is cheaper to sign-extend from i32 to<br>

i64, so introduce TargetLowering::isSExtCheaperThanZExt(FromTy, ToTy). For now<br>

this is only used through PromoteSetCCOperands, but perhaps there are other<br>

cases where it would be useful?<br>

<br>

* When 32-bit srl is legalized, the dag combiner will try to reduce the bits<br>

in the mask in: (srl (and val, 0xffffffff), imm) based on the knowledge of the<br>

lower bits that will be shifted out. This means a tablegen pattern matching<br>

0xffffffff won't work. Custom selection code in RISCVDAGToDAGISel can recognize<br>

when this has happened and produce SRLIW.<br>

<br>

* New i64 versions of the target-specific intrinsics added to aid the lowering<br>

of part-word atomicrmw must be defined.<br>

<br>

* RV64F (single-precision floating point) requires a little extra work due to<br>

the fact that i32 is not a legal type. When call lowering happens post-legalisation<br>

(e.g. when an intrinsic was inserted during legalisation), a bitcast from f32<br>

to i32 can't be introduced. There's a similar challenge for RV32D. Introduce<br>

target-specific DAG nodes that perform bitcast+sext for f32->i64 and<br>

trunc+bitcast for i64->f32. Custom-lower ISD::BITCAST to ensure these nodes<br>

are selected.<br>
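<br>

To make the 32-bit srl item in the list above more concrete, the following C<br>

sketch checks that narrowing the mask does not change the result, and that a<br>

model of SRLIW computes the same value. All function names here are<br>

hypothetical and exist only for this example. The check is restricted to<br>

shift amounts 1..31, because with a shift amount of 0 SRLIW sign-extends bit<br>

31 while the and+srl form zero-extends.<br>

<br>

```c
#include <assert.h>
#include <stdint.h>

/* (srl (and val, 0xffffffff), imm): the form a tablegen pattern expects. */
uint64_t srl_full_mask(uint64_t val, unsigned imm) {
    return (val & 0xffffffffull) >> imm;
}

/* The same node after the low imm bits of the mask have been cleared,
   which is safe because those bits are shifted out anyway. */
uint64_t srl_narrowed_mask(uint64_t val, unsigned imm) {
    uint64_t mask = 0xffffffffull & ~((1ull << imm) - 1);
    return (val & mask) >> imm;
}

/* Hypothetical model of SRLIW: logical right shift of the low 32 bits,
   with the 32-bit result sign-extended to 64 bits. */
uint64_t model_srliw(uint64_t rs1, unsigned shamt) {
    uint64_t shifted = (rs1 & 0xffffffffull) >> shamt;
    return (shifted ^ 0x80000000ull) - 0x80000000ull;
}

/* Returns 1 if all three forms agree on val for shift amounts 1..31. */
int srl_forms_agree(uint64_t val) {
    for (unsigned imm = 1; imm < 32; ++imm) {
        uint64_t expected = srl_full_mask(val, imm);
        if (srl_narrowed_mask(val, imm) != expected) return 0;
        if (model_srliw(val, imm) != expected) return 0;
    }
    return 1;
}

/* Spot checks, including one concrete narrowed mask (0xfffffff0). */
void check_srl_forms(void) {
    assert(srl_forms_agree(0xdeadbeefcafef00dull));
    assert(srl_narrowed_mask(0xffffffffffffffffull, 4) == 0x0fffffffull);
}
```

<br>

Because the narrowed mask depends on the shift amount, a single constant in a<br>

pattern cannot match every case, which is consistent with the need for custom<br>

selection code noted above.<br>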

<br>

## Questions<br>

Does anyone have any reservations about this approach of having i64 as the<br>

only legal type?<br>

<br>

Some of the target hooks could perhaps be replaced with more heroics in the<br>

backend. What are people's feelings here?<br>

<br>

# Option 2: Model 32-bit subregs<br>

## Approach<br>

Define 32-bit subregisters for the GPRs that can be used in patterns and<br>

instruction definitions. The following node types are potentially useful:<br>

* `EXTRACT_SUBREG`: Supports getting the lower 32 bits of a 64-bit register<br>

* `INSERT_SUBREG`: Assumes only the lower bits are modified. Can be used with<br>

`IMPLICIT_DEF` to indicate that the upper bits are undefined. You can't<br>

directly represent sign-extension, but you can do what Mips64 does and define<br>

extra patterns to catch redundant sign-extension after one of the `*W`<br>

instructions.<br>

* `SUBREG_TO_REG`: a constant argument asserts the value of the bits left in<br>

the upper portion of the register. This is perfect for zero-extension, and not<br>

much good for the sign-extension RISC-V performs.<br>

<br>

You end up with patterns like:<br>

<br>

    def : Pat<(anyext GPR32:$reg),<br>

              (SUBREG_TO_REG (i64 0), GPR32:$reg, sub_32)>;<br>

    def : Pat<(trunc GPR:$reg), (EXTRACT_SUBREG GPR:$reg, sub_32)>;<br>

<br>

    def : Pat<(add GPR32:$src, GPR32:$src2),<br>

              (ADDW GPR32:$src, GPR32:$src2)>;<br>

<br>

    def : Pat<(add GPR32:$rs1, simm12_i32:$imm12),<br>

              (ADDIW GPR32:$rs1, simm12_i32:$imm12)>;<br>

<br>

## Changes needed<br>

* 32-bit subregisters must be defined. Some register classes need GPR32<br>

versions, e.g. GPR, GPRNoX0, GPRC.<br>

<br>

* The RISCVAsmParser and RISCVDisassembler must be modified to support the new<br>

register classes used for the 32-bit subregs.<br>

<br>

* The calling convention implementation must handle promotion of i32<br>

arguments/returns to i64.<br>

<br>

* The `*W` instructions must be defined using GPR32.<br>

<br>

* New `Operand<i32>` types must be defined and used in the `*W` instructions.<br>

<br>

* When defining a variable-sized register class you specify a DefaultMode.<br>

This must be set to i64 to avoid breaking RV32 compilation.<br>

<br>

* This is enough to define working support for the `*W` operations, but<br>

enabling codegen for the other integer instructions requires either duplication<br>

or smarts. To write patterns using i32 you need to define a new variant of the<br>

instruction. TableGen changes might remove the need for this. Even with such<br>

support, it's not particularly desirable to write a bunch of new patterns for<br>

instructions other than the `*W` ones.<br>

<br>

I'm sure solutions are possible, but given that the i64-only approach<br>

seems to work very well, I'm not sure it's worth pushing further.<br>

<br>

# Conclusion<br>

Taking full advantage of support for variable-sized register classes and<br>

sticking with i64 as the only legal integer type seems very workable and is<br>

definitely my preference based on the work I've done. I'd be really interested<br>

if anyone has any particular concerns or advice, or feedback on the suggested<br>

new target hooks.<br>

<br>

Best,<br>

<br>

Alex Bradbury, lowRISC CIC<br>


</blockquote></div><br></div>

</div></div></blockquote></div><br></div>