<div dir="ltr"><div dir="ltr"><div dir="ltr">Now rebased to ToT, as of now.<div><br></div><div>All that mess in divu is the same as is generated from:</div><div><br></div><div><div>long foo(){</div><div> return 0x00000000ffffffffl;</div><div>}</div></div><div><br></div><div><div>0000000000000000 <foo>:</div><div> 0:<span style="white-space:pre"> </span>00000537 <span style="white-space:pre"> </span>lui<span style="white-space:pre"> </span>a0,0x0</div><div> 4:<span style="white-space:pre"> </span>0005059b <span style="white-space:pre"> </span>sext.w<span style="white-space:pre"> </span>a1,a0</div><div> 8:<span style="white-space:pre"> </span>1582 <span style="white-space:pre"> </span>slli<span style="white-space:pre"> </span>a1,a1,0x20</div><div> a:<span style="white-space:pre"> </span>357d <span style="white-space:pre"> </span>addiw<span style="white-space:pre"> </span>a0,a0,-1</div><div> c:<span style="white-space:pre"> </span>1502 <span style="white-space:pre"> </span>slli<span style="white-space:pre"> </span>a0,a0,0x20</div><div> e:<span style="white-space:pre"> </span>9101 <span style="white-space:pre"> </span>srli<span style="white-space:pre"> </span>a0,a0,0x20</div><div> 10:<span style="white-space:pre"> </span>8d4d <span style="white-space:pre"> </span>or<span style="white-space:pre"> </span>a0,a0,a1</div><div> 12:<span style="white-space:pre"> </span>8082 <span style="white-space:pre"> </span>ret</div><div></div></div><div><br></div><div>For sure that's not the best way to generate that constant!<br></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 3, 2018 at 6:48 PM, Bruce Hoult <span dir="ltr"><<a href="mailto:brucehoult@sifive.com" target="_blank">brucehoult@sifive.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Only having i64 seems cleaner to me. 
Of course you can still have i32 in the code up until legalisation.<div><br></div><div>I think the only real downside is you can end up with 64 bit arithmetic on things that are actually 32 bit, followed by a sext? That can be cleaned up to a *w instruction in most cases, and already is.</div><div><br></div><div>Example:</div><div><br></div><div>----------- ops.c</div><div><div>int add(int a, int b){return a+b;}</div><div>int sub(int a, int b){return a-b;}</div><div>int mul(int a, int b){return a*b;}</div><div>int div(int a, int b){return a/b;}</div><div><br></div><div>unsigned addu(unsigned a, unsigned b){return a+b;}</div><div>unsigned subu(unsigned a, unsigned b){return a-b;}</div><div>unsigned mulu(unsigned a, unsigned b){return a*b;}</div><div>unsigned divu(unsigned a, unsigned b){return a/b;}</div></div><div><div style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">-----------</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><div>bruce@nuc:~/riscv/tests$ clang -O -c ops.c --target=riscv64 -march=rv64gc</div><div>bruce@nuc:~/riscv/tests$ riscv64-unknown-elf-objdump -d ops.o</div><div><br></div><div>ops.o: file format elf64-littleriscv</div><div><br></div><div><br></div><div>Disassembly of section .text:</div><div><br></div><div>0000000000000000 <add>:</div><div> 0:<span style="white-space:pre-wrap"> </span>9d2d <span style="white-space:pre-wrap"> </span>addw<span style="white-space:pre-wrap"> </span>a0,a0,a1</div><div> 2:<span style="white-space:pre-wrap"> </span>8082 <span style="white-space:pre-wrap"> </span>ret</div><div><br></div><div>0000000000000004 <sub>:</div><div> 4:<span style="white-space:pre-wrap"> </span>9d0d <span style="white-space:pre-wrap"> </span>subw<span style="white-space:pre-wrap"> </span>a0,a0,a1</div><div> 6:<span style="white-space:pre-wrap"> </span>8082 <span style="white-space:pre-wrap"> 
</span>ret</div><div><br></div><div>0000000000000008 <mul>:</div><div> 8:<span style="white-space:pre-wrap"> </span>02a58533 <span style="white-space:pre-wrap"> </span>mul<span style="white-space:pre-wrap"> </span>a0,a1,a0</div><div> c:<span style="white-space:pre-wrap"> </span>2501 <span style="white-space:pre-wrap"> </span>sext.w<span style="white-space:pre-wrap"> </span>a0,a0</div><div> e:<span style="white-space:pre-wrap"> </span>8082 <span style="white-space:pre-wrap"> </span>ret</div><div><br></div><div>0000000000000010 <div>:</div><div> 10:<span style="white-space:pre-wrap"> </span>02b54533 <span style="white-space:pre-wrap"> </span>div<span style="white-space:pre-wrap"> </span>a0,a0,a1</div><div> 14:<span style="white-space:pre-wrap"> </span>2501 <span style="white-space:pre-wrap"> </span>sext.w<span style="white-space:pre-wrap"> </span>a0,a0</div><div> 16:<span style="white-space:pre-wrap"> </span>8082 <span style="white-space:pre-wrap"> </span>ret</div><div><br></div><div>0000000000000018 <addu>:</div><div> 18:<span style="white-space:pre-wrap"> </span>9d2d <span style="white-space:pre-wrap"> </span>addw<span style="white-space:pre-wrap"> </span>a0,a0,a1</div><div> 1a:<span style="white-space:pre-wrap"> </span>8082 <span style="white-space:pre-wrap"> </span>ret</div><div><br></div><div>000000000000001c <subu>:</div><div> 1c:<span style="white-space:pre-wrap"> </span>9d0d <span style="white-space:pre-wrap"> </span>subw<span style="white-space:pre-wrap"> </span>a0,a0,a1</div><div> 1e:<span style="white-space:pre-wrap"> </span>8082 <span style="white-space:pre-wrap"> </span>ret</div><div><br></div><div>0000000000000020 <mulu>:</div><div> 20:<span style="white-space:pre-wrap"> </span>02a58533 <span style="white-space:pre-wrap"> </span>mul<span style="white-space:pre-wrap"> </span>a0,a1,a0</div><div> 24:<span style="white-space:pre-wrap"> </span>2501 <span style="white-space:pre-wrap"> </span>sext.w<span style="white-space:pre-wrap"> </span>a0,a0</div><div> 
26:<span style="white-space:pre-wrap"> </span>8082 <span style="white-space:pre-wrap"> </span>ret</div><div><br></div><div>0000000000000028 <divu>:</div><div> 28:<span style="white-space:pre-wrap"> </span>00000637 <span style="white-space:pre-wrap"> </span>lui<span style="white-space:pre-wrap"> </span>a2,0x0</div><div> 2c:<span style="white-space:pre-wrap"> </span>0006069b <span style="white-space:pre-wrap"> </span>sext.w<span style="white-space:pre-wrap"> </span>a3,a2</div><div> 30:<span style="white-space:pre-wrap"> </span>1682 <span style="white-space:pre-wrap"> </span>slli<span style="white-space:pre-wrap"> </span>a3,a3,0x20</div><div> 32:<span style="white-space:pre-wrap"> </span>367d <span style="white-space:pre-wrap"> </span>addiw<span style="white-space:pre-wrap"> </span>a2,a2,-1</div><div> 34:<span style="white-space:pre-wrap"> </span>1602 <span style="white-space:pre-wrap"> </span>slli<span style="white-space:pre-wrap"> </span>a2,a2,0x20</div><div> 36:<span style="white-space:pre-wrap"> </span>9201 <span style="white-space:pre-wrap"> </span>srli<span style="white-space:pre-wrap"> </span>a2,a2,0x20</div><div> 38:<span style="white-space:pre-wrap"> </span>8e55 <span style="white-space:pre-wrap"> </span>or<span style="white-space:pre-wrap"> </span>a2,a2,a3</div><div> 3a:<span style="white-space:pre-wrap"> </span>8df1 <span style="white-space:pre-wrap"> </span>and<span style="white-space:pre-wrap"> </span>a1,a1,a2</div><div> 3c:<span style="white-space:pre-wrap"> </span>8d71 <span style="white-space:pre-wrap"> </span>and<span style="white-space:pre-wrap"> </span>a0,a0,a2</div><div> 3e:<span style="white-space:pre-wrap"> </span>02b55533 <span style="white-space:pre-wrap"> </span>divu<span style="white-space:pre-wrap"> </span>a0,a0,a1</div><div> 42:<span style="white-space:pre-wrap"> </span>2501 <span style="white-space:pre-wrap"> </span>sext.w<span style="white-space:pre-wrap"> </span>a0,a0</div><div> 44:<span style="white-space:pre-wrap"> </span>8082 <span 
style="white-space:pre-wrap"> </span>ret</div><div><br></div><div>The divu is pretty bad. The add/sub/addu/subu are perfect.</div><div><br></div><div>The mul/mulu and div could all be cleaned up to use a *w instruction and drop the sext.w. Is this not happening because the information has been lost that the inputs are restricted to i32?</div><div><br></div><div>I did this test using branch "experimental" at <a href="https://github.com/brucehoult/llvm-project-20170507" target="_blank">https://github.com/<wbr>brucehoult/llvm-project-<wbr>20170507</a> which contains recent (Sep 21) LLVM ToT with lowRISC patches applied.</div></div></div></div></div></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 3, 2018 at 2:27 AM, Alex Bradbury via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"># Purpose of this RFC<br>
This RFC describes the challenges of modelling the 64-bit RISC-V target (RV64)<br>
and details the two most obvious implementation choices:<br>
1) Having i64 as the only legal integer type<br>
2) Introducing i32 subregisters<br>
<br>
I've worked on implementing both approaches and fleshed out a pretty complete<br>
implementation of 1), which is my preferred option. With this RFC, I would<br>
welcome further feedback and insight, as well as suggestions or comments on<br>
the target-independent modifications (e.g. TargetInstrInfo hooks) I suggest as<br>
worthwhile.<br>
<br>
# Background: RV64<br>
The RISC-V instruction set is structured as a set of bases (RV32I, RV32E,<br>
RV64I, RV128I) with a series of optional extensions (e.g. M for<br>
multiply/divide, A for atomics, F+D for single+double precision floating<br>
point). It's important to note that RV64I is not just RV32I with some<br>
additional instructions; it's a different base where operations<br>
work on 64-bit rather than 32-bit values. RV64I also introduces 12 new<br>
instructions: lwu (zero-extending 32-bit load), ld/sd (64-bit load/store),<br>
addiw, slliw, srliw, sraiw, addw, subw, sllw, srlw, sraw. The `*W` instructions<br>
all produce a sign-extended result and take the lower 32 bits of their operands<br>
as inputs. Unlike MIPS64, there is no requirement that inputs to these `*W`<br>
instructions are sign-extended in order to avoid unpredictable behaviour.<br>
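<br>
As a rough illustration of the `*W` semantics described above, here is a minimal<br>
C model of ADDW. The helper names (`sext32`, `addw`) are made up for this sketch<br>
and are not taken from any backend code. It shows that whatever sits in the upper<br>
32 bits of the inputs has no effect on the result.<br>
<br>
```c
#include <stdint.h>

/* Sign-extend the low 32 bits of a 64-bit register value (hypothetical helper). */
int64_t sext32(uint64_t x) {
    uint64_t low = x & 0xffffffffULL;
    /* Flipping bit 31 and subtracting 2^31 maps [0, 2^32) onto [-2^31, 2^31). */
    return (int64_t)(low ^ 0x80000000ULL) - (int64_t)0x80000000LL;
}

/* ADDW model: only the low 32 bits of each operand matter; the 32-bit sum
   is sign-extended to 64 bits. */
int64_t addw(uint64_t rs1, uint64_t rs2) {
    /* Unsigned addition wraps, and the low 32 bits of the 64-bit sum equal
       the 32-bit sum of the low halves. */
    return sext32(rs1 + rs2);
}
```
<br>
For example, `addw(0xdeadbeef00000001, 0x1234567800000002)` yields 3 in this<br>
model, even though neither input is a sign-extended 32-bit value.<br>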
<br>
# Background: RISC-V backend implementation.<br>
Other backends aiming to support both 32-bit and 64-bit architecture variants<br>
handle this by defining two versions of each instruction with overlapping<br>
encodings, with one marked as isCodeGenOnly. This leads to unwanted<br>
duplication, both in terms of tablegen descriptions and throughout the C++<br>
implementation of the backend (e.g. any code checking for RISCV::ADD would<br>
also want to check for RISCV::ADD64). Fortunately we can avoid this thanks to<br>
the work Krzysztof Parzyszek contributed to support variable-sized register<br>
classes <<a href="http://lists.llvm.org/pipermail/llvm-dev/2016-September/105027.html" rel="noreferrer" target="_blank">http://lists.llvm.org/piperma<wbr>il/llvm-dev/2016-September/<wbr>105027.html</a>>.<br>
The in-tree RISC-V backend exploits this, parameterising the base instruction<br>
definitions by XLEN (the size of the general purpose registers).<br>
<br>
# Option 1: Have i64 as the only legal type<br>
## Approach<br>
Every register class in RISCVRegisterInfo.td is parameterised by XLenVT, which<br>
is i32 for RV32 and i64 for RV64. No subregisters are defined, meaning i32 is<br>
not a legal type. Patterns for the `*W` instructions tend to look something<br>
like:<br>
<br>
def : Pat<(sext_inreg (add GPR:$rs1, GPR:$rs2), i32),<br>
(ADDW GPR:$rs1, GPR:$rs2)>;<br>
<br>
Essentially all patterns for RV32I are also valid for RV64I.<br>
<br>
## Changes needed<br>
* Introduction of new patterns and RV64I-specific immediate materialisation.<br>
<br>
* A number of SelectionDAG nodes generated from LLVM intrinsics take i32<br>
arguments and the DAG legalizer doesn't currently know how to legalize them.<br>
Promoting these arguments is trivial but requires additions to<br>
LegalizeIntegerTypes.cpp. So far I've had to do this for<br>
frameaddr/returnaddr/prefetch, but there are likely more.<br>
<br>
* The shift amount type is i64. If the shift amount operand is smaller than<br>
this, SelectionDAGBuilder will zero-extend it (changed from any-extend in<br>
rL125457). i32->i64 zero-extension is more expensive than sign-extension, but<br>
it's unnecessary anyway as only the lower 6 bits are used. Introduce<br>
TargetLowering::getExtendForShiftAmount, which is called during<br>
SelectionDAGBuilder::visitShift.<br>
<br>
* When promoting setcc operands, DAGTypeLegalizer::PromoteSetCCOperands makes<br>
the arbitrary choice to zero-extend. It is cheaper to sign-extend from i32 to<br>
i64, so introduce TargetLowering::isSExtCheaperThanZExt(FromTy, ToTy). For now<br>
this is only used through PromoteSetCCOperands, but perhaps there are other<br>
cases where it would be useful?<br>
<br>
* When 32-bit srl is legalized, the DAG combiner will try to reduce the bits<br>
in the mask in: (srl (and val, 0xffffffff), imm) based on the knowledge of the<br>
lower bits that will be shifted out. This means a tablegen pattern matching<br>
0xffffffff won't work. Custom selection code in RISCVDAGToDAGISel can recognize<br>
when this has happened and produce SRLIW.<br>
<br>
* New i64 versions of the target-specific intrinsics added to aid the lowering<br>
of part-word atomicrmw must be defined.<br>
<br>
* RV64F (single-precision floating point) requires a little extra work because<br>
i32 is not a legal type. When call lowering happens post-legalisation<br>
(e.g. when an intrinsic was inserted during legalisation), a bitcast from f32<br>
to i32 can't be introduced. There's a similar challenge for RV32D. Introduce<br>
target-specific DAG nodes that perform bitcast+sext for f32->i64 and<br>
trunc+bitcast for i64->f32. Custom-lower ISD::BITCAST to ensure these nodes<br>
are selected.<br>
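<br>
To make the RV64F point concrete, the two conversions named in the last item<br>
(bitcast+sext for f32->i64, trunc+bitcast for i64->f32) can be sketched in C as<br>
below. This only models the value semantics, not the DAG nodes themselves. The<br>
function names are hypothetical, and the sketch assumes `float` is a 32-bit<br>
IEEE-754 type.<br>
<br>
```c
#include <stdint.h>
#include <string.h>

/* f32 -> i64: reinterpret the float's bits as a 32-bit integer, then
   sign-extend to 64 bits. Assumes float is 32 bits wide. */
int64_t f32_to_i64_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* bitcast f32 -> i32 */
    /* sext i32 -> i64 without relying on implementation-defined casts */
    return (int64_t)((uint64_t)bits ^ 0x80000000ULL) - (int64_t)0x80000000LL;
}

/* i64 -> f32: truncate to the low 32 bits, then reinterpret as a float. */
float i64_bits_to_f32(int64_t x) {
    uint32_t bits = (uint32_t)x;             /* trunc i64 -> i32 (modular) */
    float f;
    memcpy(&f, &bits, sizeof f);             /* bitcast i32 -> f32 */
    return f;
}
```
<br>
The round trip is lossless because the truncation discards exactly the bits<br>
that the sign-extension added.<br>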
<br>
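The srl item in the list above can be checked with a small C model. The sketch<br>
below (all function names are hypothetical) compares the original mask, the<br>
mask after the bits that get shifted out have been cleared, and a model of<br>
SRLIW. It assumes a shift amount between 1 and 31.<br>
<br>
```c
#include <stdint.h>

/* (srl (and val, 0xffffffff), imm) with the full 32-bit mask. */
uint64_t srl_full_mask(uint64_t val, unsigned imm) {
    return (val & 0xffffffffULL) >> imm;
}

/* The same expression after the low imm bits of the mask, which are shifted
   out anyway, have been cleared. For imm = 4 the mask becomes 0xfffffff0. */
uint64_t srl_reduced_mask(uint64_t val, unsigned imm) {
    uint64_t mask = 0xffffffffULL & ~((1ULL << imm) - 1);
    return (val & mask) >> imm;
}

/* SRLIW model: logical right shift of the low 32 bits, with the 32-bit
   result sign-extended to 64 bits. */
uint64_t srliw(uint64_t val, unsigned imm) {
    uint64_t r = (val & 0xffffffffULL) >> imm;
    /* Unsigned wraparound performs the sign-extension from bit 31. */
    return (r ^ 0x80000000ULL) - 0x80000000ULL;
}
```
<br>
For shift amounts of 1 to 31 all three agree, since bit 31 of the shifted<br>
value is always clear. With a shift amount of 0 the SRLIW model differs,<br>
because the result is sign-extended rather than zero-extended.<br>
<br>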
## Questions<br>
Does anyone have any reservations about this approach of having i64 as the<br>
only legal type?<br>
<br>
Some of the target hooks could perhaps be replaced with more heroics in the<br>
backend. What are people's feelings here?<br>
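<br>
As background for the two hooks, the C sketch below illustrates why the choice<br>
of extension does not change the result in the cases they target. The names are<br>
hypothetical. The RFC does not say which comparison predicates the setcc<br>
hook would apply to, so equality and unsigned less-than are shown purely as<br>
examples.<br>
<br>
```c
#include <stdbool.h>
#include <stdint.h>

uint64_t zext32(uint32_t x) { return (uint64_t)x; }

/* Sign-extend via unsigned wraparound from bit 31. */
uint64_t sext32(uint32_t x) {
    return ((uint64_t)x ^ 0x80000000ULL) - 0x80000000ULL;
}

/* RV64 SLL model: only the low 6 bits of the shift amount are read, so the
   upper bits of a promoted i32 amount are irrelevant. */
uint64_t sll(uint64_t value, uint64_t amount) {
    return value << (amount & 63);
}

/* i32 comparisons carried out on promoted 64-bit operands. */
bool eq_via_zext(uint32_t a, uint32_t b)  { return zext32(a) == zext32(b); }
bool eq_via_sext(uint32_t a, uint32_t b)  { return sext32(a) == sext32(b); }
bool ult_via_zext(uint32_t a, uint32_t b) { return zext32(a) < zext32(b); }
bool ult_via_sext(uint32_t a, uint32_t b) { return sext32(a) < sext32(b); }
```
<br>
Sign-extension keeps values below 2^31 in place and moves the rest to the top<br>
of the 64-bit unsigned range, so relative order is preserved as long as both<br>
operands are extended the same way.<br>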
<br>
# Option 2: Model 32-bit subregs<br>
## Approach<br>
Define 32-bit subregisters for the GPRs that can be used in patterns and<br>
instruction definitions. The following node types are potentially useful:<br>
* `EXTRACT_SUBREG`: Supports getting the lower 32 bits of a 64-bit register<br>
* `INSERT_SUBREG`: Assumes only the lower bits are modified. Can be used with<br>
`IMPLICIT_DEF` to indicate that the upper bits are undefined. You can't<br>
directly represent sign-extension, but you can do what Mips64 does and define<br>
extra patterns to catch redundant sign-extension after one of the `*W`<br>
instructions.<br>
* `SUBREG_TO_REG`: a constant argument asserts the value of the bits left in<br>
the upper portion of the register. This is perfect for zero-extension, and not<br>
much good for the sign-extension RISC-V performs.<br>
<br>
You end up with patterns like:<br>
<br>
def : Pat<(anyext GPR32:$reg),<br>
(SUBREG_TO_REG (i64 0), GPR32:$reg, sub_32)>;<br>
def : Pat<(trunc GPR:$reg), (EXTRACT_SUBREG GPR:$reg, sub_32)>;<br>
<br>
def : Pat<(add GPR32:$src, GPR32:$src2),<br>
(ADDW GPR32:$src, GPR32:$src2)>;<br>
<br>
def : Pat<(add GPR32:$rs1, simm12_i32:$imm12),<br>
(ADDIW GPR32:$rs1, simm12_i32:$imm12)>;<br>
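<br>
The mismatch between `SUBREG_TO_REG` and sign-extension noted in the list above<br>
can be seen in a small C model. This is only a sketch of the value-level<br>
properties described there, with hypothetical helper names. It is not a model<br>
of how LLVM implements these nodes.<br>
<br>
```c
#include <stdbool.h>
#include <stdint.h>

/* EXTRACT_SUBREG with sub_32: read the low 32 bits of a 64-bit register. */
uint32_t extract_sub32(uint64_t reg) { return (uint32_t)reg; }

/* The property that SUBREG_TO_REG with a constant 0 asserts about the
   upper half of the register. */
bool upper_half_is_zero(uint64_t reg) { return (reg >> 32) == 0; }

/* Model of a *W result: a 32-bit value sign-extended to 64 bits. */
uint64_t w_result(uint32_t v) {
    return ((uint64_t)v ^ 0x80000000ULL) - 0x80000000ULL;
}
```
<br>
In this model `upper_half_is_zero(w_result(5))` holds, but<br>
`upper_half_is_zero(w_result(0xffffffff))` does not, so the zero assertion<br>
cannot describe `*W` results in general.<br>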
<br>
## Changes needed<br>
* 32-bit subregisters must be defined. Some register classes need GPR32<br>
versions, e.g. GPR, GPRNoX0, GPRC.<br>
<br>
* The RISCVAsmParser and RISCVDisassembler must be modified to support the new<br>
register classes used for the 32-bit subregs.<br>
<br>
* The calling convention implementation must handle promotion of i32<br>
arguments/returns to i64.<br>
<br>
* The `*W` instructions must be defined using GPR32.<br>
<br>
* New `Operand<i32>` types must be defined and used in the `*W` instructions.<br>
<br>
* When defining a variable-sized register class you specify a DefaultMode.<br>
This must be set to i64 to avoid breaking RV32 compilation.<br>
<br>
* This gives enough to define working support for the `*W` operations, but<br>
enabling codegen for the other integer instructions requires either duplication<br>
or smarts. To write patterns using i32 you need to define a new variant of the<br>
instruction. TableGen changes might remove the need for this. Even with such<br>
support, it's not particularly desirable to write a bunch of new patterns for<br>
instructions other than the `*W` ones.<br>
<br>
I'm sure solutions are possible, but given that the i64-only approach<br>
seems to work very well, I'm not sure it's worth pushing further.<br>
<br>
# Conclusion<br>
Taking full advantage of support for variable-sized register classes and<br>
sticking with i64 as the only legal integer type seems very workable and is<br>
definitely my preference based on the work I've done. I'd be really interested<br>
if anyone has any particular concerns or advice, or feedback on the suggested<br>
new target hooks.<br>
<br>
Best,<br>
<br>
Alex Bradbury, lowRISC CIC<br>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>