[PATCH] D41949: [RISCV] implement li pseudo instruction

Tue Feb 27 11:14:56 PST 2018

asb added a subscriber: kparzysz.
asb added a comment.

Hi Mario, sorry for the delay in more review comments. The vast majority of my comments are very minor nits - I think the main one is to take a closer look at the comments for emitRISCVLoadImm. Cleaning that up should make it easier to review that bit of logic. Thanks!

================
Comment at: lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:965

+bool RISCVAsmParser::processInstruction(MCInst &Inst, SMLoc IDLoc,
+                                        MCStreamer &Out) {
----------------
It would be nice to add a comment documenting the purpose of processInstruction

================
Comment at: lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:977
+    if (!isRV64())
+      Imm = (Imm << 32) >> 32;
+    emitRISCVLoadImm(Reg, Imm, Out, STI);
----------------
You might as well use SignExtend64 from MathExtras here.

================
Comment at: lib/Target/RISCV/MCTargetDesc/CMakeLists.txt:8
   RISCVMCTargetDesc.cpp
+  RISCVMCPseudoExpansion.cpp
 )
----------------
Sort alphabetically

================
Comment at: lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp:26
+
+namespace {
+
----------------
LLVM coding standards suggest just using `static` for single functions https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

================
Comment at: lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp:30
+// emitted and is not solely the result of zero extension due to SLLI.
+int findFirstSetBit(int64_t Value, int StartOffset = 0) {
+  Value >>= StartOffset;
----------------
Can we not avoid this and use findFirstSet from MathExtras? Unless I'm missing something, you could just mask out the first 12 bits at the call-site and so avoid the need for the 'StartOffset' parameter.

================
Comment at: lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp:53
+    int64_t Hi20 = ((Value + 0x800) >> 12) & 0xFFFFF;
+    int64_t Lo12 = SignExtend64<12>(Value & 0xFFF);
+    unsigned int SrcReg = RISCV::X0;
----------------
No need to mask the value passed to SignExtend64. Although it does no harm, I'd recommend changing to `SignExtend64<12>(Value)`.

================
Comment at: lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp:54
+    int64_t Lo12 = SignExtend64<12>(Value & 0xFFF);
+    unsigned int SrcReg = RISCV::X0;
+
----------------
Just `unsigned` is more usual in the LLVM tree

================
Comment at: lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp:63
+    if (Lo12 || Hi20 == 0) {
+      auto AddiOpcode =
+          STI->hasFeature(RISCV::Feature64Bit) ? RISCV::ADDIW : RISCV::ADDI;
----------------
LLVM is somewhat conservative when it comes to the use of auto. Given that there's not much saving in space, I'd be explicit and use `unsigned` here.

================
Comment at: lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp:71
+  }
+  assert(STI->hasFeature(RISCV::Feature64Bit));
+
----------------
Should have something like `&& "Target must be 64-bit to support a >32-bit constant"` or whatever phrasing you prefer

================
Comment at: lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp:73-74
+
+  // If more than 32 bits have to be emitted, schedule an ADDI instruction which
+  // handles 12 bits and deal with the remaining bits recursively.
+  int64_t Hi = (Value + 0x800);
----------------
The comment describing how emitting 32-bit constants works was fantastic - it would be nice to expand this comment to a similar level of detail. The comment doesn't quite seem to match the behaviour either, as in the implementation emitRISCVLoadImm is called recursively before emitting any other instructions.

================
Comment at: lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.cpp:77
+  int LsbIndex = findFirstSetBit(Hi, 12);
+  Hi >>= LsbIndex;
+
----------------
Do you rely on this being an arithmetic right shift? I'm not 100% sure if the  C++ standard guarantees that.

================
Comment at: lib/Target/RISCV/MCTargetDesc/RISCVMCPseudoExpansion.h:24
+
+void emitRISCVLoadImm(unsigned int DestReg, int64_t Value, MCStreamer &Out,
+                      const MCSubtargetInfo *STI);
----------------
Write this as `unsigned DestReg`.

================
Comment at: lib/Target/RISCV/RISCVInstrInfo.td:401
+// PseudoLI is probably not the best idea anyway given that up to
+// 8 32-bit instructions are needed to generate an arbitrary 64-bit immediate.
+let hasSideEffects = 0, mayLoad = 0, mayStore = 0, Size = 8,
----------------
niosHD wrote:
> efriedma wrote:
> > I'm not sure it's a good idea to make code generation use this pseudo-instruction; you'll miss optimization opportunities, like MachineCSE of lui instructions.
> Indeed, me neither. I also raised this concern in one of our weekly sync up calls and the consensus was to go with the Pseudo instruction for now. However, I am definitely not opposed to expand the respective immediate loads early into machine instructions.
If I recall correctly, @kparzysz reported that based on his experience there was probably little to gain.

================
Comment at: test/MC/RISCV/rv64i-aliases-valid.s:94
+# CHECK-EXPAND: addi t4, t4, -272
+li t4, 0x123456789abcdef0
+# CHECK-EXPAND: addiw t5, zero, -1
----------------
niosHD wrote:
> efriedma wrote:
> > This seems a little unfortunate... given you can load an arbitrary 32-bit immediate in two instructions, you should be able to load a 64-bit immediate in six instructions ("hi << 32 | lo").  But I guess that requires a second register?
> Correct, with a second register, 6 instructions would be sufficient. Unfortunately, using a second register is, at least for the assembler, not an option. On the other hand, during codegen I think we should invest these two (virtual) registers. Additionally, in the long term, loading the constant from a constant pool should be evaluated given that it could be even more efficient. (assuming RV64I: 64-bit constant + 1 load + at most 2 instructions for the address calculation)
For what it's worth, the RV64I codegen patches (not yet merged) do just use two registers and six instructions - but this is done in a dumb way that fails to recogise cases where <6 instructions can be used. Fully agree that it will be worth looking at using the constant pool

https://reviews.llvm.org/D41949