[Lldb-commits] [lldb] Support disassembling RISC-V proprietary instructions (PR #145793)

Thu Jun 26 18:46:51 PDT 2025

lenary wrote:

To also respond to something earlier in the thread, where there is a little complexity:

> The missing part is knowing how to split up that encoding value isn't it. For AArch64 you'd just print it because we only have 32-bit, Intel you would roll dice to randomly decide what to do and RISC-V we have these 2/3 formats.

One "weird" bit of the approach is that we still rely on LLVM's MC layer to determine the length of the instruction. RISC-V currently has only two ratified instruction lengths (16-bit and 32-bit), but the spec describes an encoding scheme for longer instructions, which both GNU objdump and LLVM's MC layer understand when disassembling. RISC-V does not currently define a maximum instruction length, but our callback only implements the scheme up to 176-bit instructions. On the assembler side, we can only assemble instructions up to 64 bits long, so we ensure our teams keep to this lower limit.
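For illustration, the length-encoding scheme mentioned above can be decoded from the first 16-bit parcel of an instruction. This is a minimal standalone sketch following the base instruction-length encoding in the RISC-V unprivileged ISA manual; `riscvInstLength` is a hypothetical helper, not LLVM's actual `RISCVDisassembler` code.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper (not an LLVM API): returns the instruction length in
// bytes implied by the first 16-bit parcel, per the RISC-V base
// instruction-length encoding. Returns std::nullopt for the encodings the
// spec reserves for instructions of 192 bits or more.
std::optional<std::size_t> riscvInstLength(std::uint16_t parcel) {
  if ((parcel & 0b11) != 0b11)
    return 2;  // xxxxxxxxxxxxxxaa, aa != 11: 16-bit
  if ((parcel & 0b11100) != 0b11100)
    return 4;  // xxxxxxxxxxxbbb11, bbb != 111: 32-bit
  if ((parcel & 0b111111) == 0b011111)
    return 6;  // xxxxxxxxxx011111: 48-bit
  if ((parcel & 0b1111111) == 0b0111111)
    return 8;  // xxxxxxxxx0111111: 64-bit
  // Remaining encodings are xnnnxxxxx1111111: (80 + 16 * nnn) bits for
  // nnn != 111, i.e. 80-bit up to 176-bit instructions.
  unsigned nnn = (parcel >> 12) & 0b111;
  if (nnn != 0b111)
    return (80 + 16 * nnn) / 8;  // 10..22 bytes
  return std::nullopt;           // x111xxxxx1111111: reserved, >= 192-bit
}
```

RISC-V parcels are stored little-endian, so the first parcel of an instruction at `p` is `p[0] | (p[1] << 8)`. The 16-, 32-, 48- and 64-bit cases can be told apart from the first byte alone; only the 80-bit and longer cases need the second byte.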

There are two relevant callbacks on MC's `MCDisassembler` interface:
- `MCDisassembler::getInstruction`, which is the main interface and sets the `uint64_t &Size` out-parameter whether or not it decodes an instruction. This is the only callback RISC-V implements.
- `MCDisassembler::suggestBytesToSkip`, which the Arm/AArch64 backends use to realign the disassembly flow. We should perhaps implement this, given that we know instruction alignment in RISC-V is either 2 or 4 bytes, but we don't at the moment.
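To make the interplay between these two callbacks concrete, here is a hypothetical sketch of how a disassembly loop might decide how far to advance after each decode attempt. It assumes a reported size of 0 means "no size reported"; `bytesToAdvance` and its parameters are illustrative names, not LLVM's API, and the realignment rule is only one plausible policy a RISC-V `suggestBytesToSkip` could follow.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): how many bytes a disassembly loop
// might advance after trying to decode at `address`.
//  - `reportedSize` stands in for the `Size` out-parameter that
//    getInstruction sets whether or not decoding succeeded.
//  - If no size was reported (assumed here to be 0), fall back to
//    realigning on the next instruction-alignment boundary, which is the
//    job suggestBytesToSkip does for Arm/AArch64.
//  - `ialign` is the instruction alignment in bytes (2 or 4 for RISC-V).
std::uint64_t bytesToAdvance(std::uint64_t address, std::uint64_t reportedSize,
                             std::uint64_t ialign) {
  if (reportedSize != 0)
    return reportedSize;
  std::uint64_t misalignment = address % ialign;
  // Already aligned: skip one whole alignment unit. Otherwise skip just
  // enough bytes to land on the next boundary.
  return misalignment == 0 ? ialign : ialign - misalignment;
}
```

With a 4-byte alignment, a failed decode at `0x1002` with no reported size would advance 2 bytes to `0x1004`, while with a 2-byte alignment an aligned failure skips a single 16-bit parcel.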

https://github.com/llvm/llvm-project/pull/145793