[Lldb-commits] [lldb] [lldb] Add SubtargetFeatures to ArchSpec (PR #173046)

via lldb-commits lldb-commits at lists.llvm.org
Mon Feb 9 08:37:23 PST 2026


daniilavdeev wrote:

I've investigated the extension compatibility problem, and my analysis revealed some subtle details about how RISC-V extensions are processed by the disassembler.

As @lenary already mentioned, pairs of conflicting extensions do exist in RISC-V. The most notable example is the P/V pair: to my understanding, the P extension would introduce dozens of encodings that conflict with the V extension. However, since the P extension isn't yet supported in LLVM, I couldn't use it for this investigation.

Other examples of incompatible extensions include:
- C + D together with Zcmp
- C + D together with Zcmt

The issue stems from the `C.FSDSP` instruction, which is only produced when the target supports both C and D extensions. For certain immediate and rs2 values, this instruction has the same encoding as Zcmp/Zcmt instructions like `CM.POP` and `CM.MVSA01`.

To observe how LLDB handles such conflicting extensions, I created an executable containing C, D, and Zcmp instructions. Clang rightfully rejects this combination in `-march`, failing with: `'zcmp' extension is incompatible with 'c' extension when 'd' extension is enabled`. However, I was able to work around this by compiling one file with `rv64imad_zcmp` and another with `rv64gc`, then linking them together. The resulting arch string in `.riscv.attributes` is `rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0_zaamo1p0_zalrsc1p0_zca1p0_zcmp1p0`, which contains all three conflicting extensions: C, D, and Zcmp.
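You can confirm which extensions the merged string enables by splitting it into components. The sketch below uses a hypothetical helper, `ExtensionNames`, which is not an LLVM API. It assumes a well-formed normalized string in which every component has the form `<name><major>p<minor>`. Like a lenient parser, it applies no consistency checks at all.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper (not an LLVM API). Assumes a well-formed normalized
// arch string: an "rv32"/"rv64" prefix followed by '_'-separated components
// of the form <name><major>p<minor>. No consistency checks are performed.
std::set<std::string> ExtensionNames(const std::string &arch) {
  auto is_digit = [](char ch) {
    return std::isdigit(static_cast<unsigned char>(ch)) != 0;
  };
  std::set<std::string> names;
  const std::string rest = arch.substr(4);  // drop "rv32"/"rv64"
  size_t start = 0;
  while (start < rest.size()) {
    size_t end = rest.find('_', start);
    if (end == std::string::npos)
      end = rest.size();
    const std::string token = rest.substr(start, end - start);
    // Strip a trailing "<major>p<minor>" version suffix if one is present.
    size_t cut = token.size();
    size_t i = cut;
    while (i > 0 && is_digit(token[i - 1]))
      --i;
    if (i < cut && i > 1 && token[i - 1] == 'p' && is_digit(token[i - 2])) {
      i -= 1;  // skip the 'p' separator
      while (i > 0 && is_digit(token[i - 1]))
        --i;
      cut = i;
    }
    if (cut > 0)
      names.insert(token.substr(0, cut));
    start = end + 1;
  }
  return names;
}
```

On the merged string quoted above, the result contains `c`, `d`, and `zcmp` side by side, 13 names in total. A strict parser refuses exactly this combination.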

To my surprise, when disassembling the resulting executable, LLDB and llvm-objdump produce different outputs:

- **LLDB** correctly disassembles `function_with_zcd_instructions`. However, its output for `function_with_zcmp_extension` is completely wrong: it contains no Zcmp instructions and instead shows incorrect C/D instructions.

- **llvm-objdump** correctly disassembles `function_with_zcmp_extension`, but makes an error in `function_with_zcd_instructions`: instead of `fsd fa1, 0x18(sp)`, it displays `cm.mvsa01 s0, s3`. This is the expected conflict. Both instructions share the encoding `ac2e`, and Zcmp appears to take priority over the C/D extensions.
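You can verify the shared encoding by hand. The sketch below assumes the C.FSDSP and CM.MVSA01 field layouts from the RISC-V compressed and Zc specifications, as summarized in its comments. The helpers `DecodeAsCFsdsp` and `DecodeAsCmMvsa01` are hypothetical and unrelated to LLVM's generated decoder.

```cpp
#include <cstdint>

// Hypothetical helpers (not LLVM APIs) that extract operand fields from a
// 16-bit halfword according to two overlapping encodings.

// C.FSDSP: funct3=101 [15:13] | uimm[5:3] [12:10] | uimm[8:6] [9:7] |
//          rs2 [6:2] | op=10 [1:0]; stores f<rs2> to sp + uimm.
struct FsdspFields {
  unsigned rs2;     // FP register number (f0..f31)
  unsigned offset;  // byte offset from sp, always a multiple of 8
};

FsdspFields DecodeAsCFsdsp(uint16_t insn) {
  unsigned uimm_5_3 = (insn >> 10) & 0x7u;
  unsigned uimm_8_6 = (insn >> 7) & 0x7u;
  return {(insn >> 2) & 0x1fu, (uimm_5_3 << 3) | (uimm_8_6 << 6)};
}

// CM.MVSA01: funct6=101011 [15:10] | r1s' [9:7] | 01 [6:5] | r2s' [4:2] |
//            op=10 [1:0]; r1s'/r2s' values 0..7 name s0..s7.
struct Mvsa01Fields {
  unsigned r1s;  // sN that receives a0
  unsigned r2s;  // sN that receives a1
};

Mvsa01Fields DecodeAsCmMvsa01(uint16_t insn) {
  return {(insn >> 7) & 0x7u, (insn >> 2) & 0x7u};
}
```

Applied to `0xac2e`, the first helper yields rs2 = 11 (`fa1`) and offset 24 (`0x18`). The second yields `s0` and `s3`. These values match both disassemblies described above.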

Both LLDB and llvm-objdump delegate disassembly to `MCDisassembler`, so I did not expect their results to diverge. Further investigation revealed why:

- **llvm-objdump** parses the arch string from `.riscv.attributes` using `parseNormalizedArchString`, which is designed for already-normalized arch strings and performs minimal validation (no consistency checks).

- **LLDB** uses `parseArchString`, which is intended for user input processing and applies additional consistency checks (via `postProcessAndChecking`). When LLDB encounters the C+D+Zcmp combination, `parseArchString` returns the error: `'zcmp' extension is incompatible with 'c' extension when 'd' extension is enabled`. LLDB then falls back to using only hardcoded extensions, making it unable to disassemble Zcmp instructions.
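The behavioral difference can be modeled in a few lines. This is a simplified, hypothetical sketch, not LLVM's code:

- `CheckCompressedConflict` encodes only the single rule behind the error quoted above.
- The fallback set passed to `EnabledExtensions` stands in for LLDB's hardcoded extensions. Its exact contents are an assumption here.

```cpp
#include <initializer_list>
#include <optional>
#include <set>
#include <string>

using ExtSet = std::set<std::string>;

// Hypothetical, simplified consistency rule: Zcmp/Zcmt conflict with C when
// D is also enabled. A real strict parser checks many more rules.
std::optional<std::string> CheckCompressedConflict(const ExtSet &exts) {
  const bool c_and_d = exts.count("c") > 0 && exts.count("d") > 0;
  for (const char *ext : {"zcmp", "zcmt"}) {
    if (c_and_d && exts.count(ext) > 0)
      return "'" + std::string(ext) +
             "' extension is incompatible with 'c' extension when 'd' "
             "extension is enabled";
  }
  return std::nullopt;
}

// Strict mode rejects the whole set on any violation and uses the fallback;
// lenient mode keeps every extension found in the attributes.
ExtSet EnabledExtensions(const ExtSet &from_attributes, bool strict,
                         const ExtSet &fallback) {
  if (strict && CheckCompressedConflict(from_attributes))
    return fallback;
  return from_attributes;
}
```

A strict parser discards the whole string at the first violated rule. As a result, extensions that are individually valid, Zcmp here, are lost. The lenient path keeps them and leaves any remaining ambiguity to the decoder.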

Consequently, replacing the `parseArchString` call with `parseNormalizedArchString` in this patch makes LLDB produce disassembly output consistent with llvm-objdump.

Regarding extension priority, the disassembler tries its decoder tables for 16-bit instructions in the following order (highest to lowest priority):
```
RISCV32Only_16 table
RVZicfiss table (Shadow Stack)
Zcmt table
Zcmp table                    <-- Zcmp is here
Qualcomm custom tables
WCH custom table
RISCV_C table                 <-- C extension is here
```

This explains why llvm-objdump incorrectly decodes `ac2e` as `CM.MVSA01`: the halfword matches in the Zcmp table first, so the C table is never consulted.
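The first-match behavior can be illustrated with a toy model of that table order. This is a hypothetical sketch:

- `DecoderEntry`, `Decode16`, and `kTables` are invented for illustration.
- Only two entries are modeled.
- The mask/match values are derived from the published CM.MVSA01 and C.FSDSP encodings rather than taken from LLVM's generated tables.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy decoder entry: the instruction matches if (insn & mask) == match and
// every required extension is enabled.
struct DecoderEntry {
  std::string table;            // e.g. "Zcmp", "RISCV_C"
  std::set<std::string> needs;  // extensions that must all be enabled
  uint16_t mask;                // bits fixed by the encoding
  uint16_t match;               // expected value of those bits
  std::string mnemonic;
};

// Tables are tried in priority order; the first match wins and later tables
// are never consulted.
std::optional<std::string> Decode16(uint16_t insn,
                                    const std::set<std::string> &enabled,
                                    const std::vector<DecoderEntry> &tables) {
  for (const DecoderEntry &entry : tables) {
    bool available = true;
    for (const std::string &ext : entry.needs)
      available = available && enabled.count(ext) > 0;
    if (available && (insn & entry.mask) == entry.match)
      return entry.mnemonic;
  }
  return std::nullopt;
}

// Priority order follows the listing above: Zcmp comes before RISCV_C.
const std::vector<DecoderEntry> kTables = {
    {"Zcmp", {"zcmp"}, 0xFC63, 0xAC22, "cm.mvsa01"},
    {"RISCV_C", {"c", "d"}, 0xE003, 0xA002, "c.fsdsp"},
};
```

With `{c, d, zcmp}` enabled, `Decode16(0xac2e, ...)` returns `cm.mvsa01`, matching llvm-objdump. With only `{c, d}` enabled, roughly what LLDB's fallback leaves, it returns `c.fsdsp`.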

At this point, I would propose using `parseNormalizedArchString` for consistency with llvm-objdump. However, I agree that LLDB should still warn clearly when `.riscv.attributes` contains an invalid or inconsistent arch string. I am therefore considering also calling `parseArchString` to detect inconsistencies and warn users when the disassembler output may be unreliable:

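```cpp
llvm::StringRef arch_string = std::get<llvm::StringRef>(*value_or_opt);

// Lenient parse, as in llvm-objdump: no consistency checks.
auto normalized_isa_info =
    llvm::RISCVISAInfo::parseNormalizedArchString(arch_string);
if (normalized_isa_info) {
  // Set ArchSpec::m_subtarget_features with *normalized_isa_info
} else {
  llvm::consumeError(normalized_isa_info.takeError());
}

// Strict parse, used only to detect inconsistent extension combinations.
auto isa_info = llvm::RISCVISAInfo::parseArchString(
    arch_string, /*EnableExperimentalExtension=*/true);
if (!isa_info) {
  std::string message = llvm::toString(isa_info.takeError());
  // Report a warning containing `message` to the user
}
```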

This approach allows LLDB to disassemble all present extensions while still alerting users to potential conflicts.

@DavidSpickett, @lenary, @bulbazord what are your thoughts on this approach? If you’re interested, I can provide the exact source code and the corresponding disassembly outputs.

https://github.com/llvm/llvm-project/pull/173046

