[llvm] [RISCV] Align MCOperandPredicates with AsmParser (PR #146184)

Tue Jul 1 22:06:58 PDT 2025

lenary wrote:

I do see that only RISC-V and CSKY use `MCOp.isBareSymbolRef()`, and can see that we want to be better about symbol expressions.

I understand that you don't like how the asm parser reports errors right now, and that it could be deferred (like gnu as does), but this might give up some information in the messages. We also have to defer errors when we move to treating branch immediates as absolute addresses (like gnu as does), which I have the start of a prototype for in #133555 (which I think you might be recalling?). I do want to push on this work in the future. I think some people like the extra information, and I do think it's a drawback that the error messages get worse when deferred.

However, I am unable to start this major refactor now, especially with the release branch happening in about a week or so, and this aims to solve a crash.

---

I think we still have a bit of a problem for compression here (maybe not for aliases?), but I also know that only two of the compression patterns are the problematic ones - unfortunately we think they are pretty important when someone writes a `qc.e.li` with a small constant immediate.

As the code flow currently works, the following happens:
1. Either `RISCVAsmPrinter::EmitToStreamer` or `RISCVAsmParser::emitToStreamer` are called, depending on whether the instruction came from codegen or from the assembler.
2. Both of these attempt to compress the instruction using `RISCVRVC::compress` before calling `emitInstruction`. `RISCVRVC::compress` will use the `MCOperandPredicate` to check whether an operand is compressible, and if so, just compress it. Right now, these contain a check that says "if there's a bare symbol, it's ok to compress".
3. `emitInstruction` (relaxall=false) eventually tries to emit the instruction as data, either to the existing fragment or to a new, relaxable, one (the latter if the instruction may need relaxation)
4. `RISCVMCCodeEmitter::encodeInstruction` is called, which calls `getBinaryCodeForInstr`, which eventually calls `RISCVMCCodeEmitter::getImmOpValue` on the immediates.
5. This fails the assert at the end, because we don't know the right fixup for a CI-type instruction.

It sounds like a better short-term solution from your PoV is:
- Abandon this change.
- Add a CI-format fixup, which we can relax, but we cannot emit a relocation for.
- Add the right logic to `mayNeedRelaxation` and `relaxInstruction` so that `c.li a0, symbol` is turned back into `qc.e.li a0, symbol`.

This approximately echoes how we solve this for almost everywhere else, except that everywhere else we already have a fixup and that fixup has a corresponding relocation. I'm not sure we really want a fixup which doesn't have a corresponding relocation, but at least that direction is less intrusive on target-independent code.

---

I'll upload that as a separate patch, and we can come back to the right way to sort out this issue.

https://github.com/llvm/llvm-project/pull/146184