[PATCH] D98916: [ARM] support symbolic expression as immediate in memory instructions

Wed Mar 24 15:15:33 PDT 2021

nickdesaulniers added inline comments.

================
Comment at: llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp:1099-1117
   bool isThumbMemPC() const {
     int64_t Val = 0;
     if (isImm()) {
       if (isa<MCSymbolRefExpr>(Imm.Val)) return true;
       const MCConstantExpr *CE = dyn_cast<MCConstantExpr>(Imm.Val);
       if (!CE) return false;
       Val = CE->getValue();
----------------
jcai19 wrote:
> nickdesaulniers wrote:
> > Something still really bugs me about these `is*` functions; perhaps I'm still hung up on D97568.
> > 
> > I understand that we can't validate immediates when expressions are used until post fixup; doing such validation of operands before fixups are performs seems like a major phase ordering issue in LLVM's assemblers.
> > 
> > So for a lot of these `is*` functions, if we don't have an immediate operand, we can't tell if it fits whatever constraints we're checking.  This change makes these functions return `true` as in "maybe."  Rather than return a type that's a tri-state for "definitely yes," "definitely no," or "can't tell until post fixups."  Then other `is*` functions don't have that added logic, like `isFPImm` below, `isLEOffset` and `isSignedOffset` above, etc.  I'm very concerned about `isARMBranchTarget`/`isThumbBranchTarget`.
> > 
> > I almost wonder if fixing the phase ordering in LLVM will fix this more generally for all instructions/operands or even additional ISAs.  Perhaps let's hold onto this change, but see how feasible it would be to validate instructions post fixups in LLVM?  Then we can keep these `is` functions more strict about only accepting immediate operands.
> > I almost wonder if fixing the phase ordering in LLVM will fix this more generally for all instructions/operands or even additional ISAs.  Perhaps let's hold onto this change, but see how feasible it would be to validate instructions post fixups in LLVM?  Then we can keep these `is` functions more strict about only accepting immediate operands.
> 
> I realized that resolving some of these fixups also depends on the chose encoding of the instructions between the symbols, i.e. whether it is 16 or 32 bits. So it may be difficult to switch the order of the two phases. Maybe we can add different an option to allow users to decide if size is a major concern, which would allow us to choose a better encoding. It's suboptimal but I suspect such code is not dominant in most code bases, so maybe it's acceptable. WDYT?
> 
It's not the size of the chosen instruction (relaxation) I'm concerned with.  It's the lack of symmetry between considering whether an operand is a `MCConstantExpr` or not between these `ARMAsmParser::is*` methods.

So the relevant call chains look like:

```
AsmParser::Run ->
  AsmParser::parseStatement ->
    ARMAsmParser::MatchAndEmitInstruction ->
      ARMAsmParser::validateInstruction ->
        is*
  MCStreamer::Finish ->
    MCObjectStreamer::finishImpl ->
      MCAssembler::Finish -> 
        MCAssembler::layout ->
          ARMAsmBackend::applyFixup ->
            ARMAsmBackend::adjustFixupValue
```

Where the is* methods of `ARMAsmParser` assume they have immediates that can validated eagerly, which for expression operands we can't validate until after `ARMAsmBackend::adjustFixupValue`.  `ARMAsmBackend::adjustFixupValue` does some checking/diagnostic emission, but I wonder if we can rerun `ARMAsmParser::validateInstruction` immediately upon applying a fixup?

I wonder whether the is* methods should have a "strict mode" check (boolean parameter, perhaps); in the case of `!MCConstantExpr` operands, we check once less strict (don't consider that the operand is not a `MCConstantExpr` a failure), then after fixups immediately call `validateInstruction` again but do consider `!MCConstantExpr` a failure.  I don't know if it's possible to get the `MCInst` from the `MCFixup` so that `ARMAsmBackend::adjustFixupValue` could call `ARMAsmParser::validateInstruction`. Seems the `MCAsmBackend`s and the backend parsers don't have references to one another....

>>  I'm very concerned about isARMBranchTarget/isThumbBranchTarget.

Nevermind that comment, they optimistically return `true` in the case of !MCConstantExpr.

If not, I do like how `ARMAsmParser::isAdrLabel` has comments like
```
// If we have an immediate that's not a constant, treat it as a label
// reference needing a fixup.
```
I think adding those elsewhere in this CL would go a long way in clarifying the intent of these `is*` methods, maybe without strictly copying the part about "a label."

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98916/new/

https://reviews.llvm.org/D98916