[PATCH] D98916: [ARM] support symbolic expression as immediate in memory instructions

Nick Desaulniers via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 22 13:09:09 PDT 2021


nickdesaulniers added inline comments.


================
Comment at: llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp:1099-1117
   bool isThumbMemPC() const {
     int64_t Val = 0;
     if (isImm()) {
       if (isa<MCSymbolRefExpr>(Imm.Val)) return true;
       const MCConstantExpr *CE = dyn_cast<MCConstantExpr>(Imm.Val);
       if (!CE) return false;
       Val = CE->getValue();
----------------
Something still really bugs me about these `is*` functions; perhaps I'm still hung up on D97568.

I understand that we can't validate immediates when expressions are used until post fixup; doing such validation of operands before fixups are performs seems like a major phase ordering issue in LLVM's assemblers.

So for a lot of these `is*` functions, if we don't have an immediate operand, we can't tell if it fits whatever constraints we're checking.  This change makes these functions return `true` as in "maybe."  Rather than return a type that's a tri-state for "definitely yes," "definitely no," or "can't tell until post fixups."  Then other `is*` functions don't have that added logic, like `isFPImm` below, `isLEOffset` and `isSignedOffset` above, etc.  I'm very concerned about `isARMBranchTarget`/`isThumbBranchTarget`.

I almost wonder if fixing the phase ordering in LLVM will fix this more generally for all instructions/operands or even additional ISAs.  Perhaps let's hold onto this change, but see how feasible it would be to validate instructions post fixups in LLVM?  Then we can keep these `is` functions more strict about only accepting immediate operands.


================
Comment at: llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp:1018-1019
+      isAdd = false; // 'U' bit is set as part of the fixup.
+      MCFixupKind Kind = MCFixupKind(ARM::fixup_arm_ldst_imm_12);
+      Fixups.push_back(MCFixup::create(0, MO1.getExpr(), Kind, MI.getLoc()));
+    }
----------------
jcai19 wrote:
> nickdesaulniers wrote:
> > jcai19 wrote:
> > > nickdesaulniers wrote:
> > > > What if we're in Thumb2 mode? I think we need a different fixup kind, like L991-996.
> > > > 
> > > > Do we need to increment MCNumCPRelocations?
> > > > What if we're in Thumb2 mode? I think we need a different fixup kind, like L991-996.
> > > 
> > > This patch is only intended for some memory instructions in arm mode. I will look into the thumb mode but do we care about it at this moment?
> > > 
> > > > Do we need to increment MCNumCPRelocations?
> > > 
> > > The expressions of interest here are really those would produce immediate values, so I'm not sure if I should count them as relocation. WDYT?
> > > 
> > > 
> > > I will look into the thumb mode but do we care about it at this moment?
> > 
> > We don't have a pressing need for THUMB2 support, but I worry that someone using clang may find the wrong fixup(relocation?) applied in the event they are assembling a THUMB2 source with a symbolic operand, due to this change. Perhaps a check for `isThumb2(STI)` then `assert` if thumb, or skip generating the fixup? (Or maybe another reviewer has thoughts?)
> > 
> > > The expressions of interest here are really those would produce immediate values, so I'm not sure if I should count them as relocation.
> > 
> > Looks like it's just a stat (so probably doesn't really matter too much):
> > 
> > ```
> > STATISTIC(MCNumCPRelocations, "Number of constant pool relocations created.");
> > ```
> > 
> > Since it's an immediate, it's encoded in the instruction; not a constant pool.  But seeing how it's incremented elsewhere throughout this TU, it looks like everytime we have an Expr MCOperand, we bump it. So I think you should here, too. Maybe other reviewers can help clarify?
> > 
> > 
> > 
> > Seeing how similar the users of `MCNumCPRelocations` makes me think there's a whole lot of other instruction formats with this issue that you're fixing.  Though I think it's fine that we start with ldr/str.
> > We don't have a pressing need for THUMB2 support, but I worry that someone using clang may find the wrong fixup(relocation?) applied in the event they are assembling a THUMB2 source with a symbolic operand, due to this change. Perhaps a check for `isThumb2(STI)` then `assert` if thumb, or skip generating the fixup? (Or maybe another reviewer has thoughts?)
> 
> I did some more digging and the thumb mode of these instructions are actually handled by different functions, so right now comping them in thumb mode (either with thumbv7 in the target triple or .thumb) will crash the compiler with assertion failures. I have been looking into supporting thumb mode a little bit, and found extra complexity: while the instructions supported by this patch (ldr/ldrb, str/strb) have one encoding in arm mode, they have multiple encodings in thumb mode, depending on the value of the immediate operand,  or if it is (un)signed. Some of these encodings are 16 bits while the others 32. IAS associate each encoding with a different opcode and tries to decide the opcode before the fixups are resolved, so it is difficult to decide the precise encoding when expressions are present. One solution I could think of right now is to always match such instructions to the shortest encoding so they don't unnecessarily generate 32-bit instructions when 16-bit version can be used, at the cost of not being able to support expressions with a result outside of the range. WDYT? Also I think it may be easier to implement the thumb mode support in a different CL so we don't have to deal with two complications at the same time.
> 
> PS: I re-examined my CL for supporting expressions in b.w instructions (https://reviews.llvm.org/D97568). b.w has two encoding in thumb mode but both are 32-bit instructions and they do not look that differently, so IAS somehow has only one opcode for both of them (verified with --debug-only=asm-matcher). So b.w probably don't have this issue.
> 
> 
> > Looks like it's just a stat (so probably doesn't really matter too much):
> > 
> > ```
> > STATISTIC(MCNumCPRelocations, "Number of constant pool relocations created.");
> > ```
> > 
> > Since it's an immediate, it's encoded in the instruction; not a constant pool.  But seeing how it's incremented elsewhere throughout this TU, it looks like everytime we have an Expr MCOperand, we bump it. So I think you should here, too. Maybe other reviewers can help clarify?
> 
> +1.
> 
> > Seeing how similar the users of `MCNumCPRelocations` makes me think there's a whole lot of other instruction formats with this issue that you're fixing.  Though I think it's fine that we start with ldr/str.
> 
> Yes there are other memory operations with similar issues but hopefully once we have this checked in it would be much easier to fix those.
> 
> 
> 
>  think it may be easier to implement the thumb mode support in a different CL so we don't have to deal with two complications at the same time.

Got it, thanks for the due diligence. Sounds good.  Sounds like "relaxation" might be needed to efficiently select the encoding when expressions are used.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98916/new/

https://reviews.llvm.org/D98916



More information about the llvm-commits mailing list