[PATCH] D70680: [ARM][Thumb2] Fix ADD/SUB invalid writes to SP

Tue Jan 7 08:12:27 PST 2020

dnsampaio marked 6 inline comments as done.
dnsampaio added inline comments.

================
Comment at: llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp:9851
+    if (Inst.getOperand(5).getReg() != 0 || HasWideQualifier)
+      break;
+    unsigned V = Inst.getOperand(2).getImm();
----------------
efriedma wrote:
> Is this new behavior?  Or was this handled elsewhere before, somehow?
It is not a new behavior, it was handled elsewhere. The only reference I can find about such conversions is in `Thumb2SizeReduce::ReduceSpecial`

running `llvm-mc -triple=thumbv7 -show-encoding <<< "add sp, #508"`
```
  name=llvm_currently
  add     sp, #508                @ encoding: [0x7f,0xb0]
```
```
  name=this_patch_without_this_part
  add.w   sp, sp, #508            @ encoding: [0x0d,0xf5,0xfe,0x7d]
```
```
  name=the_entire_patch
  add     sp, #508                @ encoding: [0x7f,0xb0]
```
Perhaps we might lose the optimization when we obtain a node that is a `t2ADDspImm` that could be converted to `t1`, but I prefer to leave that to another patch.

================
Comment at: llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp:5611
+    if (!Check(S, DecoderGPRRegisterClass(Inst, Rd, Address, Decoder)))
+      return MCDisassembler::Fail;
+  }
----------------
efriedma wrote:
> I assume this is supposed to reject `adr sp, #label` etc.  Is this new behavior?  If it is, can you split it into a separate patch? 
> 
> Not sure what the `Inst.getNumOperands()` check is supposed to be doing.
About the `Inst.getNumOperands()`, I was confused if `Inst` was empty here or not, as some case statements create new instructions instead of using this one. I replaced it by an assert that `Inst` should be empty.

Yes indeed it is a change of behavior. It will fail to accept `sp` to `thumbv7`.

Currently we have the same warning for both `thumbv7` and `thumbv8` when doing:
llvm-mc-9 --disassemble -triple=thumbv8 -show-encoding <<< "0x0f,0xf2,0x08,0x0d"
we obtain:
```
<stdin>:1:1: warning: potentially undefined instruction encoding
0x0f,0xf2,0x08,0x0d
^
        addw    sp, pc, #8              @ encoding: [0x0f,0xf2,0x08,0x0d]
```

After the patch, it will stop warning for `thumbv8` and will hard-fail for `thumbv7`. (indeed, it should not accept the softfail given by `DecoderGPRRegisterClass` ).

I can't move this changes to a distinct patch, as this code is not even executed currently. When I create the `spImm` variants in table-gen is when this decoder is actually used. Before, the instructions are either decoded as `addw` or `subw`, never as `adr.w`.

================
Comment at: llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp:5620
+      Inst.setOpcode(ARM::t2SUBri12);
+      if (!Check(S, DecoderGPRRegisterClass(Inst, 15, Address, Decoder)))
+        return MCDisassembler::Fail;
----------------
efriedma wrote:
> This could probably use a comment.  It looks like it's handling `sub r0, pc, #0`?  (That should actually be a valid instruction, as far as I know.)
Indeed it is a perfectly valid instruction. Is the singular case where (following the ARMv7-M Architecture Reference Manual) that the `ADR.w` `Encoding T2` with offset zero is decoded as a `sub`. Here we add the `pc` operand to the operation. Indeed, it should use the function `DecodeGPRRegisterClass`, not `DecoderGPRRegisterClass`.

So we will preserve the current behavior when doing:
`llvm-mc --disassemble -triple=thumbv7 -mcpu=cortex-a8 -show-encoding <<< "0xaf 0xf2 0x00 0x00"`
giving:
 `subw    r0, pc, #0              @ encoding: [0xaf,0xf2,0x00,0x00]`

The changes appear when the offset is not zero, such as: `"0xaf 0xf2 0x01 0x00"`
```
  name=currently
  subw    r0, pc, #1              @ encoding: [0xaf,0xf2,0x01,0x00]
```
```
  name=will_be
  adr.w   r0, #-1                 @ encoding: [0xaf,0xf2,0x01,0x00]
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70680/new/

https://reviews.llvm.org/D70680