[PATCH] D70680: [ARM][Thumb2] Fix ADD/SUB invalid writes to SP
Diogo N. Sampaio via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 7 08:12:27 PST 2020
dnsampaio marked 6 inline comments as done.
dnsampaio added inline comments.
================
Comment at: llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp:9851
+ if (Inst.getOperand(5).getReg() != 0 || HasWideQualifier)
+ break;
+ unsigned V = Inst.getOperand(2).getImm();
----------------
efriedma wrote:
> Is this new behavior? Or was this handled elsewhere before, somehow?
It is not a new behavior, it was handled elsewhere. The only reference I can find about such conversions is in `Thumb2SizeReduce::ReduceSpecial`
running `llvm-mc -triple=thumbv7 -show-encoding <<< "add sp, #508"`
```
name=llvm_currently
add sp, #508 @ encoding: [0x7f,0xb0]
```
```
name=this_patch_without_this_part
add.w sp, sp, #508 @ encoding: [0x0d,0xf5,0xfe,0x7d]
```
```
name=the_entire_patch
add sp, #508 @ encoding: [0x7f,0xb0]
```
Perhaps we might lose the optimization when we obtain a node that is a `t2ADDspImm` that could be converted to `t1`, but I prefer to leave that to another patch.
================
Comment at: llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp:5611
+ if (!Check(S, DecoderGPRRegisterClass(Inst, Rd, Address, Decoder)))
+ return MCDisassembler::Fail;
+ }
----------------
efriedma wrote:
> I assume this is supposed to reject `adr sp, #label` etc. Is this new behavior? If it is, can you split it into a separate patch?
>
> Not sure what the `Inst.getNumOperands()` check is supposed to be doing.
About the `Inst.getNumOperands()`, I was confused if `Inst` was empty here or not, as some case statements create new instructions instead of using this one. I replaced it by an assert that `Inst` should be empty.
Yes indeed it is a change of behavior. It will fail to accept `sp` to `thumbv7`.
Currently we have the same warning for both `thumbv7` and `thumbv8` when doing:
llvm-mc-9 --disassemble -triple=thumbv8 -show-encoding <<< "0x0f,0xf2,0x08,0x0d"
we obtain:
```
<stdin>:1:1: warning: potentially undefined instruction encoding
0x0f,0xf2,0x08,0x0d
^
addw sp, pc, #8 @ encoding: [0x0f,0xf2,0x08,0x0d]
```
After the patch, it will stop warning for `thumbv8` and will hard-fail for `thumbv7`. (indeed, it should not accept the softfail given by `DecoderGPRRegisterClass` ).
I can't move this changes to a distinct patch, as this code is not even executed currently. When I create the `spImm` variants in table-gen is when this decoder is actually used. Before, the instructions are either decoded as `addw` or `subw`, never as `adr.w`.
================
Comment at: llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp:5620
+ Inst.setOpcode(ARM::t2SUBri12);
+ if (!Check(S, DecoderGPRRegisterClass(Inst, 15, Address, Decoder)))
+ return MCDisassembler::Fail;
----------------
efriedma wrote:
> This could probably use a comment. It looks like it's handling `sub r0, pc, #0`? (That should actually be a valid instruction, as far as I know.)
Indeed it is a perfectly valid instruction. Is the singular case where (following the ARMv7-M Architecture Reference Manual) that the `ADR.w` `Encoding T2` with offset zero is decoded as a `sub`. Here we add the `pc` operand to the operation. Indeed, it should use the function `DecodeGPRRegisterClass`, not `DecoderGPRRegisterClass`.
So we will preserve the current behavior when doing:
`llvm-mc --disassemble -triple=thumbv7 -mcpu=cortex-a8 -show-encoding <<< "0xaf 0xf2 0x00 0x00"`
giving:
`subw r0, pc, #0 @ encoding: [0xaf,0xf2,0x00,0x00]`
The changes appear when the offset is not zero, such as: `"0xaf 0xf2 0x01 0x00"`
```
name=currently
subw r0, pc, #1 @ encoding: [0xaf,0xf2,0x01,0x00]
```
```
name=will_be
adr.w r0, #-1 @ encoding: [0xaf,0xf2,0x01,0x00]
```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D70680/new/
https://reviews.llvm.org/D70680
More information about the llvm-commits
mailing list