[PATCH] D73531: [llvm-objdump] avoid crash disassembling unknown instruction

Tue Jan 28 05:09:23 PST 2020

SjoerdMeijer added a comment.

In D73531#1844203 <https://reviews.llvm.org/D73531#1844203>, @jhenderson wrote:

> I think I understand. Let me restate my understanding to check:
>
> 1. A disassembler doesn't understand some part of the machine code and consequently generates an <unknown> response in the disassembly.
> 2. The point at which it tries to carry on is incorrect (presumably because the amount of data it read for the operands or whatever wasn't correct).
> 3. Consequently, something later is parsed incorrectly (because we're no longer parsing the real instructions), causing it to be treated as a call instruction or whatever, but due to the mis-parse, llvm-objdump tries to load non-existent operands and crashes.

This is an excellent summary of my understanding of the problem(s).

> Assuming that's correct, it seems like there are potentially two separate issues here.  1) The disassembler should have read the instruction right in the first place, assuming it was a valid instruction.

Agreed, but in practice difficult to achieve I think. When you don't give the disassembler the right architecture information, we can't really blame if for not being able to disassemble everything correctly. But the least thing we should achieve is to be a bit more resilient to this, and avoid a crash. But in my example, it looks like it is trying disassembling 1 byte instructions which is nonsense in the Thumb2 ISA which has 16-bit and 32-bit instructions. So I guess `Size` increment is wrong somewhere.

> 2. If it was an invalid instruction, leading to later mis-parses, there needs to be safer error checking (i.e. not assertions or crashes), which can handle things potentially being wrong.

Agreed. This is essentially what this patch is proposing: I don't see the point of trying to evaluate a branch when disassembly has failed before that.

> One question from your example: if the `cb` byte isn't a valid instruction, why does it get treated as a call/branch instruction?

I haven't checked the instructions encodings yet, but it is just a random value mapping to some opcode, a branch in this case.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73531/new/

https://reviews.llvm.org/D73531