[llvm-bugs] [Bug 49974] New: [arm disassembler] Incorrect number of operands in MCInst generated by disassembler

via llvm-bugs llvm-bugs at lists.llvm.org
Thu Apr 15 10:49:03 PDT 2021


https://bugs.llvm.org/show_bug.cgi?id=49974

            Bug ID: 49974
           Summary: [arm disassembler] Incorrect number of operands in
                    MCInst generated by disassembler
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: ARM
          Assignee: unassignedbugs at nondot.org
          Reporter: minyihh at uci.edu
                CC: llvm-bugs at lists.llvm.org, smithp352 at googlemail.com,
                    Ties.Stuij at arm.com

This ticket covers two bugs. They are similar enough that I combined them into
one report.

First, the instruction bytes "0x26 0x00 0x00 0xeb" can be disassembled with the
following command:
```
$ echo "0x26 0x00 0x00 0xeb" | llvm-mc --disassemble -triple=armv7 -o -
        .text
        bl      #152
$
```
Although the output above looks normal, the disassembled `MCInst` does not.
(The debug output of `llvm-mc --disassemble` currently doesn't print the
disassembled `MCInst`, but you can observe it in other ways, such as with gdb.)
It looks like this:
```
<MCInst #703 BL <MCOperand Imm:152> <MCOperand Imm:14> <MCOperand Reg:0>>
```
According to the instruction definition of `BL`, it takes only 1 operand rather
than 3. The latter two are predicate operands inserted by mistake: the second
operand represents `ARMCC::AL` and the third is the predicate register it
depends on.

Another input that triggers a similar bug is "0xad 0xf2 0x7c 0x4d":
```
$ echo "0xad 0xf2 0x7c 0x4d" | llvm-mc --disassemble -triple=thumbv7 -o -
        .text
        subw    sp, sp, #1148
$
```
Again, the disassembled text is benign, but the disassembled `MCInst` looks
like this:
```
<MCInst #4193 t2SUBspImm12 \
              <MCOperand Reg:15> <MCOperand Reg:15> \
              <MCOperand Imm:1148> \
              <MCOperand Imm:14> <MCOperand Reg:0> \
              <MCOperand Reg:0> <MCOperand Reg:0>>
```
According to the instruction definition of `t2SUBspImm12`, there should be only
5 operands rather than 7. The last two operands are inserted by mistake.
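
One way to spot the surplus operands without reading the dumps by eye is to
count the `<MCOperand ...>` entries in the textual dump and compare against the
expected count. The following is only a minimal sketch: the expected counts are
the ones stated in this report (they are hard-coded here, not read from LLVM's
instruction tables), and the names `EXPECTED_OPERANDS`, `parse_mcinst` and
`surplus_operands` are hypothetical helpers, not part of LLVM.
```python
import re

# Operand counts as stated in this report (assumption: hard-coded here,
# not queried from LLVM's instruction descriptions).
EXPECTED_OPERANDS = {"BL": 1, "t2SUBspImm12": 5}

MCINST_RE = re.compile(r"<MCInst\s+#(\d+)\s+(\w+)")
MCOPERAND_RE = re.compile(r"<MCOperand\s+(\w+):([^>\s]+)>")


def parse_mcinst(dump):
    """Return (opcode_name, [(kind, value), ...]) for one textual MCInst dump."""
    head = MCINST_RE.search(dump)
    if head is None:
        raise ValueError("not an MCInst dump")
    # findall skips whitespace, line breaks and '\' continuations between operands.
    return head.group(2), MCOPERAND_RE.findall(dump)


def surplus_operands(dump):
    """Number of operands beyond the expected count for this opcode."""
    name, operands = parse_mcinst(dump)
    return len(operands) - EXPECTED_OPERANDS[name]


bl_dump = "<MCInst #703 BL <MCOperand Imm:152> <MCOperand Imm:14> <MCOperand Reg:0>>"
print(surplus_operands(bl_dump))  # prints 2
```
Applied to the `t2SUBspImm12` dump above, the same helper also reports 2
surplus operands.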

These bugs affect users that directly consume the disassembled `MCInst`
object. For example, feeding the disassembled `MCInst` into LLVM MCA will cause
MCA to choke, because MCA is more sensitive to the total number of operands in
an `MCInst`.

These two bugs were never caught because we never test directly on the
in-memory `MCInst` object (or its textual format). The testing infrastructure
we have translates the `MCInst` into assembly code before checking it. But as
you can see above, this cannot detect surplus operands appended at the _end_.
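
To illustrate why a text-based check misses this, here is a toy model, not
LLVM's actual ARM instruction printer. It assumes a printer that reads operands
by index and never looks at the total count; `print_bl` and the tuple
representation of operands are hypothetical.
```python
def print_bl(operands):
    """Toy printer: formats only operand 0 and ignores anything after it."""
    _kind, value = operands[0]
    return f"bl\t#{value}"


# Operand lists modelled on the BL example in this report.
correct = [("Imm", 152)]
buggy = [("Imm", 152), ("Imm", 14), ("Reg", 0)]

# A test that compares the printed assembly cannot tell the two apart ...
assert print_bl(correct) == print_bl(buggy)
# ... while a check on the operand count can.
assert len(correct) != len(buggy)
```
Under that assumption, any operands appended after the ones the printer reads
leave the printed assembly unchanged.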
