<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [arm disassembler] Incorrect number of operands in MCInst generated by disassembler"
href="https://bugs.llvm.org/show_bug.cgi?id=49974">49974</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[arm disassembler] Incorrect number of operands in MCInst generated by disassembler
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: ARM
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>minyihh@uci.edu
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com
</td>
</tr></table>
<p>
<div>
<pre>This ticket covers two bugs; they are similar enough that I combined them
into one report.
First, given the encoded instruction "0x26 0x00 0x00 0xeb", we can disassemble
it with the following command:
```
$ echo "0x26 0x00 0x00 0xeb" | llvm-mc --disassemble -triple=armv7 -o -
.text
bl #152
$
```
Although the output above looks normal, the disassembled `MCInst` tells a
different story (currently the debug output of `llvm-mc --disassemble` doesn't
print the disassembled `MCInst`, but you can observe it in other ways, such as
with gdb). It looks like this:
```
&lt;MCInst #703 BL &lt;MCOperand Imm:152&gt; &lt;MCOperand Imm:14&gt; &lt;MCOperand Reg:0&gt;&gt;
```
According to the instruction definition of `BL`, it takes only 1 operand rather
than 3. The latter two are predicate operands inserted by mistake: the second
operand represents `ARMCC::AL` and the third is the predicate register it
depends on.
Another input that triggers a similar bug is "0xad 0xf2 0x7c 0x4d":
```
$ echo "0xad 0xf2 0x7c 0x4d" | llvm-mc --disassemble -triple=thumbv7 -o -
.text
subw sp, sp, #1148
$
```
Again, the disassembled text is benign, but the disassembled `MCInst` looks
like this:
```
&lt;MCInst #4193 t2SUBspImm12 \
&lt;MCOperand Reg:15&gt; &lt;MCOperand Reg:15&gt; \
&lt;MCOperand Imm:1148&gt; \
&lt;MCOperand Imm:14&gt; &lt;MCOperand Reg:0&gt; \
&lt;MCOperand Reg:0&gt; &lt;MCOperand Reg:0&gt;&gt;
```
According to the instruction definition of `t2SUBspImm12`, there should be only
5 operands rather than 7. The last two operands are inserted by mistake.
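As a rough illustration of the check that would flag both cases, here is a
minimal Python sketch. It does not use any LLVM API: operands are modelled as
plain (kind, value) tuples copied from the dumps above, and the expected counts
(1 for `BL`, 5 for `t2SUBspImm12`) are the ones stated in this report rather
than values read from LLVM's instruction tables. The names `EXPECTED_OPERANDS`,
`DISASSEMBLED` and `surplus_operands` are made up for the example.

```python
# Expected operand counts as stated in this report (assumption: hard-coded
# here, not read from LLVM's instruction descriptions).
EXPECTED_OPERANDS = {"BL": 1, "t2SUBspImm12": 5}

# Operands copied from the two MCInst dumps, as (kind, value) pairs.
DISASSEMBLED = {
    "BL": [("Imm", 152), ("Imm", 14), ("Reg", 0)],
    "t2SUBspImm12": [
        ("Reg", 15), ("Reg", 15),
        ("Imm", 1148),
        ("Imm", 14), ("Reg", 0),
        ("Reg", 0), ("Reg", 0),
    ],
}


def surplus_operands(opcode, operands):
    """Return the operands beyond the expected count for `opcode`."""
    return operands[EXPECTED_OPERANDS[opcode]:]


for opcode, operands in DISASSEMBLED.items():
    extra = surplus_operands(opcode, operands)
    print(opcode, "has", len(extra), "surplus operand(s):", extra)
```

Both instructions report two surplus operands at the end of the list.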
These bugs affect users that directly consume the disassembled `MCInst`
object. For example, feeding the disassembled `MCInst` into LLVM MCA causes MCA
to choke, because MCA is more sensitive to the total number of operands in an
`MCInst`.
These two bugs were never caught because we never directly test the in-memory
`MCInst` object (or its textual format). The testing infrastructure we have
translates the `MCInst` into assembly code before checking it. But as you can
see above, this cannot detect surplus operands appended at the _end_.</pre>
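<pre>To illustrate why text-based checks miss surplus trailing operands, here
is a toy Python sketch. It is not LLVM's actual ARM instruction printer: the
hypothetical `print_bl` simply formats operand 0 and never looks at anything
after it, which is the assumption being demonstrated.

```python
def print_bl(operands):
    """Toy printer: formats only operand 0, ignoring anything after it."""
    _kind, value = operands[0]
    return "bl #" + str(value)


# The operand list the instruction definition calls for, and the one the
# disassembler actually produced (copied from the dump in this report).
correct = [("Imm", 152)]
with_surplus = [("Imm", 152), ("Imm", 14), ("Reg", 0)]

# Both operand lists print identically, so a test that compares the
# printed assembly cannot tell them apart.
print(print_bl(correct))
print(print_bl(with_surplus))
print(print_bl(correct) == print_bl(with_surplus))
```

Both calls yield `bl #152`, so only a check on the operand count itself can
tell the two lists apart.</pre>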
</div>
</p>
</body>
</html>