<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/150427>150427</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[X86] Assembler does not correctly parse Intel syntax expressions involving registers in brackets, misassembles or asserts
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Heath123
</td>
</tr>
</table>
<pre>
When parsing Intel-syntax operand expressions, the x86 assembler appears to match specific token patterns, such as a register followed by a `*` symbol and an integer. This is fragile and breaks on more complex expressions, such as those containing parentheses.
The parser can also evaluate more complex expressions by pushing operands and operators onto a stack, but this only works for constant integers. When it encounters a complex expression involving registers, the parser sometimes pushes the registers onto this stack, where they are evaluated as 0; for some operations, this instead hits an assertion such as this one:
https://github.com/llvm/llvm-project/blob/94aa08a3b0e979e6977619064a27ca74bb15fcf6/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp#L316-L317
LLVM should either parse these expressions correctly (which may be complex and require algebraic expansion in some cases) or report an error instead of silently misassembling.
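The algebraic expansion mentioned above could work by evaluating every subexpression to a linear form (a map from register to coefficient, plus a constant displacement) and only checking at the end whether the result fits the `base + index*scale + disp` addressing form. The following is a minimal, hypothetical Python sketch of that idea; it is not based on how `X86AsmParser.cpp` is structured, the function names and the register subset are illustrative assumptions, and only `+`, `-`, `*` and parentheses are supported:
```python
import re

# Hypothetical sketch, not LLVM's X86AsmParser. Every subexpression is
# evaluated to a linear form ({register: coefficient}, displacement).
# Only an assumed subset of 64-bit GPR names is recognised, only + - * and
# parentheses are supported, and encoding restrictions such as rsp not
# being usable as an index register are ignored.
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"}


def add(a, b, sign=1):
    # Combine like registers: (A + da) + sign * (B + db).
    coeffs = dict(a[0])
    for reg, c in b[0].items():
        coeffs[reg] = coeffs.get(reg, 0) + sign * c
    return {r: c for r, c in coeffs.items() if c != 0}, a[1] + sign * b[1]


def mul(a, b):
    # The product stays linear only if at most one side contains a register.
    if a[0] and b[0]:
        raise ValueError("register multiplied by register")
    if b[0]:
        a, b = b, a  # make b the constant side
    k = b[1]
    return {r: c * k for r, c in a[0].items() if c * k != 0}, a[1] * k


def parse_expr(toks):  # expr := term (('+' | '-') term)*
    value = parse_term(toks)
    while toks and toks[0] in ("+", "-"):
        sign = 1 if toks.pop(0) == "+" else -1
        value = add(value, parse_term(toks), sign)
    return value


def parse_term(toks):  # term := factor ('*' factor)*
    value = parse_factor(toks)
    while toks and toks[0] == "*":
        toks.pop(0)
        value = mul(value, parse_factor(toks))
    return value


def parse_factor(toks):  # factor := number | register | '(' expr ')'
    if not toks:
        raise ValueError("unexpected end of expression")
    tok = toks.pop(0)
    if tok == "(":
        value = parse_expr(toks)
        if not toks or toks.pop(0) != ")":
            raise ValueError("expected ')'")
        return value
    if tok.isdigit():
        return {}, int(tok)
    if tok in REGISTERS:
        return {tok: 1}, 0
    raise ValueError(f"unknown token in expression: {tok!r}")


def to_memory_operand(text):
    """Return (base, index, scale, disp) for a bracketed operand, or raise."""
    toks = re.findall(r"\d+|[a-z][a-z0-9]*|\S", text.lower())
    coeffs, disp = parse_expr(toks)
    if toks:
        raise ValueError(f"unexpected token: {toks[0]!r}")
    regs = list(coeffs.items())
    if len(regs) == 2 and regs[0][1] != 1:
        regs.reverse()  # prefer the unit-coefficient register as the base
    if not regs:
        return None, None, 1, disp
    if len(regs) == 1 and regs[0][1] == 1:
        return regs[0][0], None, 1, disp
    if len(regs) == 1 and regs[0][1] in (2, 4, 8):
        return None, regs[0][0], regs[0][1], disp
    if len(regs) == 2 and regs[0][1] == 1 and regs[1][1] in (1, 2, 4, 8):
        return regs[0][0], regs[1][0], regs[1][1], disp
    raise ValueError("invalid base+index expression")
```
For example, `to_memory_operand("rax + 4*(rdi + 1)")` returns `('rax', 'rdi', 4, 4)`, i.e. `[rax + 4*rdi + 0x4]`, while `4 * (rax + rdi)` and `(rax + rdi) * 4` both expand to `4*rax + 4*rdi` and raise `invalid base+index expression`. Division is deliberately left out of this sketch.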
The two instructions in the following code block are incorrectly accepted and misassembled, but should result in an `invalid base+index expression` error:
```asm
.intel_syntax noprefix
.code64
test_case:
lea rdi, [4 * (rax + rdi)]
lea rdi, [(rax + rdi) * 4]
```
Output (`clang -c test.s && llvm-objdump -M intel -d test.o`):
```asm
testcases.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <test_case>:
0: 48 8d 3c 38 lea rdi, [rax + rdi]
4: 48 8d 3c 38 lea rdi, [rax + rdi]
```
The expected result would be an error.
These examples work in the GNU assembler, but are currently misassembled by LLVM:
```asm
.intel_syntax noprefix
.code64
test_case:
lea rdi, [rax + 4*(rdi + 1)]
lea rdi, [(rax + 1) * 2]
lea rdi, [(rax) * 4]
lea rdi, [rax * (1 + 1)]
```
Output:
```asm
0000000000000000 <test_case>:
0: 48 8d 7c 38 04 lea rdi, [rax + rdi + 0x4]
5: 48 8d 78 02 lea rdi, [rax + 0x2]
9: 48 8d 38 lea rdi, [rax]
c: 48 8d 3c 25 00 00 00 00 lea rdi, [0x0]
```
The last two lines also trigger the `Multiply operation with an immediate and a register!` assertion shown above when assertions are enabled.
Expected output/GNU assembler output:
```asm
0000000000000000 <test_case>:
0: 48 8d 7c b8 04 lea rdi, [rax + 4*rdi + 0x4]
5: 48 8d 3c 45 02 00 00 00 lea rdi, [2*rax + 0x2]
d: 48 8d 3c 85 00 00 00 00 lea rdi, [4*rax]
```
Alternatively, if we don't want to handle these cases, LLVM should at least report an error.
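The byte-level difference between the misassembled and expected encodings is in the SIB byte that follows ModRM bytes such as `7c` and `3c` (whose rm field is 100). A small, hypothetical Python decoder makes this visible; it assumes REX.X and REX.B are clear, as they are with the `48` (REX.W) prefix used in every line above:
```python
# Register numbers 0-7 for 64-bit GPRs, valid when REX.X = REX.B = 0.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_sib(sib, mod):
    """Return (base, index, scale) encoded by a SIB byte.

    sib: the SIB byte; mod: the top two bits of the preceding ModRM byte.
    An index field of 100 means "no index"; a base field of 101 with
    mod == 0 means "no base, a disp32 follows".
    """
    ss, rest = divmod(sib, 64)
    index_bits, base_bits = divmod(rest, 8)
    index = None if index_bits == 4 else REGS[index_bits]
    base = None if (base_bits == 5 and mod == 0) else REGS[base_bits]
    return base, index, 2 ** ss
```
With the values from the outputs above (ModRM `7c` has mod 1, `3c` has mod 0), `decode_sib(0x38, 1)` gives `('rax', 'rdi', 1)` while the GNU encoding `decode_sib(0xB8, 1)` gives `('rax', 'rdi', 4)`: base and index match and only the scale of 4 is lost. For `rax * (1 + 1)`, `decode_sib(0x25, 0)` gives `(None, None, 1)`, so the register disappears entirely and only the zero displacement remains.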
Cases like `lea rdi, [(4 * rax) / 2]` also currently misassemble, but I'm not sure whether the correct behaviour would be to treat them as `[2 * rax]` or to report an error, as the GNU assembler currently does.
Finally, `lea rdi, [(1 + 1) * rax]` works in the GNU assembler but fails in LLVM with `error: unknown token in expression`. That is at least much better than a silently incorrect result, but ideally LLVM would handle this case too.
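For the `(4 * rax) / 2` case, one possible policy (an assumption, not what either assembler currently does) is to allow dividing a register expression by a constant only when every register coefficient and the displacement divide exactly, so that no register can end up with a truncated scale. A hypothetical helper operating on a linear form `({register: coefficient}, displacement)`:
```python
def divide_exact(coeffs, disp, divisor):
    """Divide sum(coeffs[r] * r) + disp by an integer constant.

    Hypothetical policy: reject the division unless every register
    coefficient and the displacement are exact multiples of divisor.
    """
    if divisor == 0:
        raise ValueError("division by zero")
    if any(v % divisor for v in [*coeffs.values(), disp]):
        raise ValueError("inexact division in address expression")
    return {r: c // divisor for r, c in coeffs.items()}, disp // divisor
```
Under this policy `divide_exact({"rax": 4}, 0, 2)` returns `({'rax': 2}, 0)`, i.e. `[2*rax]`, while `(3 * rax) / 2` would be rejected; matching the GNU assembler instead would simply mean rejecting any division that involves a register.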
</pre>