<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/62080>62080</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Clang assembler has bugs in Intel syntax
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
soomin-kim
</td>
</tr>
</table>
<pre>
Hi, I'm Soomin Kim from KAIST SoftSec Lab.
We are reporting two x86-64 assembler bugs, both related to Intel assembly syntax. We found them while renaming the labels of small toy assembly programs.
--------------------------------
The first bug:
```
$ cat ./variant1.s
.intel_syntax noprefix
.text
or:
ret
call or
$ clang -masm=intel -o ./variant1.o -c ./variant1.s
./variant1.s:5:8: error: unknown token in expression
call or
^
```
Clang rejects this program because of the token `or`. Note that this program was generated from the assembly program below simply by renaming the label:
```
$ cat ./normal1.s
.intel_syntax noprefix
.text
LABEL:
ret
call LABEL
$ clang -masm=intel -o ./normal1.o -c ./normal1.s
```
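For reference, the renaming step that turns `normal1.s` into `variant1.s` can be sketched as a whole-word substitution. This is a minimal illustration that assumes plain text substitution. `rename_label` and `TEMPLATE` are hypothetical names, not the tooling used to find these bugs.

```python
import re

# Template program from normal1.s; "LABEL" is the name being mutated.
TEMPLATE = """\
.intel_syntax noprefix
.text
LABEL:
ret
call LABEL
"""


def rename_label(source: str, old: str, new: str) -> str:
    """Hypothetical helper: replace every whole-word occurrence of a label name."""
    # \b prevents touching identifiers that merely contain `old` as a substring.
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


if __name__ == "__main__":
    # Renaming LABEL to "or" reproduces the body of variant1.s.
    print(rename_label(TEMPLATE, "LABEL", "or"), end="")
```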
Unlike `variant1.s`, Clang compiles this program. However, I could not find anything online explaining why the name `or` matters. For example, the Wikipedia page on x86 assembly language (https://en.wikipedia.org/wiki/X86_assembly_language) lists several keywords, but `or` is not among them.
Surprisingly, `or` causes no problem in AT&T syntax. Please see the program below:
```
$ cat ./variant2.s
.text
or:
ret
call or
$ clang -masm=att -o ./variant2.o -c ./variant2.s
```
We consider this a bug in `Clang` because (1) the equivalent AT&T-syntax program is accepted by `Clang`, and (2) there is no reason to reject it: other uses of `or` (for example, as an instruction mnemonic) make no sense as the operand of a `call` instruction, and the label `or` is clearly defined.
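One likely explanation, not confirmed against the LLVM source, is that Clang's Intel-syntax expression parser accepts MASM-style named operators such as `or`, `and`, `xor`, `shl`, `shr` and `mod` inside operands. If so, a bare `or` is lexed as a binary operator rather than a symbol, and an operand consisting only of a binary operator is malformed. The toy parser below models that hypothesis. `parse_operand`, `ParseError` and `NAMED_BINARY_OPS` are hypothetical names, not LLVM code.

```python
# Toy model (hypothetical, not LLVM code) of an Intel-syntax operand parser
# that treats some identifiers as named binary operators.
NAMED_BINARY_OPS = {"or", "and", "xor", "shl", "shr", "mod"}


class ParseError(Exception):
    pass


def parse_operand(tokens: list[str]) -> list[str]:
    """Classify tokens as 'sym:NAME', 'num:N' or 'op:NAME', or raise ParseError."""
    out: list[str] = []
    expect_operand = True  # an expression must start with a symbol or number
    for tok in tokens:
        if tok in NAMED_BINARY_OPS:
            if expect_operand:
                # A binary operator cannot start an expression, so "call or" fails here.
                raise ParseError(f"unknown token in expression: {tok!r}")
            out.append(f"op:{tok}")
            expect_operand = True
        else:
            kind = "num" if tok.isdigit() else "sym"
            out.append(f"{kind}:{tok}")
            expect_operand = False
    if expect_operand:
        raise ParseError("expected an operand")
    return out
```

Under this model, `parse_operand(["LABEL"])` yields a symbol, while `parse_operand(["or"])` raises an error analogous to the one Clang reports. A parser could avoid the problem by treating a named operator in operand-start position as a symbol.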
--------------------------------
The second bug:
```
$ cat ./variant1.s
.intel_syntax noprefix
.data
rsp:
.long 1
.long 2
.long 3
.long 4
.text
lea rax, [rsp] // rsp here is intended to refer to a pointer in .data section
$ clang -masm=intel -o ./variant1.o -c ./variant1.s
$ objdump -d ./variant1.o
./variant1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <.text>:
0: 48 8d 04 24 lea (%rsp),%rax
```
This bug is similar to the first one but differs in an important way. To make it easier to understand, here is the original assembly program:
```
$ cat ./normal.s
.intel_syntax noprefix
.data
LABEL:
.long 1
.long 2
.long 3
.long 4
.text
lea rax, [LABEL]
$ clang -masm=intel -o ./normal.o -c ./normal.s
```
The original program loads the address of `LABEL` into the register `rax`. However, after we rename the label to `rsp`, which is also a register name, the program's semantics change: the binary code produced by `Clang` copies the value of the register `rsp` into `rax`.
The problem is that although the operand `rsp` is ambiguous between the label and the register, `Clang` silently picks the register without any diagnostic, so the program behaves in an unintended way.
By contrast, this issue does not occur with AT&T syntax. Please see the code below:
```
$ cat ./variant2.s
.data
rsp:
.long 1
.long 2
.long 3
.long 4
.text
leaq (rsp), %rax
$ clang -masm=att -o ./variant2.o -c ./variant2.s
$ objdump -d ./variant2.o
./variant2.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <.text>:
0: 48 8d 04 25 00 00 00 00 lea 0x0,%rax
```
The label `rsp` is correctly emitted as a relocation entry in the object file (hence the zero displacement in the disassembly).
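The difference between the two syntaxes can be summarized by a toy classifier. In AT&T syntax, registers carry a `%` prefix, so a bare `rsp` can only be a symbol. In Intel `noprefix` mode, a bare `rsp` is ambiguous, and the disassembly above suggests Clang resolves it as the register. `classify` and the abbreviated `REGISTERS` set are hypothetical and only model the observed behavior.

```python
# Hypothetical model of how a bare identifier in a memory operand is resolved.
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rsp", "rbp"}  # abbreviated


def classify(name: str, syntax: str) -> str:
    if syntax == "att":
        # AT&T: registers are always written with '%', so the prefix decides.
        return "register" if name.startswith("%") else "symbol"
    if syntax == "intel":
        # Intel noprefix: a bare register name wins, matching the observed lea (%rsp),%rax.
        return "register" if name.lower() in REGISTERS else "symbol"
    raise ValueError(f"unknown syntax: {syntax}")


if __name__ == "__main__":
    for name, syntax in [("rsp", "att"), ("%rsp", "att"), ("rsp", "intel")]:
        print(f"{syntax:5} {name:5} -> {classify(name, syntax)}")
```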
--------------------------------
We have seen two different situations in which label names can confuse `Clang`. We find these cases interesting because it is hard to say definitively that `Clang` is wrong.
We think there are two possibilities:
(1) Intel syntax forbids using an instruction mnemonic or a register name as a label, or
(2) `Clang` simply mishandles such labels.
In one sense, the real problem is the ambiguity of Intel syntax itself, since there is no official Intel assembly syntax manual. For decades, assemblers have been developed ad hoc without any standard, so deciding which tokens to allow or reject, and how to resolve ambiguous ones, is a genuinely hard problem.
On the other hand, `Clang` needs to handle both cases, since each one hurts its usability or correctness. A user might want to write a function named `or` but have it rejected by `Clang`. A user might also want to load a data pointer named `rsp`, but the resulting program loads the stack pointer instead, contrary to the user's intention.
We suggest that `Clang` should accept the first case, and should either reject the second case or at least emit a warning for it.
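A minimal sketch of the suggested warning, assuming the assembler knows its register table when a label is defined: any label whose name matches a register is reported. Only register collisions are flagged, in line with the suggestion that the `or` case should simply be accepted. `register_label_warnings`, `LABEL_DEF` and the abbreviated `REGISTERS` set are hypothetical; a real implementation would use the assembler's own tables and diagnostics.

```python
import re

# Hypothetical, abbreviated register table; a real assembler would consult its
# full register list (including 32-, 16- and 8-bit names).
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rsp", "rbp"}

# Matches a label definition such as "rsp:" at the start of a line.
LABEL_DEF = re.compile(r"^\s*([A-Za-z_.$][\w.$]*):")


def register_label_warnings(source: str) -> list[str]:
    """Return one warning per label definition whose name is also a register."""
    warnings: list[str] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = LABEL_DEF.match(line)
        # Intel register names are case-insensitive, so compare in lower case.
        if match and match.group(1).lower() in REGISTERS:
            name = match.group(1)
            warnings.append(
                f"{lineno}: warning: label '{name}' has the same name as a register; "
                "bare uses of it in Intel syntax will refer to the register"
            )
    return warnings


if __name__ == "__main__":
    program = ".intel_syntax noprefix\n.data\nrsp:\n.long 1\n.text\nlea rax, [rsp]\n"
    for w in register_label_warnings(program):
        print(w)
```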
</pre>