<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/117304>117304</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[x86][MC] Fail to decode some long multi-byte NOPs
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Mar3yZhang
</td>
</tr>
</table>
<pre>
### Work environment
| Questions | Answers
|------------------------------------------|--------------------
| OS/arch/bits | x86_64 Ubuntu 20.04
| Architecture | x86_64
| Source of LLVM | `git clone`, default on `master` branch.
| Version/git commit | llvm-20git, [f08278](https://github.com/llvm/llvm-project/commit/f082782c1b3ec98f50237ddfc92e6776013bf62f)
<!-- INCORRECT DISASSEMBLY BUGS -->
### Minimal disassembler PoC program
```c
int main(int argc, char *argv[]) {
    /*
     * Elided for brevity: the #include directives (llvm-c/Disassembler.h,
     * llvm-c/Target.h, err.h, stdint.h, stdio.h), the definition of
     * MAX_OUTPUT_LENGTH, and the sanity check that parses the hex string
     * from argv into raw_bytes and bytes_len.
     */

    // Initialize LLVM after input validation
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllDisassemblers();

    LLVMDisasmContextRef disasm = LLVMCreateDisasm("x86_64", NULL, 0, NULL, NULL);
    if (!disasm) {
        errx(1, "Error: LLVMCreateDisasm() failed.");
    }

    // Set disassembler options: print immediates as hex, use Intel syntax
    if (!LLVMSetDisasmOptions(disasm, LLVMDisassembler_Option_PrintImmHex |
                                      LLVMDisassembler_Option_AsmPrinterVariant)) {
        errx(1, "Error: LLVMSetDisasmOptions() failed.");
    }

    char output_string[MAX_OUTPUT_LENGTH];
    uint64_t address = 0;

    // Decode one instruction; a return value of 0 means the bytes were not decodable
    size_t instr_len = LLVMDisasmInstruction(disasm, raw_bytes, bytes_len, address,
                                             output_string, sizeof(output_string));
    if (instr_len > 0) {
        printf("%s\n", output_string);
    } else {
        printf("Error: Unable to disassemble the input bytes.\n");
    }

    // Release the disassembler context before exiting
    LLVMDisasmDispose(disasm);
    return 0;
}
```
### Instruction bytes giving faulty results
```
0f 1a de
```
### Expected results
It should be:
```
nop esi, ebx
```
### Actual results
```sh
$ ./min_llvm_disassembler "0f1ade"
Error: Unable to disassemble the input bytes.
```
### Other cases seem to work
```sh
$ ./min_llvm_disassembler "0f1f00"
nop dword ptr [rax]
```
<!-- ADDITIONAL CONTEXT -->
### Additional Logs, screenshots, source code, configuration dump, ...
Instructions with opcodes in the range `0f 18` to `0f 1f` are defined as multi-byte NOP (no-operation) instructions in the x86 ISA. See this [StackOverflow post](https://stackoverflow.com/questions/25545470/long-multi-byte-nops-commonly-understood-macros-or-other-notation) for more details. The bytes `0f 1a de` should be decoded as follows:
- `0f 1a` is the two-byte (extended) opcode.
- The ModR/M byte `de` is 11011110 in binary.
  - Bits 7-6 (Mod): 11 (binary) = 3 (decimal).
    Indicates register-direct addressing mode.
  - Bits 5-3 (Reg): 011 (binary) = 3 (decimal).
    Corresponds to the EBX (or RBX in 64-bit mode) register.
  - Bits 2-0 (R/M): 110 (binary) = 6 (decimal).
    Corresponds to the ESI (or RSI in 64-bit mode) register.
XED also translates "0f 1a de" into "nop esi, ebx".
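
The ModR/M breakdown above can be checked with a few lines of C. This is only a minimal sketch of the bit arithmetic: `struct modrm`, `split_modrm`, and `reg32_name` are hypothetical names chosen for illustration, not part of LLVM or XED, and the register table assumes 32-bit operand names with no REX prefix.

```c
/* Illustrative only: these names are not part of LLVM or XED. */

/* The three fields of an x86 ModR/M byte. */
struct modrm {
    unsigned mod; /* bits 7-6: addressing mode (3 = register-direct) */
    unsigned reg; /* bits 5-3: register operand */
    unsigned rm;  /* bits 2-0: register or memory operand */
};

/* Split a ModR/M byte into its mod, reg, and r/m fields. */
struct modrm split_modrm(unsigned char byte) {
    struct modrm m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm = byte & 0x7u;
    return m;
}

/* Map a 3-bit register number to its 32-bit name (assumes no REX prefix). */
const char *reg32_name(unsigned index) {
    static const char *const names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };
    return index < 8 ? names[index] : "?";
}
```

For the byte `0xde`, `split_modrm` yields mod = 3, reg = 3, and r/m = 6, and `reg32_name` maps those register numbers to `ebx` and `esi`, matching the operands in the expected `nop esi, ebx`.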
</pre>