<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/117304>117304</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [x86][MC] Fail to decode some long multi-byte NOPs
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Mar3yZhang
      </td>
    </tr>
</table>

<pre>
### Work environment

| Questions                                | Answers
|------------------------------------------|--------------------
| OS/arch/bits                             | x86_64 Ubuntu 20.04
| Architecture                             | x86_64
| Source of LLVM                           | `git clone`, default `main` branch
| Version/git commit                       | llvm-20git, [f08278](https://github.com/llvm/llvm-project/commit/f082782c1b3ec98f50237ddfc92e6776013bf62f)

### minimum disassembler PoC program
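```c
#include &lt;ctype.h&gt;
#include &lt;err.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#include &lt;llvm-c/Disassembler.h&gt;
#include &lt;llvm-c/Target.h&gt;

#define MAX_OUTPUT_LENGTH 256
#define MAX_INPUT_BYTES 15 /* longest valid x86 instruction */

int main(int argc, char *argv[]) {
    /* Input sanity check (minimal sketch): argv[1] must be an even-length
       string of hex digits, e.g. "0f1ade". */
    const char *hex = (argc == 2) ? argv[1] : "";
    size_t hex_len = strlen(hex);
    if (hex_len == 0 || hex_len % 2 != 0 || hex_len / 2 > MAX_INPUT_BYTES) {
        errx(1, "usage: %s HEX_BYTES (e.g. 0f1ade)", argv[0]);
    }
    for (size_t i = 0; i &lt; hex_len; i++) {
        if (!isxdigit((unsigned char)hex[i])) {
            errx(1, "Error: invalid hex digit in input.");
        }
    }
    uint8_t raw_bytes[MAX_INPUT_BYTES];
    size_t bytes_len = hex_len / 2;
    for (size_t i = 0; i &lt; bytes_len; i++) {
        unsigned int byte;
        sscanf(hex + 2 * i, "%2x", &amp;byte);
        raw_bytes[i] = (uint8_t)byte;
    }

    // Initialize LLVM after input validation
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllDisassemblers();

    LLVMDisasmContextRef disasm = LLVMCreateDisasm("x86_64", NULL, 0, NULL, NULL);
    if (!disasm) {
        errx(1, "Error: LLVMCreateDisasm() failed.");
    }

    // Set disassembler options: print immediates as hex, use Intel syntax
    if (!LLVMSetDisasmOptions(disasm, LLVMDisassembler_Option_PrintImmHex |
                                          LLVMDisassembler_Option_AsmPrinterVariant)) {
        errx(1, "Error: LLVMSetDisasmOptions() failed.");
    }

    char output_string[MAX_OUTPUT_LENGTH];
    uint64_t address = 0;
    size_t instr_len = LLVMDisasmInstruction(disasm, raw_bytes, bytes_len, address,
                                             output_string, sizeof(output_string));

    if (instr_len > 0) {
        printf("%s\n", output_string);
    } else {
        printf("Error: Unable to disassemble the input bytes.\n");
    }

    LLVMDisasmDispose(disasm);
    return 0;
}
```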

### Instruction bytes giving faulty results

```
0f 1a de
```

### Expected results

It should decode as:
```
nop esi, ebx
```

### Actual results

```sh
$./min_llvm_disassembler "0f1ade"
Error: Unable to disassemble the input bytes.
```

### A related NOP encoding decodes correctly
```sh
$./min_llvm_disassembler "0f1f00"
nop     dword ptr [rax]
```

### Additional logs, screenshots, source code, configuration dump, ...
Encodings in the `0f 18` to `0f 1f` opcode range that are not assigned to another instruction (such as the `0f 18` prefetch hints) are multi-byte NOP (no-operation) instructions in the x86 ISA; see the [Stack Overflow post](https://stackoverflow.com/questions/25545470/long-multi-byte-nops-commonly-understood-macros-or-other-notation) for more details. `0f 1a de` should therefore decode as follows:
- `0f 1a` is the two-byte opcode (the `0f` escape byte followed by `1a`).
- The ModR/M byte `de` is `11011110` in binary:
    - Bits 7-6 (Mod): `11` = 3, i.e. register-direct addressing.
    - Bits 5-3 (Reg): `011` = 3, i.e. EBX (without a REX prefix the operand size stays 32 bits in 64-bit mode; RBX would require `REX.W`).
    - Bits 2-0 (R/M): `110` = 6, i.e. ESI (RSI would likewise require `REX.W`).

Intel XED also decodes `0f 1a de` as `nop esi, ebx`.

</pre>