[llvm] [X86] Ignore REX prefixes not immediately before opcode (PR #117299)

Aiden Grossman via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 22 10:51:13 PST 2024


https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/117299

>From f2510112ca7fa70c382c837c6219175e4f681199 Mon Sep 17 00:00:00 2001
From: Aiden Grossman <aidengrossman at google.com>
Date: Fri, 22 Nov 2024 07:11:55 +0000
Subject: [PATCH 1/2] [X86] Ignore REX prefixes not immediately before opcode

The Intel X86 Architecture Manual says the following:

> A REX prefix is ignored, as are its individual bits, when it is not needed
> for an instruction or when it does not immediately precede the opcode byte or
> the escape opcode byte (0FH) of an instruction for which it is needed. This
> has the implication that only one REX prefix, properly located, can affect an
> instruction.

The disassembler currently does not handle these cases, which leads to
incorrect disassembly. This patch fixes that by treating REX prefixes
like the other prefixes, rather than only expecting one immediately
before the opcode.

The motivating test case, added as a regression test, was fuzzer-generated.
---
 .../lib/Target/X86/Disassembler/X86Disassembler.cpp | 13 ++++++++-----
 llvm/test/MC/Disassembler/X86/x86-64.txt            |  5 +++++
 2 files changed, 13 insertions(+), 5 deletions(-)
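
As a rough illustration of the rule quoted in the description, the following
standalone sketch scans leading prefix bytes and keeps a REX byte only when
nothing but the opcode follows it. It is not the LLVM implementation: the
names PrefixScan, isLegacyPrefix, isRexByte and scanPrefixes are hypothetical,
64-bit mode is assumed, and details such as REX2, VEX/EVEX and mandatory
prefixes are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch; not an LLVM structure.
struct PrefixScan {
  uint8_t rex = 0;             // REX byte that takes effect, or 0 if none
  std::size_t opcodeIndex = 0; // index of the first non-prefix byte
};

// Legacy prefixes: segment overrides, operand/address size, LOCK, REPNE, REP.
inline bool isLegacyPrefix(uint8_t b) {
  switch (b) {
  case 0x26: case 0x2e: case 0x36: case 0x3e: case 0x64: case 0x65:
  case 0x66: case 0x67: case 0xf0: case 0xf2: case 0xf3:
    return true;
  default:
    return false;
  }
}

// Assumption: 64-bit mode, where bytes 0x40-0x4f encode REX prefixes.
inline bool isRexByte(uint8_t b) { return (b & 0xf0) == 0x40; }

inline PrefixScan scanPrefixes(const std::vector<uint8_t> &bytes) {
  PrefixScan result;
  std::size_t i = 0;
  for (; i < bytes.size(); ++i) {
    uint8_t b = bytes[i];
    if (isRexByte(b))
      result.rex = b; // remember the most recent REX byte
    else if (isLegacyPrefix(b))
      result.rex = 0; // a later prefix cancels any earlier REX
    else
      break;          // first opcode byte reached
  }
  result.opcodeIndex = i;
  return result;
}
```

For the byte sequence used in the new test (0x66 0x4c 0x64 0x0d 0x3b 0x64),
the 0x4c byte is followed by the 0x64 segment override, so this sketch reports
no effective REX and finds the opcode byte 0x0d at index 3.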

diff --git a/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp b/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
index c3eae294919f3c..c27177484f55a4 100644
--- a/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
+++ b/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
@@ -329,6 +329,14 @@ static int readPrefixes(struct InternalInstruction *insn) {
       break;
     }
 
+    if (isREX(insn, byte)) {
+      insn->rexPrefix = byte;
+      isPrefix = true;
+      LLVM_DEBUG(dbgs() << format("Found REX prefix 0x%hhx", byte));
+    } else if (isPrefix) {
+      insn->rexPrefix = 0;
+    }
+
     if (isPrefix)
       LLVM_DEBUG(dbgs() << format("Found prefix 0x%hhx", byte));
   }
@@ -506,11 +514,6 @@ static int readPrefixes(struct InternalInstruction *insn) {
     LLVM_DEBUG(dbgs() << format("Found REX2 prefix 0x%hhx 0x%hhx",
                                 insn->rex2ExtensionPrefix[0],
                                 insn->rex2ExtensionPrefix[1]));
-  } else if (isREX(insn, byte)) {
-    if (peek(insn, nextByte))
-      return -1;
-    insn->rexPrefix = byte;
-    LLVM_DEBUG(dbgs() << format("Found REX prefix 0x%hhx", byte));
   } else
     --insn->readerCursor;
 
diff --git a/llvm/test/MC/Disassembler/X86/x86-64.txt b/llvm/test/MC/Disassembler/X86/x86-64.txt
index 8d6564dd098990..b41226766a194e 100644
--- a/llvm/test/MC/Disassembler/X86/x86-64.txt
+++ b/llvm/test/MC/Disassembler/X86/x86-64.txt
@@ -770,3 +770,8 @@
 
 # CHECK: prefetchit1 (%rip)
 0x0f,0x18,0x35,0x00,0x00,0x00,0x00
+
+# Check that we correctly ignore a REX prefix that is not immediately before
+# the opcode.
+# CHECK: orw $25659, %ax
+0x66 0x4c 0x64 0x0d 0x3b 0x64

>From d14b59bd8e4d54e5c8fa6a33958badaed7d4f28e Mon Sep 17 00:00:00 2001
From: Aiden Grossman <aidengrossman at google.com>
Date: Fri, 22 Nov 2024 18:51:02 +0000
Subject: [PATCH 2/2] Address feedback

---
 llvm/test/MC/Disassembler/X86/x86-64.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/llvm/test/MC/Disassembler/X86/x86-64.txt b/llvm/test/MC/Disassembler/X86/x86-64.txt
index b41226766a194e..9a18097c8f9623 100644
--- a/llvm/test/MC/Disassembler/X86/x86-64.txt
+++ b/llvm/test/MC/Disassembler/X86/x86-64.txt
@@ -772,6 +772,7 @@
 0x0f,0x18,0x35,0x00,0x00,0x00,0x00
 
 # Check that we correctly ignore a REX prefix that is not immediately before
-# the opcode.
+# the opcode. REX prefixes not immediately preceding the Opcode are ignored
+# according to Section 2.2.1 of the Intel 64 Architecture Manual.
 # CHECK: orw $25659, %ax
 0x66 0x4c 0x64 0x0d 0x3b 0x64


