[llvm] [MC] AsmLexer invalid read fix. (PR #154972)

Szymon Piotr Milczek via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 9 04:07:56 PDT 2025


https://github.com/smilczek updated https://github.com/llvm/llvm-project/pull/154972

>From 00e8d9fb67d3f420d9109bce1e6b2f3fc6ad5457 Mon Sep 17 00:00:00 2001
From: "Milczek, Szymon" <szymon.milczek at intel.com>
Date: Fri, 22 Aug 2025 17:44:51 +0200
Subject: [PATCH 1/3] [MCParser] AsmLexer invalid read fix.

The whitespace case in the AsmLexer::LexToken() switch statement
contains a loop meant to skip past indentation. The loop does not check
whether CurPtr has reached the end of CurBuf before dereferencing it,
so it can read past the end of the buffer.

This commit adds the condition `CurPtr != CurBuf.end()` so that CurPtr
is never dereferenced once it reaches the end of the buffer.
---
 llvm/lib/MC/MCParser/AsmLexer.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
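
For readers without the LLVM tree at hand, the guard pattern from the
commit message can be sketched in isolation. This is only an illustration
under assumptions: `skipBlanks` is a hypothetical helper, not an AsmLexer
function, and it uses std::string_view with an index in place of CurBuf
and CurPtr.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of AsmLexer: returns the index of the first
// character at or after `pos` that is neither ' ' nor '\t', or buf.size()
// if only blanks remain.
std::size_t skipBlanks(std::string_view buf, std::size_t pos) {
  assert(pos <= buf.size() && "start position must lie within the buffer");
  // Test the bound first. && short-circuits, so buf[pos] is evaluated only
  // when pos is a valid index.
  while (pos != buf.size() && (buf[pos] == ' ' || buf[pos] == '\t'))
    ++pos;
  return pos;
}
```

With a buffer that ends in blanks, such as "ret  ", the loop stops at
buf.size() instead of reading the element one past the end.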

diff --git a/llvm/lib/MC/MCParser/AsmLexer.cpp b/llvm/lib/MC/MCParser/AsmLexer.cpp
index 968ccf776440b..e70eae7720258 100644
--- a/llvm/lib/MC/MCParser/AsmLexer.cpp
+++ b/llvm/lib/MC/MCParser/AsmLexer.cpp
@@ -878,7 +878,7 @@ AsmToken AsmLexer::LexToken() {
   case ' ':
   case '\t':
     IsAtStartOfStatement = OldIsAtStartOfStatement;
-    while (*CurPtr == ' ' || *CurPtr == '\t')
+    while (CurPtr != CurBuf.end() && (*CurPtr == ' ' || *CurPtr == '\t'))
       CurPtr++;
     if (SkipSpace)
       return LexToken(); // Ignore whitespace.

>From 16b45b791765fda5ed0b3e460b125697b5b7c103 Mon Sep 17 00:00:00 2001
From: "Milczek, Szymon" <szymon.milczek at intel.com>
Date: Tue, 9 Sep 2025 10:49:20 +0200
Subject: [PATCH 2/3] approach 2

---
 llvm/lib/MC/MCParser/AsmLexer.cpp | 9 +++++++++
 1 file changed, 9 insertions(+)
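
The comment in the hunk below argues that a raw pointer read cannot be
observed by a test, while a checked accessor can. A rough standalone sketch
of that idea follows. It is an assumption-laden analogy, not the patch's
mechanism: it uses std::string_view::at(), which throws std::out_of_range,
in place of the StringRef operator[], and both function names are
hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical helper for illustration only. Skips ' ' and '\t' starting at
// `pos` WITHOUT an end-of-buffer guard, but reads through the checked
// accessor at(), so running past the end throws instead of going unnoticed.
std::size_t skipBlanksUnguarded(std::string_view buf, std::size_t pos) {
  while (buf.at(pos) == ' ' || buf.at(pos) == '\t')
    ++pos;
  return pos;
}

// Returns true if the unguarded loop ran off the end of `buf`.
bool overrunsBuffer(std::string_view buf) {
  try {
    skipBlanksUnguarded(buf, 0);
    return false;
  } catch (const std::out_of_range &) {
    return true;
  }
}
```

A buffer consisting only of blanks makes `overrunsBuffer` return true,
which is the kind of observable failure a regression test can rely on.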

diff --git a/llvm/lib/MC/MCParser/AsmLexer.cpp b/llvm/lib/MC/MCParser/AsmLexer.cpp
index e70eae7720258..c27d378e1f8c5 100644
--- a/llvm/lib/MC/MCParser/AsmLexer.cpp
+++ b/llvm/lib/MC/MCParser/AsmLexer.cpp
@@ -878,7 +878,16 @@ AsmToken AsmLexer::LexToken() {
   case ' ':
   case '\t':
     IsAtStartOfStatement = OldIsAtStartOfStatement;
+#ifdef LLVM_DEBUG
+    // This block exists to make the out-of-bounds read testable.
+    // CurPtr is a raw pointer, so dereferencing it performs no check that
+    // the memory being read is valid. The StringRef operator[] is used
+    // instead.
+    while (CurPtr != CurBuf.end() && (CurBuf[CurPtr - CurBuf.begin()] == ' ' ||
+                                      CurBuf[CurPtr - CurBuf.begin()] == '\t'))
+#else  // LLVM_DEBUG
     while (CurPtr != CurBuf.end() && (*CurPtr == ' ' || *CurPtr == '\t'))
+#endif // LLVM_DEBUG
       CurPtr++;
     if (SkipSpace)
       return LexToken(); // Ignore whitespace.

>From 3108689d39a682d7e0c8fdbd06b84270a5fe7ba9 Mon Sep 17 00:00:00 2001
From: "Milczek, Szymon" <szymon.milczek at intel.com>
Date: Tue, 9 Sep 2025 11:38:06 +0200
Subject: [PATCH 3/3] test

---
 llvm/test/MC/AsmParser/invalid-read.s | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 llvm/test/MC/AsmParser/invalid-read.s

diff --git a/llvm/test/MC/AsmParser/invalid-read.s b/llvm/test/MC/AsmParser/invalid-read.s
new file mode 100644
index 0000000000000..7247cbff9e9a6
--- /dev/null
+++ b/llvm/test/MC/AsmParser/invalid-read.s
@@ -0,0 +1,5 @@
+# RUN: llvm-mc -triple=x86_64 --as-lex %s
+
+# This test ensures AsmLexer doesn't perform an invalid read when the
+# buffer ends with '\0', ' ' or '\t'.
+  
\ No newline at end of file


