[llvm] [MC][AsmLexer] 'LexToken()': fix potential buffer overflow. (PR #105312)

via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 21 06:31:54 PDT 2024


PavelKopyl wrote:

> Do you have a test (perhaps under `llvm/test/MC/AsmParser`) to demonstrate the issue?

Unfortunately, it's not easy to reproduce that on TOT. I faced it on our downstream target, where we also have a custom MC Assembler C-API (much like the LLVM C-API). This MC API employs MemoryBuffer objects for In/Out. Nevertheless, I can demonstrate the idea of how the issue appears by making a slight change in the llvm-mc tool:
```
--- a/llvm/tools/llvm-mc/llvm-mc.cpp
+++ b/llvm/tools/llvm-mc/llvm-mc.cpp
@@ -382,12 +382,22 @@ int main(int argc, char **argv) {
         << InputFilename << ": " << EC.message() << '\n';
     return 1;
   }
-  MemoryBuffer *Buffer = BufferPtr->get();
+
+  size_t BufSize = BufferPtr->get()->getBufferSize();
+  auto CharBuf = std::make_unique<char[]>(BufSize);
+  std::memcpy(CharBuf.get(),
+              BufferPtr->get()->getBuffer().data(),
+              BufSize);
+  StringRef StrBuf(CharBuf.get(), BufSize);
+  std::unique_ptr<MemoryBuffer> BufCopyPtr =
+      MemoryBuffer::getMemBuffer(StrBuf, "", /*RequiresNullTerminator=*/false);
+
+  MemoryBuffer *Buffer = BufCopyPtr.get();
 
   SourceMgr SrcMgr;
 
   // Tell SrcMgr about this buffer, which is what the parser will pick up.
-  SrcMgr.AddNewSourceBuffer(std::move(*BufferPtr), SMLoc());
+  SrcMgr.AddNewSourceBuffer(std::move(BufCopyPtr), SMLoc());
 
   // Record the location of the include directories so that the lexer can find
   // it later.
```
Then I just build LLVM with the enabled address sanitizer and run `llvm-mc` on any asm file. For instance:

```
bin/llvm-mc -filetype=obj --triple=riscv32 ~/llvm-project/llvm/test/MC/RISCV/relocations.s -o /dev/null
=================================================================
==49894==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x000109707c5c at pc 0x00010336c270 bp 0x00016d6a3f90 sp 0x00016d6a3f88
READ of size 1 at 0x000109707c5c thread T0
    #0 0x10336c26c in llvm::AsmLexer::LexToken() AsmLexer.cpp:755
    #1 0x10276a1f0 in llvm::MCAsmLexer::Lex() MCAsmLexer.h:87
    #2 0x10337e924 in (anonymous namespace)::AsmParser::Lex() AsmParser.cpp:921
    #3 0x103386d10 in (anonymous namespace)::AsmParser::parseStatement((anonymous namespace)::ParseStatementInfo&, llvm::MCAsmParserSemaCallback*) AsmParser.cpp:1792
    #4 0x1033748d0 in (anonymous namespace)::AsmParser::Run(bool, bool) AsmParser.cpp:1001
    #5 0x102762a70 in AssembleInput(char const*, llvm::Target const*, llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo&, llvm::MCSubtargetInfo&, llvm::MCInstrInfo&, llvm::MCTargetOptions const&) llvm-mc.cpp:342
    #6 0x10275ecc8 in main llvm-mc.cpp:588
    #7 0x19793b150  (<unknown module>)

0x000109707c5c is located 0 bytes after 7004-byte region [0x000109706100,0x000109707c5c)
allocated by thread T0 here:
    #0 0x1071a5874 in wrap__Znam+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x61874)
    #1 0x10275c198 in main llvm-mc.cpp:387
    #2 0x19793b150  (<unknown module>)

SUMMARY: AddressSanitizer: heap-buffer-overflow AsmLexer.cpp:755 in llvm::AsmLexer::LexToken()
Shadow bytes around the buggy address:
  0x000109707980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000109707a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000109707a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000109707b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000109707b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x000109707c00: 00 00 00 00 00 00 00 00 00 00 00[04]fa fa fa fa
  0x000109707c80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000109707d00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000109707d80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000109707e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000109707e80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
```

The change in llvm-mc.cpp is quite simple: I just create one more buffer and copy the original buffer's content. The new buffer is created without a null-terminator, which should be absolutely fine and should not affect (as far as I can see) the AsmParser logic. 



https://github.com/llvm/llvm-project/pull/105312


More information about the llvm-commits mailing list