[llvm] [MC][AsmLexer] 'LexToken()': fix potential buffer overflow. (PR #105312)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 21 06:31:54 PDT 2024
PavelKopyl wrote:
> Do you have a test (perhaps under `llvm/test/MC/AsmParser`) to demonstrate the issue?
Unfortunately, it's not easy to reproduce that on TOT. I faced it on our downstream target, where we also have a custom MC Assembler C-API (much like the LLVM C-API). This MC API employs MemoryBuffer objects for In/Out. Nevertheless, I can demonstrate the idea of how the issue appears by making a slight change in the llvm-mc tool:
```
--- a/llvm/tools/llvm-mc/llvm-mc.cpp
+++ b/llvm/tools/llvm-mc/llvm-mc.cpp
@@ -382,12 +382,22 @@ int main(int argc, char **argv) {
<< InputFilename << ": " << EC.message() << '\n';
return 1;
}
- MemoryBuffer *Buffer = BufferPtr->get();
+
+ size_t BufSize = BufferPtr->get()->getBufferSize();
+ auto CharBuf = std::make_unique<char[]>(BufSize);
+ std::memcpy(CharBuf.get(),
+ BufferPtr->get()->getBuffer().data(),
+ BufSize);
+ StringRef StrBuf(CharBuf.get(), BufSize);
+ std::unique_ptr<MemoryBuffer> BufCopyPtr =
+ MemoryBuffer::getMemBuffer(StrBuf, "", /*RequiresNullTerminator=*/false);
+
+ MemoryBuffer *Buffer = BufCopyPtr.get();
SourceMgr SrcMgr;
// Tell SrcMgr about this buffer, which is what the parser will pick up.
- SrcMgr.AddNewSourceBuffer(std::move(*BufferPtr), SMLoc());
+ SrcMgr.AddNewSourceBuffer(std::move(BufCopyPtr), SMLoc());
// Record the location of the include directories so that the lexer can find
// it later.
```
Then I just build LLVM with the enabled address sanitizer and run `llvm-mc` on any asm file. For instance:
```
bin/llvm-mc -filetype=obj --triple=riscv32 ~/llvm-project/llvm/test/MC/RISCV/relocations.s -o /dev/null
=================================================================
==49894==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x000109707c5c at pc 0x00010336c270 bp 0x00016d6a3f90 sp 0x00016d6a3f88
READ of size 1 at 0x000109707c5c thread T0
#0 0x10336c26c in llvm::AsmLexer::LexToken() AsmLexer.cpp:755
#1 0x10276a1f0 in llvm::MCAsmLexer::Lex() MCAsmLexer.h:87
#2 0x10337e924 in (anonymous namespace)::AsmParser::Lex() AsmParser.cpp:921
#3 0x103386d10 in (anonymous namespace)::AsmParser::parseStatement((anonymous namespace)::ParseStatementInfo&, llvm::MCAsmParserSemaCallback*) AsmParser.cpp:1792
#4 0x1033748d0 in (anonymous namespace)::AsmParser::Run(bool, bool) AsmParser.cpp:1001
#5 0x102762a70 in AssembleInput(char const*, llvm::Target const*, llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo&, llvm::MCSubtargetInfo&, llvm::MCInstrInfo&, llvm::MCTargetOptions const&) llvm-mc.cpp:342
#6 0x10275ecc8 in main llvm-mc.cpp:588
#7 0x19793b150 (<unknown module>)
0x000109707c5c is located 0 bytes after 7004-byte region [0x000109706100,0x000109707c5c)
allocated by thread T0 here:
#0 0x1071a5874 in wrap__Znam+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x61874)
#1 0x10275c198 in main llvm-mc.cpp:387
#2 0x19793b150 (<unknown module>)
SUMMARY: AddressSanitizer: heap-buffer-overflow AsmLexer.cpp:755 in llvm::AsmLexer::LexToken()
Shadow bytes around the buggy address:
0x000109707980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000109707a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000109707a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000109707b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000109707b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x000109707c00: 00 00 00 00 00 00 00 00 00 00 00[04]fa fa fa fa
0x000109707c80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x000109707d00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x000109707d80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x000109707e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x000109707e80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
```
The change in llvm-mc.cpp is quite simple: I just create one more buffer and copy the original buffer's content. The new buffer is created without a null-terminator, which should be absolutely fine and should not affect (as far as I can see) the AsmParser logic.
https://github.com/llvm/llvm-project/pull/105312
More information about the llvm-commits
mailing list