[clang] [Format] Do not crash on non-null terminated strings (PR #131299)

Ilya Biryukov via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 17 03:03:23 PDT 2025


================
@@ -2382,14 +2382,20 @@ size_t SourceManager::getDataStructureSizes() const {
 
 SourceManagerForFile::SourceManagerForFile(StringRef FileName,
                                            StringRef Content) {
+  // We copy to `std::string` for Context instead of StringRef because the
+  // SourceManager::getBufferData() works only with null-terminated buffers.
+  // And we still want to keep the API convenient.
+  ContentBuffer = Content.str();
----------------
ilya-biryukov wrote:

> How about storing a StringRef and assign Content if it ends with a zero, and only perform the copy if it doesn't?

But we cannot check if `StringRef` is null-terminated because given `StringRef S`, accessing `S[S.length()]` is UB in the general case. If there's a way to do this that's not UB, we could do that.

I think the easiest way to avoid copies would be to switch to `const char *` or `std::string` in the APIs to make it explicit we need null-terminated strings. That requires a refactoring of the callers, but ends up being as efficient as it is now.

> **My IDE (QtCreator) does uses libformat regularly while typing. I would also prefer not to perform copies all over the place.

But what are the actual costs of this particular copy for the overall performance of the IDE?
I expect it to be negligible, even though I fully sympathize with the idea that we want to avoid unnecessary copies. I just don't think we should do this at the expense of exposing API that have UB.

https://github.com/llvm/llvm-project/pull/131299


More information about the cfe-commits mailing list