[PATCH] D115043: [clang][deps] Use MemoryBuffer in minimizing FS

Duncan P. N. Exon Smith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 6 16:36:12 PST 2021


dexonsmith added inline comments.


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:64-70
   MinimizedFileContents.push_back('\0');
-  Result.Contents = std::move(MinimizedFileContents);
-  // Now make the null terminator implicit again, so that Clang's lexer can find
-  // it right where the buffer ends.
-  Result.Contents.pop_back();
+  Result.Contents = std::make_unique<llvm::SmallVectorMemoryBuffer>(
+      std::move(MinimizedFileContents), [](SmallVectorImpl<char> &SV) {
+        // Now make the null terminator implicit again, so that Clang's lexer
+        // can find it right where the buffer ends.
+        SV.pop_back();
+      });
----------------
This is a bit simpler:
```
lang=c++
Result.Contents = MinimizedFileContents.isSmall()
                    ? MemoryBuffer::getMemBufferCopy(...)
                    : std::make_unique<llvm::SmallVectorMemoryBuffer>();
```
and doesn't require changing SmallVectorMemoryBuffer.

Another option is:
```
lang=c++
{
  // Move to the heap and ensure null-terminated to ensure the MemoryBuffer works.
  SmallVector<char, 0> Heap(std::move(MinimizedFileContents));
  Heap.push_back(0);
  Heap.pop_back();
  Result.Contents = std::make_unique<llvm::SmallVectorMemoryBuffer>(std::move(Heap));
}
```
The latter seems like reasonable logic for the SmallVectorMemoryBuffer constructor though.


================
Comment at: llvm/include/llvm/Support/SmallVectorMemoryBuffer.h:54
+  /// and invoke the given function right after the move.
+  SmallVectorMemoryBuffer(
+      SmallVectorImpl<char> &&SV,
----------------
jansvoboda11 wrote:
> I'm not happy with introducing new (hacky) constructor.
> 
> But, the best alternative I could come up with is to avoid using `SmallVectorMemoryBuffer`, store the `SmallString` with minimized contents somewhere in the filesystem and refer to it via regular `MemoryBuffer`. This seems needlessly complicated, so hacky constructor it is...
> 
> Side question: is the hassle with implicit null-termination (expected by Clang's lexer) worth the complications? What are the benefits anyway?
It'd be better to add a `RequiresNullTerminator` argument to the constructor, matching the MemoryBuffer factory functions, which (if set) does the `push_back`/`pop_back` dance, and in either case passes it through to `init()`.
```
lang=c++
  SmallVectorMemoryBuffer(SmallVectorImpl<char> &&SV, bool RequiresNullTerminator = true)
      : SV(std::move(SV)), BufferName("<in-memory object>") {
    if (RequiresNullTerminator) {
      this->SV.push_back(0);
      this->SV.pop_back(0);
    }
    init(this->SV.begin(), this->SV.end(), RequiresNullTerminator);
  }
```

If you do that, you should update the (few) existing callers to explicitly pass `false`. Note that `RequiresNullTerminator` defaults to true in all the other `MemoryBuffer` APIs so it'd be confusing to default it to `false` here.

> Side question: is the hassle with implicit null-termination (expected by Clang's lexer) worth the complications? What are the benefits anyway?

Not sure whether it's a micro-optimization for getting better codegen or if it just predates `StringRef`, which made it much easier to reason about a byte range as a single unit (rather than tracking two separate pointers, which is more error-prone). I somewhat doubt it's worth it, which is why when I wrote the minimizer I used `StringRef`, but I don't have data and it'd be a lot of work to change. (The minimizer is fast enough as-is, but it does less work than the full Clang lexer. It's plausible that the optimizations matter there?)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115043/new/

https://reviews.llvm.org/D115043



More information about the llvm-commits mailing list