[PATCH] D115043: [clang][deps] Use MemoryBuffer in minimizing FS
Duncan P. N. Exon Smith via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 6 16:36:12 PST 2021
dexonsmith added inline comments.
================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:64-70
MinimizedFileContents.push_back('\0');
- Result.Contents = std::move(MinimizedFileContents);
- // Now make the null terminator implicit again, so that Clang's lexer can find
- // it right where the buffer ends.
- Result.Contents.pop_back();
+ Result.Contents = std::make_unique<llvm::SmallVectorMemoryBuffer>(
+ std::move(MinimizedFileContents), [](SmallVectorImpl<char> &SV) {
+ // Now make the null terminator implicit again, so that Clang's lexer
+ // can find it right where the buffer ends.
+ SV.pop_back();
+ });
----------------
This is a bit simpler:
```
lang=c++
Result.Contents = MinimizedFileContents.isSmall()
? MemoryBuffer::getMemBufferCopy(...)
: std::make_unique<llvm::SmallVectorMemoryBuffer>();
```
and doesn't require changing SmallVectorMemoryBuffer.
Another option is:
```
lang=c++
{
// Move to the heap and ensure null-terminated to ensure the MemoryBuffer works.
SmallVector<char, 0> Heap(std::move(MinimizedFileContents));
Heap.push_back(0);
Heap.pop_back();
Result.Contents = std::make_unique<llvm::SmallVectorMemoryBuffer>(std::move(Heap));
}
```
The latter seems like reasonable logic for the SmallVectorMemoryBuffer constructor though.
================
Comment at: llvm/include/llvm/Support/SmallVectorMemoryBuffer.h:54
+ /// and invoke the given function right after the move.
+ SmallVectorMemoryBuffer(
+ SmallVectorImpl<char> &&SV,
----------------
jansvoboda11 wrote:
> I'm not happy with introducing new (hacky) constructor.
>
> But, the best alternative I could come up with is to avoid using `SmallVectorMemoryBuffer`, store the `SmallString` with minimized contents somewhere in the filesystem and refer to it via regular `MemoryBuffer`. This seems needlessly complicated, so hacky constructor it is...
>
> Side question: is the hassle with implicit null-termination (expected by Clang's lexer) worth the complications? What are the benefits anyway?
It'd be better to add a `RequiresNullTerminator` argument to the constructor, matching the MemoryBuffer factory functions, which (if set) does the `push_back`/`pop_back` dance, and in either case passes it through to `init()`.
```
lang=c++
SmallVectorMemoryBuffer(SmallVectorImpl<char> &&SV, bool RequiresNullTerminator = true)
: SV(std::move(SV)), BufferName("<in-memory object>") {
if (RequiresNullTerminator) {
this->SV.push_back(0);
this->SV.pop_back(0);
}
init(this->SV.begin(), this->SV.end(), RequiresNullTerminator);
}
```
If you do that, you should update the (few) existing callers to explicitly pass `false`. Note that `RequiresNullTerminator` defaults to true in all the other `MemoryBuffer` APIs so it'd be confusing to default it to `false` here.
> Side question: is the hassle with implicit null-termination (expected by Clang's lexer) worth the complications? What are the benefits anyway?
Not sure whether it's a micro-optimization for getting better codegen or if it just predates `StringRef`, which made it much easier to reason about a byte range as a single unit (rather than tracking two separate pointers, which is more error-prone). I somewhat doubt it's worth it, which is why when I wrote the minimizer I used `StringRef`, but I don't have data and it'd be a lot of work to change. (The minimizer is fast enough as-is, but it does less work than the full Clang lexer. It's plausible that the optimizations matter there?)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D115043/new/
https://reviews.llvm.org/D115043
More information about the llvm-commits
mailing list