[PATCH] D52503: [clangd] Fix bugs with incorrect memory estimate report
Kirill Bobyrev via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Sep 26 06:51:36 PDT 2018
kbobyrev added inline comments.
================
Comment at: clang-tools-extra/clangd/index/dex/Dex.cpp:251
Bytes += InvertedIndex.getMemorySize();
- for (const auto &P : InvertedIndex)
- Bytes += P.second.bytes();
+ for (const auto &TokenToPostingList : InvertedIndex)
+ Bytes += TokenToPostingList.first.Data.size() +
----------------
ioeric wrote:
> kbobyrev wrote:
> > ioeric wrote:
> > > kbobyrev wrote:
> > > > ioeric wrote:
> > > > > Would `InvertedIndex.getMemorySize()` be a better estimate?
> > > > IIUC this is only precise for the objects which do not make any allocations (e.g. `std::vector` and `std::string` memory estimate would not be "correct").
> > > >
> > > > From the documentation
> > > >
> > > > > This is just the raw memory used by DenseMap. If entries are pointers to objects, the size of the referenced objects are not included.
> > > >
> > > > I also have `Bytes += InvertedIndex.getMemorySize();` above, so the purpose of this code would be to take into account the size of these "reference objects".
> > > >
> > > > However, since `PostingList::bytes()` also returns underlying `std::vector` size, this code will probably add these `std::vector` objects size twice (the first one was in `InvertedIndex.getMemorySize()`). I should fix that.
> > > > `If entries are pointers to objects, the size of the referenced objects are not included.`
> > > This applies to *pointers* to objects, but the `PostingList`s in InvertedList are not pointerd? So it seems to me that `InvertedIndex.getMemorySize()` covers everything in the `InvertedIndex`. One way to verify is probably check the size calculated with loop against `InvertedIndex.getMemorySize()`.
> > As discussed offline, pointers and references are an example of objects which would be incorrectly measured by `DesneMap::getMemorySize()`, but `std::vector` and `std::string` are pointers in a way because they also just store a pointer to the underlying storage which is hidden from `DenseMap::getMemorySize()`; thus, it would be more precise to use the proposed scheme for memory estimation.
> I see. I was expecting DenseMap to maintain an arena which could provide memory usage of all owned data, but it seems to only consider `sizeof(KV-Pair)`...
>
> If the posting lists (vectors) are not covered, shouldn't we also also add the size of tokens (string) here?
Yes, I was also adding `TokenToPostingList.first.Data.size()` to the size estimate in the previous revision snapshot, but as @sammccall pointed out, there are multiple string implementations and one of them (IIUC this one is currently pushed in the C++ Standard) utilizes Small Object Optimization (i.e. it behaves like `llvm::SmallVector` rather than `std::vector`). Therefore, it does not perform external allocations before some limit is reached. I don't know whether it is possible to know whether the external allocation happened or not and the summary of our discussion with Sam was that we should just omit this.
https://reviews.llvm.org/D52503
More information about the cfe-commits
mailing list