[PATCH] D52300: [clangd] Implement VByte PostingList compression

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Sep 21 08:28:31 PDT 2018


sammccall added a comment.

In https://reviews.llvm.org/D52300#1241754, @kbobyrev wrote:

> I addressed the easiest issues. I'll try to implement separate storage structure for `Head`s and `Payload`s which would potentially make the implementation cleaner and easier to understand (and also more maintainable since that would be way easier to go for SIMD instructions speedups and other encoding schemes if we do that).


That doesn't sound more maintainable, that sounds like a performance hack that will hurt the layering.
Which is ok :-) but please don't do that until you measure a nontrivial performance improvement from it.

> Also, I'll refine https://reviews.llvm.org/D52047 a little bit and I believe that is should be way easier to understand performance + memory consumption once we have these benchmarks in. Both @ioeric and @ilya-biryukov expressed their concern with regard to the memory consumption "benchmark" and suggested a separate binary. While this seems fine to me, I think it's important to keep performance + memory tracking infrastructure easy to use (in this sense scattering different metrics across multiple binaries makes it less accessible and probably introduce some code duplication) and therefore using this "trick" is OK to me, but I don't have a strong opinion about this. What do you think, @sammccall?

I may be missing some context, but you're talking about index idle memory usage right?
Can't this just be a dexp command?
No objection to annotating benchmark runs with memory usage too, but I wouldn't jump through hoops unless there's a strong reason.



================
Comment at: clang-tools-extra/clangd/index/dex/PostingList.cpp:121
 
+/// Single-byte masks used for VByte compression bit manipulations.
+constexpr uint8_t SevenBytesMask = 0x7f;  // 0b01111111
----------------
kbobyrev wrote:
> sammccall wrote:
> > move to the function where they're used
> But they're used both in `encodeStream()` and `decompress()`. I tried to move as much static constants to functions where they're used, but these masks are useful for both encoding and decoding. Is there something I should do instead (e.g. make them members of `PostingList`)?
You're right.
My real objection here is that these decls are hard to understand here, and the code that uses them is also hard to understand.

I think this is because they aren't very powerful abstractions (the detail abstracted is limited, and it's not strongly abstracted) and the names aren't sufficiently good. (I don't have great ones, though not confusing bits and bytes would be a start :-)

My top suggestion is to inline these everywhere they're used. This is bit twiddling code, not knowing what the bits are can obscure understanding hard, and encourage you to write the code in a falsely general way.

Failing that, SevenBytesMask -> LowBits and ContinuationBit -> More or HighBit?
FourBytesMask isn't needed, you won't be decoding any invalid data.



https://reviews.llvm.org/D52300





More information about the cfe-commits mailing list