[PATCH] D53363: [clangd] Encode Line/Column as a 32-bits integer.
Haojian Wu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Oct 17 06:32:03 PDT 2018
hokein added a comment.
In https://reviews.llvm.org/D53363#1267628, @sammccall wrote:
> (I think your math is off in the description: 20 bits should be 1M lines, not 4M)
Oops...Update the desccription.
> I think this is a win, as I think truncation will be rare and not terrible. We should document the intentions around truncation though.
>
> Incidentally, this means replacing just the StringRef in SymbolLocation::Position with a char* would save another 13%, because that's also 8 bytes.
Yeah, I have a rough patch for it, using char* will save us ~50MB memory, which will lead to ~300 MB memory usage in total.
> (Obviously we'd probably replace the other StringRefs too, but it gives a lower bound).
================
Comment at: clangd/index/Index.h:41
struct Position {
- uint32_t Line = 0; // 0-based
+ static constexpr int LineBits = 20;
+ static constexpr int ColumnBits = 12;
----------------
sammccall wrote:
> (not sure these constants are needed as it stands - YAML shouldn't use them, see below)
I think we need, for testing, for setters, rather than using magic number.
================
Comment at: clangd/index/Index.h:46
// Using UTF-16 code units.
- uint32_t Column = 0; // 0-based
+ uint32_t Column : ColumnBits; // 0-based
};
----------------
sammccall wrote:
> If we overflow the columns, it would be much easier to recognize if the result is always e.g. 255, rather than an essentially random number from 0-255 (from modulo 256 arithmetic).
>
> This would mean replacing the raw fields with setters/getters, which is obviously a more invasive change. What about a compromise: add the setters, and call them from the most important codepaths: `SymbolCollector` and `Serialization`.
It seems reasonable to use maximum value if we encounter overflows.
> This would mean replacing the raw fields with setters/getters, which is obviously a more invasive change. What about a compromise: add the setters, and call them from the most important codepaths: SymbolCollector and Serialization.
Actually, there is not too much usage of the raw fields in the source code, I'd prefer to use all setters/getters in all places (and it would make the code consistent and more safe). How about this plan?
1. I update the patch to change all raw fields usage to getters/setters, and keep these raw fields public
2. do a cleanup internally
3. make these raw fields private
================
Comment at: clangd/index/YAMLSerialization.cpp:97
-template <> struct MappingTraits<SymbolLocation::Position> {
- static void mapping(IO &IO, SymbolLocation::Position &Value) {
- IO.mapRequired("Line", Value.Line);
- IO.mapRequired("Column", Value.Column);
+struct NormalizedPosition {
+ using Position = clang::clangd::SymbolLocation::Position;
----------------
sammccall wrote:
> I don't think there's any reason to pack the YAML encoding.
The YAML here is a bit tricky, the previous code could not compile because we can not bind bit-field to Non-const references, I added another traits to keep the original YAML encoding (which is more readable).
Repository:
rCTE Clang Tools Extra
https://reviews.llvm.org/D53363
More information about the cfe-commits
mailing list