[PATCH] D53363: [clangd] Encode Line/Column as a 32-bits integer.

Haojian Wu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Oct 17 06:32:03 PDT 2018


hokein added a comment.

In https://reviews.llvm.org/D53363#1267628, @sammccall wrote:

> (I think your math is off in the description: 20 bits should be 1M lines, not 4M)


Oops...Update the desccription.

> I think this is a win, as I think truncation will be rare and not terrible. We should document the intentions around truncation though.
> 
> Incidentally, this means replacing just the StringRef in SymbolLocation::Position with a char* would save another 13%, because that's also 8 bytes.

Yeah, I have a rough patch for it, using char* will save us ~50MB memory, which will lead to ~300 MB memory usage in total.

> (Obviously we'd probably replace the other StringRefs too, but it gives a lower bound).





================
Comment at: clangd/index/Index.h:41
   struct Position {
-    uint32_t Line = 0; // 0-based
+    static constexpr int LineBits = 20;
+    static constexpr int ColumnBits = 12;
----------------
sammccall wrote:
> (not sure these constants are needed as it stands - YAML shouldn't use them, see below)
I think we need, for testing, for setters, rather than using magic number.


================
Comment at: clangd/index/Index.h:46
     // Using UTF-16 code units.
-    uint32_t Column = 0; // 0-based
+    uint32_t Column : ColumnBits; // 0-based
   };
----------------
sammccall wrote:
> If we overflow the columns, it would be much easier to recognize if the result is always e.g. 255, rather than an essentially random number from 0-255 (from modulo 256 arithmetic).
> 
> This would mean replacing the raw fields with setters/getters, which is obviously a more invasive change. What about a compromise: add the setters, and call them from the most important codepaths: `SymbolCollector` and `Serialization`.
It seems reasonable to use maximum value if we encounter overflows.

> This would mean replacing the raw fields with setters/getters, which is obviously a more invasive change. What about a compromise: add the setters, and call them from the most important codepaths: SymbolCollector and Serialization.

Actually, there is not too much usage of the raw fields in the source code, I'd prefer to use all setters/getters in all places (and it would make the code consistent and more safe). How about this plan?

1. I update the patch to change all raw fields usage to getters/setters, and keep these raw fields public
2. do a cleanup internally
3. make these raw fields private


================
Comment at: clangd/index/YAMLSerialization.cpp:97
 
-template <> struct MappingTraits<SymbolLocation::Position> {
-  static void mapping(IO &IO, SymbolLocation::Position &Value) {
-    IO.mapRequired("Line", Value.Line);
-    IO.mapRequired("Column", Value.Column);
+struct NormalizedPosition {
+  using Position = clang::clangd::SymbolLocation::Position;
----------------
sammccall wrote:
> I don't think there's any reason to pack the YAML encoding.

The YAML here is a bit tricky, the previous code could not compile because we can not bind bit-field to Non-const references, I added another traits to keep the original YAML encoding (which is more readable).


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D53363





More information about the cfe-commits mailing list