[PATCH] D53363: [clangd] Encode Line/Column as a 32-bits integer.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Oct 17 23:47:16 PDT 2018


sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.

Nice, just nits!

In https://reviews.llvm.org/D53363#1267721, @hokein wrote:

> Yeah, I have a rough patch for it, using char* will save us ~50MB memory, which will lead to ~300 MB memory usage in total.


For just the StringRef in SymbolLocation::Position, or for all our strings?
If the latter, that seems too low, as we should save strictly more than this patch (unless I'm missing something about alignment/padding)



================
Comment at: clangd/index/Index.h:41
   struct Position {
-    uint32_t Line = 0; // 0-based
+    static constexpr int LineBits = 20;
+    static constexpr int ColumnBits = 12;
----------------
hokein wrote:
> sammccall wrote:
> > (not sure these constants are needed as it stands - YAML shouldn't use them, see below)
> I think we need, for testing, for setters, rather than using magic number.
In that case I'd suggest static constexpr `MaxLine = 1 << 20 - 1`, `MaxColumn = 1 << 12 - 1`, as that's what the tests (and potentially some warning checks in the code later?) need, so we encapsulate a little more detail. But up to you.

(Writing 20 twice a couple of lines apart seems fine to me)


================
Comment at: clangd/index/Index.h:46
     // Using UTF-16 code units.
-    uint32_t Column = 0; // 0-based
+    uint32_t Column : ColumnBits; // 0-based
   };
----------------
hokein wrote:
> sammccall wrote:
> > If we overflow the columns, it would be much easier to recognize if the result is always e.g. 255, rather than an essentially random number from 0-255 (from modulo 256 arithmetic).
> > 
> > This would mean replacing the raw fields with setters/getters, which is obviously a more invasive change. What about a compromise: add the setters, and call them from the most important codepaths: `SymbolCollector` and `Serialization`.
> It seems reasonable to use maximum value if we encounter overflows.
> 
> > This would mean replacing the raw fields with setters/getters, which is obviously a more invasive change. What about a compromise: add the setters, and call them from the most important codepaths: SymbolCollector and Serialization.
> 
> Actually, there is not too much usage of the raw fields in the source code, I'd prefer to use all setters/getters in all places (and it would make the code consistent and more safe). How about this plan?
> 
> 1. I update the patch to change all raw fields usage to getters/setters, and keep these raw fields public
> 2. do a cleanup internally
> 3. make these raw fields private
SGTM!


================
Comment at: clangd/index/Index.h:38
+  // Position is encoded into 32 bits to save space.
+  // If Line/Column is overflow, the value will be their maximum value.
   struct Position {
----------------
nit: no "is", overflow is a verb


================
Comment at: clangd/index/Index.h:41
+    void setLine(uint32_t Line);
+    uint32_t line() const;
+    void setColumn(uint32_t Column);
----------------
nit: define the trivial getters here so the compiler can always inline them.
(Setters seem fine out-of-line)


================
Comment at: clangd/index/Index.h:66
                        const SymbolLocation::Position &R) {
-  return std::tie(L.Line, L.Column) == std::tie(R.Line, R.Column);
+  return std::make_tuple(L.line(), L.column()) ==
+             std::make_tuple(R.Line, R.Column);
----------------
Why only using the getters on one side?


================
Comment at: clangd/index/YAMLSerialization.cpp:98
 
-template <> struct MappingTraits<SymbolLocation::Position> {
-  static void mapping(IO &IO, SymbolLocation::Position &Value) {
-    IO.mapRequired("Line", Value.Line);
-    IO.mapRequired("Column", Value.Column);
+template <> struct MappingTraits<std::pair<uint32_t, uint32_t>> {
+  static void mapping(IO &IO, std::pair<uint32_t, uint32_t> &Value) {
----------------
I don't think specializing the traits for `pair<uint32,uint32>` is OK, specializing for common types we don't own is asking for ODR violations.

prefer to define an custom struct for this (in fact you do, so just map that struct instead of struct->P?)


================
Comment at: clangd/index/YAMLSerialization.cpp:104-105
 };
 
+struct NormalizedPosition {
+  using Position = clang::clangd::SymbolLocation::Position;
----------------
please document why this normalization (YamlIO can't directly map bitfields)


================
Comment at: clangd/index/YAMLSerialization.cpp:109
+  NormalizedPosition(IO &, const Position &Pos) {
+    static_assert(
+        sizeof(Position) == sizeof(uint32_t),
----------------
no need for this assert


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D53363





More information about the cfe-commits mailing list