[PATCH] D45513: [clangd] Add line and column number to the index symbol.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Apr 12 07:27:46 PDT 2018


sammccall added inline comments.


================
Comment at: clangd/index/Index.h:27
 struct SymbolLocation {
+  struct Position {
+    // Line position in a document (zero-based).
----------------
sammccall wrote:
> There are 4 of these per symbol, if we can keep line + character to 32 bits we save 16 bytes per symbol.
> 
> That looks like it might still be ~10% of non-string size. WDYT?
> (12 bits for column and 20 bits for line seems like a reasonable guess)
after some offline discussion, I no longer think this is a good idea.
We should strive to keep memory usage reasonable with the current naive index implementation, but "giant vector of Symbol structs in memory" isn't a case to micro-optimize for in the long run.


================
Comment at: clangd/index/Index.h:32
+    // Character offset on a line in a document (zero-based).
+    int Character = 0;
+  };
----------------
sammccall wrote:
> Column?
> 
> LSP calls this "character" but this is nonstandard and I find it very confusing with offset. 
We should document what this is an offset into: bytes, utf-16 code units, or unicode codepoints. (Or even grid offsets - glyphs and doublewidth are a thing)

Given that we intend to send it over LSP without reading the source, only utf-16 code units is really correct. Unicode codepoints is "nicer" and will give correct results in the BMP, while bytes will be correct for ASCII only.

I'd vote for making this utf-16 code units.

It's OK if the code populating it doesn't get this right (confuses bytes and code units) but add a fixme.


================
Comment at: clangd/index/Index.h:39
   // using half-open range, [StartOffset, EndOffset).
+  // FIXME(hokein): remove these fields in favor of Position.
   unsigned StartOffset = 0;
----------------
hokein wrote:
> sammccall wrote:
> > I don't think we should remove them, we'll just have the same problem in reverse.
> > Could position have have line/col/offset, and have Position Start, End?
> As discussed offline, we decide to remove them as we don't have real use case of them (we could add them back when needed).
> I removed all the references of them in clangd. And remove these fields in a separate patch.
After offline discussion: we don't actually plan to do math on these ever, we just send them to LSP clients.
So removing sounds fine. We can add later if there are clear use cases.


================
Comment at: clangd/index/Index.h:44
+  /// The symbol range, using half-open range [StartPos, EndPos).
+  Position StartPos;
+  Position EndPos;
----------------
Not sure about the abbreviation - maybe just start/end? Since offsets are going away.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D45513





More information about the cfe-commits mailing list