[PATCH] D45513: [clangd] Add line and column number to the index symbol.

Fangrui Song via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Apr 12 14:54:29 PDT 2018


MaskRay added inline comments.


================
Comment at: clangd/index/Index.h:32
+    // Character offset on a line in a document (zero-based).
+    int Character = 0;
+  };
----------------
hokein wrote:
> sammccall wrote:
> > sammccall wrote:
> > > Column?
> > > 
> > > LSP calls this "character" but this is nonstandard and I find it very confusing with offset. 
> > We should document what this is an offset into: bytes, utf-16 code units, or unicode codepoints. (Or even grid offsets - glyphs and doublewidth are a thing)
> > 
> > Given that we intend to send it over LSP without reading the source, only utf-16 code units is really correct. Unicode codepoints is "nicer" and will give correct results in the BMP, while bytes will be correct for ASCII only.
> > 
> > I'd vote for making this utf-16 code units.
> > 
> > It's OK if the code populating it doesn't get this right (confuses bytes and code units) but add a fixme.
> Done. Added FIXME.
I'd vote for Unicode code points.

I haven't looked into this closely. But UTF-8 vs UTF-16 vs Unicode code points should not be a big issue here. Unless you use emojis or some uncommon characters, the usage of UTF-16 code units in LSP does not have a lot of harm.

    //  😹😹👻 anything weird hidden in line comments can be ignored because they don't affect offset calculation

And Microsoft might change the spec one day https://github.com/Microsoft/language-server-protocol/issues/376


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D45513





More information about the cfe-commits mailing list