[clangd-dev] DraftStore::updateDraft compares lsp's rangeLength in UTF16 code units against the computed rangeLength in bytes

Sam McCall via clangd-dev clangd-dev at lists.llvm.org
Mon Oct 22 00:12:04 PDT 2018

Hi Daan,
Glad to meet you, thanks for helping out!

On Sun, Oct 21, 2018 at 8:55 PM Daan De Meyer via clangd-dev <
clangd-dev at lists.llvm.org> wrote:

> I've started using clangd with VSCode and discovered an issue when
> accidentally typing a unicode character and immediately removing it
> afterwards. When removing the unicode character clangd errors with:
> "Change's rangeLength (1) doesn't match the computed range length (3)".
> Afterwards, clangd doesn't work for the current file until it is restarted.
> Since I had to build LLVM anyways to add support for passing relative
> paths to -compile-commands-dir, I tried to debug this issue as well. What
> I've discovered is that when the range length's are compared in
> DraftStore::updateDraft (DraftStore.cpp:80) , the LSP's range length seems
> to be specified in UTF16 code units while clangd's computed range length is
> specified in bytes (see positionToOffset in SourceCode.hpp:83). This causes
> the check to fail which causes the error in VSCode.
That sounds right to me. The use of UTF-16 in the protocol is (IMO)
unfortunate, but it is reality.
We added handling for this in the SourceCode helpers but it's not
surprising there are a few places we're assuming bytes by mistake.
Unfortunately most clangd developers and our tests tend to code in ASCII
only, so we haven't shaken out those bugs.

Thanks for digging!

Two possible fixes I can see are:
> 1. Remove the check (from a quick test clangd still functions perfectly
> after commenting out the check).
> 2. Use the StartIndex and EndIndex to get a substring from the Contents
> variable from which we can then calculate the computed range length in
> UTF16 code units using the utf16Len function (SourceCode.cpp:72). This
> would require lifting out the utf16Len function into SourceCode.h or
> elsewhere since at the moment it's a static function in SourceCode.cpp.
> I also checked the LSP docs and they don't explicitly say the rangeLength
> is specified in UTF16 code units. However, I'm assuming it is since a LSP
> Range consists out of two LSP Positions of which the LSP Character field is
> specified in UTF16 code units.
Yeah, I think this is a safe assumption.

I'm not sure which solution would be preferred so I'm posting here first
> before implementing a fix.
Keeping the sanity check and making it correct seems best to me (i.e.
option 2). Either is better than the status quo, though.

My only concern is that we'll break clients that make the equal and
opposite mistake (sending byte lengths instead of UTF-16 lengths).
Maybe we should also downgrade this error to a log message, since we can
presumably trust the length from the string. What do you think?

Cheers, Sam
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/clangd-dev/attachments/20181022/b06aeda9/attachment.html>

More information about the clangd-dev mailing list