[clangd-dev] DraftStore::updateDraft compares lsp's rangeLength in UTF16 code units against the computed rangeLength in bytes

Mon Oct 22 11:02:07 PDT 2018

On Mon, Oct 22, 2018 at 7:27 PM Daan De Meyer <daan.j.demeyer at gmail.com>
wrote:

> After implementing option 2, I've changed my mind and I think we should
> remove the check.
>
> Right now, DraftStore::UpdateDraft takes the start position and end
> position of the LSP message (specified in UTF-16 code units) and calculates
> the length of the range in bytes. What we'd be doing with option 2 is
> converting that length back to UTF-16 code units and checking whether it's
> equal to the rangeLength of the LSP message. This is nothing more than a
> really complicated check to see whether 'LSP.end - LSP.start ==
> LSP.rangeLength'.
>
Yes, but start and end aren't just UTF-16 code unit offsets, they're (line,
col) pairs where col is in UTF-16.

So it's an indirect assertion about the state of the remote buffer (you
can't convert between offsets and line/col without the buffer).
I think there's *some* value in validating that assertion, particularly in
light of the fact that the protocol talks about it in two different
encodings (the actual buffer content is sent as JSON-escaped strings over
UTF-8) and we store as a third (UTF-8).

But it's not critical. Up to you.

Cheers, Sam

> The check could just as easily done without the conversion to bytes, but
> then we're really only checking whether the client calculated the
> rangeLength correctly. Since the rangeLength isn't used by the rest of
> clangd I propose to just ignore it. rangeLength seems to be redundant since
> we just calculate it ourselves. I also found an issue about this in the LSP
> repository: https://github.com/Microsoft/language-server-protocol/issues/9
> .
>
> Regards,
>
> Daan
>
> On Mon, 22 Oct 2018 at 13:48, Sam McCall <sammccall at google.com> wrote:
>
>> On Mon, Oct 22, 2018 at 1:32 PM Daan De Meyer <daan.j.demeyer at gmail.com>
>> wrote:
>>
>>>
>>>> My only concern is that we'll break clients that make the equal and
>>>> opposite mistake (sending byte lengths instead of UTF-16 lengths).
>>>> Maybe we should also downgrade this error to a log message, since we
>>>> can presumably trust the length from the string. What do you think?
>>>>
>>>
>>> I think the check is useful just to verify if the client and server are
>>> on the same page so I'm partial to option 2 as well. Downgrading to a log
>>> message avoids outright breaking existing clients so that seems like the
>>> way to go. As soon as Github is back up I'll make an issue asking to
>>> clarify the unit of rangeLength in the lsp repository as well.
>>>
>> Thanks!
>>
>> Assuming we go with option 2 (including downgrading to an error message),
>>> I have two more questions:
>>>
>>> 1. Which header should I put the utf16Len function in (from
>>> SourceCode.cpp)? I tried searching for an existing header with UTF
>>> functions but I didn't immediately find one.
>>>
>> I think SourceCode.h is a reasonable fit within clangd. (There are some
>> UTF libraries in llvm/support, but it's not clear this belongs there and
>> it'd be a bunch of work).
>>
>> Consider renaming it something like `lspLength()` or so and keeping the
>> bits about UTF-16 in the function comment.
>> That's kind of a low-level detail.
>>
>>
>>> 2. How do I go about downgrading to a log message? Do I just write to
>>> errs()?
>>>
>> We use the log() functions in clangd/Logger.h. (Abstracts the
>> destination, handles locking, does formatting).
>>
>> This one is scary+rare enough it should probably be an error log.
>> The functions use formatv syntax, so something like:
>>   elog("bad TextDocumentContentChangeEvent: rangeLength={0} but computed
>> length {1} for {2}",
>>        *change.rangeLength, Len, change.text);
>>
>> Cheers, Sam
>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/clangd-dev/attachments/20181022/44410ad9/attachment.html>