<div dir="ltr"><div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><br></div><div>My only concern is that we'll break clients that
make the equal and opposite mistake (sending byte lengths instead of
UTF-16 lengths).</div><div>Maybe we should also downgrade this error to a
log message, since we can presumably trust the length from the string.
What do you think?</div></blockquote><div><br></div><div>I think the check is useful just to verify if the client and server are on the same page so I'm partial to option 2 as well. Downgrading to a log message avoids outright breaking existing clients so that seems like the way to go. As soon as Github is back up I'll make an issue asking to clarify the unit of rangeLength in the lsp repository as well.</div><div><br></div><div>Assuming we go with option 2 (including downgrading to an error message), I have two more questions:</div><div><br></div><div>1. Which header should I put the utf16Len function in (from SourceCode.cpp)? I tried searching for an existing header with UTF functions but I didn't immediately find one.</div><div>2. How do I go about downgrading to a log message? Do I just write to errs()?<br></div><div><br></div><div>Regards, <br></div><div><br></div><div>Daan<br></div><div><br></div><div><br></div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Mon, 22 Oct 2018 at 09:12, Sam McCall <<a href="mailto:sammccall@google.com">sammccall@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Hi Daan,</div><div>Glad to meet you, thanks for helping out!</div><div dir="ltr"><br></div><div dir="ltr">On Sun, Oct 21, 2018 at 8:55 PM Daan De Meyer via clangd-dev <<a href="mailto:clangd-dev@lists.llvm.org" target="_blank">clangd-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>I've started using clangd with VSCode and discovered an issue when accidentally typing a unicode character and immediately removing it afterwards. When removing the unicode character clangd errors with: "Change's rangeLength (1) doesn't match the computed range length (3)". Afterwards, clangd doesn't work for the current file until it is restarted.<br></div><div><br></div><div>Since I had to build LLVM anyways to add support for passing relative paths to -compile-commands-dir, I tried to debug this issue as well. What I've discovered is that when the range length's are compared in DraftStore::updateDraft (DraftStore.cpp:80) , the LSP's range length seems to be specified in UTF16 code units while clangd's computed range length is specified in bytes (see positionToOffset in SourceCode.hpp:83). This causes the check to fail which causes the error in VSCode.<br></div></div></blockquote><div>That sounds right to me. The use of UTF-16 in the protocol is (IMO) unfortunate, but it is reality.</div><div>We added handling for this in the SourceCode helpers but it's not surprising there are a few places we're assuming bytes by mistake.</div><div>Unfortunately most clangd developers and our tests tend to code in ASCII only, so we haven't shaken out those bugs.</div><div><br></div><div>Thanks for digging!</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Two possible fixes I can see are:</div><div><br></div><div>1. Remove the check (from a quick test clangd still functions perfectly after commenting out the check).</div><div>2. Use the StartIndex and EndIndex to get a substring from the Contents variable from which we can then calculate the computed range length in UTF16 code units using the utf16Len function (SourceCode.cpp:72). This would require lifting out the utf16Len function into SourceCode.h or elsewhere since at the moment it's a static function in SourceCode.cpp.</div><div><br></div><div>I also checked the LSP docs and they don't explicitly say the rangeLength is specified in UTF16 code units. However, I'm assuming it is since a LSP Range consists out of two LSP Positions of which the LSP Character field is specified in UTF16 code units.</div></div></blockquote><div>Yeah, I think this is a safe assumption.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>I'm not sure which solution would be preferred so I'm posting here first before implementing a fix.</div></div></blockquote><div>Keeping the sanity check and making it correct seems best to me (i.e. option 2). Either is better than the status quo, though.</div><div><br></div><div>My only concern is that we'll break clients that make the equal and opposite mistake (sending byte lengths instead of UTF-16 lengths).</div><div>Maybe we should also downgrade this error to a log message, since we can presumably trust the length from the string. What do you think?</div><div><br></div><div>Cheers, Sam</div></div></div>
</blockquote></div>