<div dir="ltr"><div class="gmail_quote"><div>Hi Daan,</div><div>Glad to meet you, thanks for helping out!</div><div dir="ltr"><br></div><div dir="ltr">On Sun, Oct 21, 2018 at 8:55 PM Daan De Meyer via clangd-dev <<a href="mailto:clangd-dev@lists.llvm.org" target="_blank">clangd-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>I've started using clangd with VSCode and discovered an issue when accidentally typing a unicode character and immediately removing it afterwards. When removing the unicode character clangd errors with: "Change's rangeLength (1) doesn't match the computed range length (3)". Afterwards, clangd doesn't work for the current file until it is restarted.<br></div><div><br></div><div>Since I had to build LLVM anyways to add support for passing relative paths to -compile-commands-dir, I tried to debug this issue as well. What I've discovered is that when the range length's are compared in DraftStore::updateDraft (DraftStore.cpp:80) , the LSP's range length seems to be specified in UTF16 code units while clangd's computed range length is specified in bytes (see positionToOffset in SourceCode.hpp:83). This causes the check to fail which causes the error in VSCode.<br></div></div></blockquote><div>That sounds right to me. The use of UTF-16 in the protocol is (IMO) unfortunate, but it is reality.</div><div>We added handling for this in the SourceCode helpers but it's not surprising there are a few places we're assuming bytes by mistake.</div><div>Unfortunately most clangd developers and our tests tend to code in ASCII only, so we haven't shaken out those bugs.</div><div><br></div><div>Thanks for digging!</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Two possible fixes I can see are:</div><div><br></div><div>1. Remove the check (from a quick test clangd still functions perfectly after commenting out the check).</div><div>2. Use the StartIndex and EndIndex to get a substring from the Contents variable from which we can then calculate the computed range length in UTF16 code units using the utf16Len function (SourceCode.cpp:72). This would require lifting out the utf16Len function into SourceCode.h or elsewhere since at the moment it's a static function in SourceCode.cpp.</div><div><br></div><div>I also checked the LSP docs and they don't explicitly say the rangeLength is specified in UTF16 code units. However, I'm assuming it is since a LSP Range consists out of two LSP Positions of which the LSP Character field is specified in UTF16 code units.</div></div></blockquote><div>Yeah, I think this is a safe assumption.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>I'm not sure which solution would be preferred so I'm posting here first before implementing a fix.</div></div></blockquote><div>Keeping the sanity check and making it correct seems best to me (i.e. option 2). Either is better than the status quo, though.</div><div><br></div><div>My only concern is that we'll break clients that make the equal and opposite mistake (sending byte lengths instead of UTF-16 lengths).</div><div>Maybe we should also downgrade this error to a log message, since we can presumably trust the length from the string. What do you think?</div><div><br></div><div>Cheers, Sam</div></div></div>