[PATCH] D46035: [clangd] Fix unicode handling, using UTF-16 where LSP requires it.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Apr 27 04:33:09 PDT 2018


sammccall marked 3 inline comments as done.
sammccall added a comment.

Thanks!



================
Comment at: clangd/SourceCode.cpp:25
+// Returns true if CB returned true, false if we hit the end of string.
+template <typename Callback>
+bool iterateCodepoints(StringRef U8, const Callback &CB) {
----------------
hokein wrote:
> Can we make it `static`?
> 
> The callback type is function<int, int>, the reason why using template here is mainly to save some keystroke? 
Added static.

The difference between using a template vs `std::function` for a lambda is compile-time vs run-time polymorphism: invoking std::function is a virtual call and (AFAIK) compilers don't manage to inline it well.
With the template, we get one copy of the function for each callsite, with the lambda inlined.

Not sure the performance is a big deal here, but this code is at least plausibly hot I guess? And I think there's very little readability cost to using the template in this case.


================
Comment at: clangd/SourceCode.cpp:72
+  size_t Count = 0;
+  iterateCodepoints(U8, [&](int U8Len, int U16Len) {
+    Count += U16Len;
----------------
hokein wrote:
> Maybe add an `assume` to ensure `iterateCodepoints` always return false (reach the end of the U8)?
I'm not sure there's enough value to this one, it's clear from the local code that this isn't possible, and it doesn't seem likely a bug would manifest this way (abort early even though our function returns false, *and return true* from iterateCodepoints).

The small cost is added noise - I think this needs a new variable, and assert, and a suppression of "unused variable" warning.


================
Comment at: clangd/SourceCode.cpp:137
+    P.character =
+        utf16Len(Code.substr(LocInfo.second - ColumnInBytes, ColumnInBytes));
+  }
----------------
hokein wrote:
> nit: it took me a while to understand what the sub-expression `Code.substr(LocInfo.second - ColumnInBytes, ColumnInBytes)` does, maybe abstract it out with a descriptive name? Also it is not straight-forward to understand what `LocInfo.second` is without navigating to `getDecomposedSpellingLoc`.
Called this `LineSoFar` and decomposed `LocInfo` int named variables.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D46035





More information about the cfe-commits mailing list