[PATCH] D74731: [Clangd] Fixed assertion when processing extended ASCII characters.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Feb 17 11:53:22 PST 2020


sammccall added a comment.

Yeah, I think there must be some confusion about what this code is doing. It's specifically iterating over the Unicode codepoints of what are supposed to be UTF-8-encoded input bytes.

The input sometimes turns out not to be UTF-8 (e.g. the file on disk is ISO-8859-1 and clang thinks it's UTF-8 and just loads the bytes). We can't give any sort of right answer in these cases: we don't know the actual encoding, and we can't even always detect these cases!
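To illustrate why detection is unreliable, here is a rough sketch of a structural UTF-8 check. `looksLikeUtf8` is a hypothetical helper written for this example, not anything in clangd, and it only checks lead and continuation bytes (it ignores overlong forms and surrogates to stay short).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not clangd code): returns true if Bytes is
// structurally plausible UTF-8, i.e. every lead byte is followed by the
// right number of continuation bytes (10xxxxxx).
bool looksLikeUtf8(const std::string &Bytes) {
  for (std::size_t I = 0; I < Bytes.size();) {
    unsigned char C = static_cast<unsigned char>(Bytes[I]);
    std::size_t Len;
    if (C < 0x80)
      Len = 1; // ASCII
    else if (C >= 0xC0 && C < 0xE0)
      Len = 2;
    else if (C >= 0xE0 && C < 0xF0)
      Len = 3;
    else if (C >= 0xF0 && C < 0xF8)
      Len = 4;
    else
      return false; // stray continuation byte or invalid lead byte
    if (Len > Bytes.size() - I)
      return false; // sequence truncated by end of input
    for (std::size_t J = 1; J < Len; ++J)
      if ((static_cast<unsigned char>(Bytes[I + J]) & 0xC0) != 0x80)
        return false; // expected a continuation byte
    I += Len;
  }
  return true;
}
```

With this sketch, the ISO-8859-1 bytes for "café au lait" (`"caf\xE9 au lait"`) are rejected, but the ISO-8859-1 bytes for "Ã©" (`"\xC3\xA9"`) pass, because they are also the valid UTF-8 encoding of "é". No byte-level check can tell those two readings apart.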

What we can do is strengthen the contract: instead of "UB, asserts in practice", we can say it returns some garbage value but doesn't crash.
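A minimal sketch of what such a contract could look like, assuming a standalone helper rather than the actual patch: `codepointLengths` is a hypothetical name, and treating each malformed byte as its own one-byte "codepoint" is just one possible choice of garbage value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not the code under review): returns the byte length
// of each codepoint in Text. On malformed input it never asserts or reads
// out of bounds; an offending byte is counted as a one-byte codepoint.
std::vector<std::size_t> codepointLengths(const std::string &Text) {
  std::vector<std::size_t> Lengths;
  for (std::size_t I = 0; I < Text.size();) {
    unsigned char C = static_cast<unsigned char>(Text[I]);
    std::size_t Len = 1; // ASCII, or fallback for any invalid byte
    if (C >= 0xC0 && C < 0xE0)
      Len = 2;
    else if (C >= 0xE0 && C < 0xF0)
      Len = 3;
    else if (C >= 0xF0 && C < 0xF8)
      Len = 4;
    // Accept a multi-byte sequence only if it fits in the input and every
    // trailing byte is a continuation byte (10xxxxxx).
    bool Valid = Len <= Text.size() - I;
    for (std::size_t J = 1; Valid && J < Len; ++J)
      Valid = (static_cast<unsigned char>(Text[I + J]) & 0xC0) == 0x80;
    if (!Valid)
      Len = 1; // garbage answer, but no crash
    Lengths.push_back(Len);
    I += Len;
  }
  return Lengths;
}
```

For well-formed input such as `"caf\xC3\xA9"` this yields lengths 1, 1, 1, 2. For the ISO-8859-1 bytes `"caf\xE9"` it yields 1, 1, 1, 1: not a meaningful answer, but a defined one.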


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74731/new/

https://reviews.llvm.org/D74731




