[PATCH] D81530: [clangd] Log rather than assert on bad UTF-8.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 10 14:28:53 PDT 2020


sammccall added a comment.

In D81530#2085725 <https://reviews.llvm.org/D81530#2085725>, @jaafar wrote:

> I can confirm that this commit works equally well for the UTF-8 assertion failure. Thank you!
>
> Is there some way to easily modify the source to produce a diagnostic pointing to the issue's source location? I would like to file an appropriate bug in Boost.


In https://www.boost.org/doc/libs/1_73_0/boost/spirit/home/support/char_encoding/iso8859_1.hpp
The line labeled `161  a1`, whose bytes are `"/*  \xa1  161  a1 */   BOOST_CC_PUNCT,"` triggers it because of the `"\xa1"`.

It's not a cut and dried bug, but I do think removing the high bytes from these comments is a good idea, so it's probably worth filing the bug.

The C++ standard doesn't say anything about the encoding of characters on disk (the "input encoding" of bytes -> source character set) - it starts with the source character set, I think. So what do implementations do?

- clang: from reading the code, clang only supports UTF-8. It supports gcc's -finput-charset flag, but setting it to anything other than UTF-8 is an error!
- GCC: supports a variety of input encodings, configured with `-finput-charset` or locale. As such it's not really reasonable for a header designed to be included to require any particular value, as you can't pass a flag for just that header.

In practice this doesn't actually affect compilation because the bad UTF-8 sequence in the comment is never parsed: clang just skips over it looking for `*/`. It probably mostly affects tools that use line/column coordinates (like clangd by virtue of LSP, and clang diagnostics), and tools that extract comment contents (doxygen et al).
But it still seems that it would be clearer to remove the literal characters from the source file, or write them in UTF-8.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81530/new/

https://reviews.llvm.org/D81530





More information about the cfe-commits mailing list