[all-commits] [llvm/llvm-project] c92056: [Clang][C++23] P2071 Named universal character esc...

cor3ntin via All-commits all-commits at lists.llvm.org
Sat Jun 25 10:03:47 PDT 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: c92056d038812c23800131892bee48abb2de7ca0
      https://github.com/llvm/llvm-project/commit/c92056d038812c23800131892bee48abb2de7ca0
  Author: Corentin Jabot <corentinjabot at gmail.com>
  Date:   2022-06-25 (Sat, 25 Jun 2022)

  Changed paths:
    M clang/include/clang/Basic/DiagnosticLexKinds.td
    M clang/include/clang/Lex/Lexer.h
    M clang/lib/Lex/Lexer.cpp
    M clang/lib/Lex/LiteralSupport.cpp
    A clang/test/FixIt/fixit-unicode-named-escape-sequences.c
    M clang/test/Lexer/char-escapes-delimited.c
    M clang/test/Lexer/unicode.c
    M clang/test/Parser/cxx11-user-defined-literals.cpp
    M clang/test/Preprocessor/ucn-pp-identifier.c
    M clang/test/Sema/ucn-identifiers.c
    M llvm/CMakeLists.txt
    M llvm/include/llvm/Support/Unicode.h
    M llvm/lib/Support/CMakeLists.txt
    A llvm/lib/Support/UnicodeNameToCodepoint.cpp
    A llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
    M llvm/unittests/Support/UnicodeTest.cpp
    A llvm/utils/UnicodeData/CMakeLists.txt
    A llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp

  Log Message:
  -----------
  [Clang][C++23] P2071 Named universal character escapes

Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] as an extension in all language modes. The patch to not warn in C++23 mode will be done later, once this paper is plenary approved (in July).

We add

 * A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` into a space-efficient data structure that can be queried in `O(NameLength)`
 * A set of functions in `Unicode.h` to query that data, including

   * A function to find an exact match of a given Unicode character name
   * A function to perform loose matching (ignoring case, spaces, underscores, and medial hyphens)
   * A function returning the best matching codepoint for a given string per edit distance

 * Support for `\N{}` escape sequences in string and character literals, with diagnostics/fixits for loose matches and typos
 * Support for `\N{}` as a UCN, with diagnostics/fixits for loose matches.
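
To illustrate the "best matching codepoint per edit distance" query that backs the typo fixits, here is a minimal sketch. It is not the code added to `Unicode.h`: the names `NamedCodepoint`, `Table`, `editDistance`, and `nearestName` are hypothetical, and the four-entry table stands in for the generated data. It uses a plain Levenshtein distance and a linear scan, which is an assumption made for clarity rather than a description of the patch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical table entry: a Unicode character name and its code point.
struct NamedCodepoint {
  std::string_view Name;
  std::uint32_t Value;
};

// Tiny hand-written table, for illustration only (the real data is generated).
constexpr std::array<NamedCodepoint, 4> Table{{
    {"LATIN SMALL LETTER A", 0x0061},
    {"LATIN SMALL LETTER B", 0x0062},
    {"GREEK SMALL LETTER ALPHA", 0x03B1},
    {"SNOWMAN", 0x2603},
}};

// Classic Levenshtein distance, keeping only one row of the DP matrix.
std::size_t editDistance(std::string_view A, std::string_view B) {
  std::vector<std::size_t> Row(B.size() + 1);
  for (std::size_t J = 0; J <= B.size(); ++J)
    Row[J] = J; // distance from the empty prefix of A
  for (std::size_t I = 1; I <= A.size(); ++I) {
    std::size_t Diag = Row[0]; // D[I-1][J-1]
    Row[0] = I;
    for (std::size_t J = 1; J <= B.size(); ++J) {
      std::size_t Up = Row[J]; // D[I-1][J], before it is overwritten
      std::size_t Subst = Diag + (A[I - 1] == B[J - 1] ? 0 : 1);
      Row[J] = std::min({Up + 1, Row[J - 1] + 1, Subst});
      Diag = Up;
    }
  }
  return Row[B.size()];
}

// Returns the table entry whose name is closest to Query (linear scan).
const NamedCodepoint &nearestName(std::string_view Query) {
  return *std::min_element(
      Table.begin(), Table.end(),
      [&](const NamedCodepoint &L, const NamedCodepoint &R) {
        return editDistance(Query, L.Name) < editDistance(Query, R.Name);
      });
}

// Usage: a misspelled name resolves to the nearest known name.
inline void checkNearestName() {
  assert(nearestName("SNOWMAM").Value == 0x2603);
}
```

A diagnostic could then offer the returned `Name` as the fixit text for a misspelled `\N{...}` escape.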

A name that matches only loosely is treated as an error, to closely match the semantics of P2071.
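
As a rough illustration of what "loose" means here (ignoring case, spaces, underscores, and medial hyphens), the sketch below reduces a name to a comparison key. The helpers `looseKey` and `looselyEqual` are hypothetical and not part of the patch. The sketch is simplified: it treats a hyphen as medial when it has an ASCII letter or digit on both sides, and it omits the special cases that the Unicode loose-matching rule defines for a handful of names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// True for ASCII letters and digits (cast avoids UB on negative char values).
static bool isAsciiAlnum(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) != 0;
}

// Hypothetical helper: builds a key that ignores case, spaces, underscores,
// and medial hyphens, so that loosely equivalent names compare equal.
std::string looseKey(std::string_view Name) {
  std::string Key;
  for (std::size_t I = 0; I < Name.size(); ++I) {
    char C = Name[I];
    if (C == ' ' || C == '_')
      continue;
    // A hyphen is medial when a letter or digit sits on both sides of it.
    bool Medial = C == '-' && I > 0 && I + 1 < Name.size() &&
                  isAsciiAlnum(Name[I - 1]) && isAsciiAlnum(Name[I + 1]);
    if (Medial)
      continue;
    Key.push_back(
        static_cast<char>(std::toupper(static_cast<unsigned char>(C))));
  }
  return Key;
}

// Two names match loosely when their keys are identical.
bool looselyEqual(std::string_view A, std::string_view B) {
  return looseKey(A) == looseKey(B);
}

// Usage: spelling variants of one name match; a non-medial hyphen is kept.
inline void checkLooseMatching() {
  assert(looselyEqual("ZERO WIDTH SPACE", "zero_width-space"));
  assert(!looselyEqual("TIBETAN MARK TSA -PHRU", "TIBETAN MARK TSAPHRU"));
}
```

Under the behaviour described in this commit, a spelling that matches only through such a key is still rejected, with a fixit pointing at the exact name.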

The generated data adds 280kB to the binaries.

`UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D123064



