[all-commits] [llvm/llvm-project] c92056: [Clang][C++23] P2071 Named universal character esc...
cor3ntin via All-commits
all-commits at lists.llvm.org
Sat Jun 25 10:03:47 PDT 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: c92056d038812c23800131892bee48abb2de7ca0
https://github.com/llvm/llvm-project/commit/c92056d038812c23800131892bee48abb2de7ca0
Author: Corentin Jabot <corentinjabot at gmail.com>
Date: 2022-06-25 (Sat, 25 Jun 2022)
Changed paths:
M clang/include/clang/Basic/DiagnosticLexKinds.td
M clang/include/clang/Lex/Lexer.h
M clang/lib/Lex/Lexer.cpp
M clang/lib/Lex/LiteralSupport.cpp
A clang/test/FixIt/fixit-unicode-named-escape-sequences.c
M clang/test/Lexer/char-escapes-delimited.c
M clang/test/Lexer/unicode.c
M clang/test/Parser/cxx11-user-defined-literals.cpp
M clang/test/Preprocessor/ucn-pp-identifier.c
M clang/test/Sema/ucn-identifiers.c
M llvm/CMakeLists.txt
M llvm/include/llvm/Support/Unicode.h
M llvm/lib/Support/CMakeLists.txt
A llvm/lib/Support/UnicodeNameToCodepoint.cpp
A llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
M llvm/unittests/Support/UnicodeTest.cpp
A llvm/utils/UnicodeData/CMakeLists.txt
A llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp
Log Message:
-----------
[Clang][C++23] P2071 Named universal character escapes
Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] as an extension in all language modes. The change to stop warning in C++23 mode will be made later, once the paper is plenary approved (in July).
We add
* A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` into a space-efficient data structure that can be queried in `O(NameLength)`
* A set of functions in `Unicode.h` to query that data, including
  * A function to find an exact match of a given Unicode character name
  * A function to perform loose matching (ignoring case, space, underscore, medial hyphen)
  * A function returning the best matching codepoint for a given string per edit distance
* Support of `\N{}` escape sequences in string and character literals, with loose-matching and typo diagnostics/fixits
* Support of `\N{}` as UCN with loose-matching diagnostics/fixits.
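As an illustration of the "best matching codepoint per edit distance" lookup mentioned in the list above, here is a minimal sketch. It is not the code from this patch: `NameEntry`, `editDistance` and `nearestName` are hypothetical names, the table is a plain vector rather than the generated data structure, and the search is a simple linear scan.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical entry type. The real table is generated from UnicodeData.txt
// and NameAliases.txt and is stored far more compactly.
struct NameEntry {
  std::string_view Name;
  char32_t CodePoint;
};

// Classic Levenshtein distance, computed with a single rolling row.
std::size_t editDistance(std::string_view A, std::string_view B) {
  std::vector<std::size_t> Row(B.size() + 1);
  for (std::size_t J = 0; J <= B.size(); ++J)
    Row[J] = J; // distance from the empty prefix of A
  for (std::size_t I = 1; I <= A.size(); ++I) {
    std::size_t Diag = Row[0]; // previous row, previous column
    Row[0] = I;
    for (std::size_t J = 1; J <= B.size(); ++J) {
      std::size_t Above = Row[J]; // previous row, same column
      std::size_t Subst = Diag + (A[I - 1] == B[J - 1] ? 0 : 1);
      Row[J] = std::min({Above + 1, Row[J - 1] + 1, Subst});
      Diag = Above;
    }
  }
  return Row[B.size()];
}

// Returns the entry whose name is closest to Query, or nullptr if the table
// is empty. On ties the first entry encountered wins.
const NameEntry *nearestName(std::string_view Query,
                             const std::vector<NameEntry> &Table) {
  const NameEntry *Best = nullptr;
  std::size_t BestDist = 0;
  for (const NameEntry &E : Table) {
    std::size_t D = editDistance(Query, E.Name);
    if (!Best || D < BestDist) {
      Best = &E;
      BestDist = D;
    }
  }
  return Best;
}
```

A typo fixit could then suggest `nearestName(Typed, Table)->Name` as the replacement for a misspelled name, for example offering `GREEK SMALL LETTER ALPHA` for `GREEK SMALL LETER ALPHA`.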
Loose matching is considered an error to match closely the semantics of P2071.
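To make concrete what a loose comparison ignores, here is a simplified sketch of the normalization (case, space, underscore, medial hyphen). It is an assumption-laden illustration, not the function added to `Unicode.h`: `looseKey` and `looselyEqual` are hypothetical names, a hyphen is treated as medial only when it sits directly between two alphanumeric characters of the input, and special cases in the Unicode loose-matching rule (such as U+1180 HANGUL JUNGSEONG O-E) are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: reduces a character name to a key in which case,
// spaces, underscores and medial hyphens no longer matter.
std::string looseKey(std::string_view Name) {
  auto IsAlnum = [](char C) {
    return std::isalnum(static_cast<unsigned char>(C)) != 0;
  };
  std::string Key;
  for (std::size_t I = 0; I < Name.size(); ++I) {
    char C = Name[I];
    if (C == ' ' || C == '_')
      continue; // spaces and underscores are ignored
    // Simplifying assumption: a hyphen is medial only when both of its
    // neighbours in the input are letters or digits.
    bool Medial = C == '-' && I > 0 && I + 1 < Name.size() &&
                  IsAlnum(Name[I - 1]) && IsAlnum(Name[I + 1]);
    if (Medial)
      continue;
    Key.push_back(
        static_cast<char>(std::toupper(static_cast<unsigned char>(C))));
  }
  return Key;
}

// Two names match loosely when their keys are identical.
bool looselyEqual(std::string_view A, std::string_view B) {
  return looseKey(A) == looseKey(B);
}
```

Under this sketch, `zero_width space` matches `ZERO WIDTH SPACE` loosely but not exactly, which is the situation the patch diagnoses as an error while offering the exact name as a fixit.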
The generated data adds 280kB to the binaries.
`UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D123064