[PATCH] D123064: [Clang][C++23] P2071 Named universal character escapes

Aaron Ballman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 13 11:42:48 PDT 2022


aaron.ballman added inline comments.


================
Comment at: clang/lib/Lex/Lexer.cpp:3139
       Diag(SlashLoc, diag::warn_ucn_not_valid_in_c89);
-    return 0;
+    return {};
   }
----------------
aaron.ballman wrote:
> FWIW, I think using `llvm::None` instead of `{}` is more clear to readers (I doubt we're consistent with this in the code base though).
> 
> (Same comment applies other places that we're making an empty `Optional`.)
I think this comment was missed.


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:509
+      Distance = Match.Distance;
+    if (Distance - Match.Distance > 3)
+      break;
----------------
Can the subtraction of two `unsigned` values cause a wrapping issue here? (If not, can we add an assertion?)


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:537
+  const char *UcnBegin = ThisTokBuf;
+  assert(UcnBegin[1] == 'N');
+  ThisTokBuf += 2;
----------------
Should we assert `UcnBegin[0] == '\\'` as well?


================
Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:31-44
+struct node {
+  bool is_root = false;
+  char32_t value = 0xFFFFFFFF;
+  uint32_t children_offset = 0;
+  bool has_sibling = false;
+  uint32_t size = 0;
+  StringRef name = {};
----------------
aaron.ballman wrote:
> You should rename things to match the usual coding conventions.
It looks like this comment was missed (it applies to this whole file).


================
Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:49
+    std::string s;
+    s.reserve(64);
+    auto n = this;
----------------
aaron.ballman wrote:
> Any particular reason for 64?
Still wondering why 64 bytes specifically.


================
Comment at: llvm/lib/Support/UnicodeNameToCodepoint.cpp:257
+constexpr const char32_t SBase = 0xAC00;
+// constexpr const char32_t LBase = 0x1100;
+// constexpr const char32_t VBase = 0x1161;
----------------
aaron.ballman wrote:
> Plan to remove the commented out code?
Unanswered question here.


================
Comment at: llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:17-18
+// List of generated names
+// Should be kept in sync with Unicode
+// "Name Derivation Rule Prefix String".
+static bool generated(char32_t c) {
----------------
aaron.ballman wrote:
> Do we have something more direct to point users towards?
Unanswered question here. May be a good place for a link like Tom had mentioned.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123064/new/

https://reviews.llvm.org/D123064



More information about the llvm-commits mailing list