[PATCH] D114149: [clang-tidy] Fix pr48613: "llvm-header-guard uses a reserved identifier"

Salman Javed via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Nov 18 15:30:57 PST 2021


salman-javed-nz marked an inline comment as not done.
salman-javed-nz added inline comments.


================
Comment at: clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp:28
+/// Replace invalid characters in the identifier with '_'.
+static std::string replaceInvalidChars(StringRef Identifier) {
+  std::string Fixed{Identifier};
----------------
aaron.ballman wrote:
> salman-javed-nz wrote:
> > aaron.ballman wrote:
> > > This fixes some issues but I think it introduces new issues too. The old code would accept Unicode characters and pass them along as (valid) identifier characters. The new code will replace all the Unicode characters with `_` which means some people's header guards may go from useful to `_______________`. We should definitely add test cases for Unicode file names.
> > > 
> > > Ultimately, I think this functionality will want to use `UnicodeCharSets.h` to implement something along the lines of what's done in `isAllowedInitiallyIDChar()` and `isAllowedIDChar()` from Lexer.cpp.
> > `isAllowed*IDChar()`does exactly what I want, but is a static function in Lexer.cpp.
> > Should I be changing its declaration so that it's exposed via Lexer.h?
> > I don't know what the bar is to warrant changing core public Clang API. 
> > 
> > If I can just call the function directly from llvm-header-guard, I can avoid code duplication.
> > 
> > 
> We could expose the interface from `Lexer` if we feel that's the right approach. I was going back and forth on whether it was the right approach and eventually thought that we don't want to expose it directly because I don't think we want clang-tidy to care about things like `LangOpts.AsmPreprocessor` or even `LangOpts.DollarIdents`.
> 
> So my recommendation is to put this into LexerUtils.h/.cpp in clang-tidy rather than changing the Clang Lexer interface. WDYT?
//(For brevity, when I say `isAllowedIDChar`, I mean both `isAllowedIDChar` and `isAllowedInitiallyIDChar`)//

I see two possible directions:

If I expose the interface from Lexer.h, then in Clang-Tidy, it's only a matter of calling 
`isAllowedIDChar(C, Context->getLangOpts())`. Clang-Tidy doesn't need to care about the contents of `LangOpts` - it just needs to pass them on as-is to the `isAllowedIDChar()` function.
`getLangOpts()` is already used all over the Clang-Tidy code base.

The other way is to create a copy of `isAllowedIDChar()` in the Clang-Tidy project and take out the parts to do with `LangOpts.AsmPreprocessor` and `LangOpts.DollarIdents`.  Now we have an implementation tailored to just the things Clang-Tidy cares about. I will need to add `clangLex` to CMakeLists.txt to gain visibility of clang/lib/Lex/UnicodeCharSets.h's character tables.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114149/new/

https://reviews.llvm.org/D114149



More information about the cfe-commits mailing list