[PATCH] D96744: clang-format IncludeBlocks: Regroup determination of "main" for framework-style includes fix

Conrad Poelman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Feb 26 12:46:05 PST 2021


poelmanc added a comment.

In D96744#2564828 <https://reviews.llvm.org/D96744#2564828>, @MyDeveloperDay wrote:

> Does this need to be an option?

It's easy to add an option, but there are already two //main include//-related options, so before adding a third I wanted to give this some thought. As someone new to `IncludeCategories`, I was genuinely impressed at how easy it was to use and how it gave me such complete control over the grouping via regular expressions. Yet in comparison the determination of //main// headers was less clear and more hard-coded; I had to examine the source code to figure out that the comparison is case-insensitive, it doesn't consider `<>` includes, only file stems are considered (e.g. the `foo/bar` in `foo/bar/baz.h` is ignored), and the behaviors of `IncludeIsMainSourceRegex` and `IncludeIsMainRegex` were a bit murky.

That's all fixable with a patch and minor documentation tweaks, but I wanted to consider some alternatives. Users of the `IncludeCategories` feature are comfortable with regular expressions, so imagine eliminating special handling of //main headers// entirely, and instead enabling users to write their own `IncludeCategories` rules for putting main headers first? We'd need to give them some bits of the source file path to use in their `Regex`, say called `${SourceStem}` and `${SourcePath}`.

Users unconcerned with formatting `foo_test.cc` or `foo_proto.cc` files could get the current functionality by including a simple rule like:

  - Regex:           '"(.*\/)?${SourceStem}(\..*)?"'
    Priority:        0
    CaseSensitive:   false

I want case sensitivity, matching at least one component of the path, and angle brackets, so I'd use something like:

  - Regex:           '[<"]${SourcePath:1}${SourceStem}\..*[>"]'
    Priority:        0
    CaseSensitive:   true

Then adding a generally-useful `ApplyToSourcesRegex` feature to apply any category regex only to certain source files, and an ability to trim a regular expression from the end of `${SourceStem}`, gives users full control, including mimicking current `isMainHeader()` behavior with rules like:

  - Regex:           '"(.*\/)?${SourceStem-<insert-old-IncludeIsMainRegex>}(\..*)?"'
    Priority:        0
    CaseSensitive:   false
    ApplyToSourcesRegex: `((.*\.(c|cc|cpp|c\+\+|cxx|m|mm))|<insert-old-IncludeIsMainSourceRegex>)`

For backwards-compatibility we'd automatically add the above rule if no special `${Source` tokens appear in any rules, and we can deprecate `IncludeIsMainRegex` and `IncludeIsMainSourceRegex` at any point. The special tokens for use in `Regex` are defined as:

- `${SourcePath:<n>}` is a regular expression matching `n` or more trailing components of the source directory file path, using forward slashes and including any trailing slash (`0 <= n < 10`)
- `${SourceStem}` is a string starting after the last `/` in the source file path and includes up to but excluding the first dot
- `${SourceStem-<re>}` is `${SourceStem}` with any trailing characters matching regular expression `re` removed

I have this coded up and it passes the ToolingTests, but wanted to run the idea by others. Thoughts? Should I update this patch with these changes? Start a separate issue? Stick with a new option `IncludeIsMainAllowBraces`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96744/new/

https://reviews.llvm.org/D96744



More information about the cfe-commits mailing list