[PATCH] D134733: [clang-format][chore] transparent #include name regex

Konrad Wilhelm Kleine via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Sep 28 03:11:15 PDT 2022


kwk added a comment.

@MyDeveloperDay please see my potentially uneducated comments.



================
Comment at: clang/lib/Tooling/Inclusions/HeaderIncludes.cpp:400
+llvm::Regex getCppIncludeRegex() {
+  static const char CppIncludeRegexPattern[] =
+      R"(^[\t\ ]*#[\t\ ]*(import|include)[^"<]*(["<][^">]*[">]))";
----------------
MyDeveloperDay wrote:
> What if this was overridable via the Style we could experiment with changing the regex
Well, right now that was never a requirement and to be honest it isn't very worth exploring this because its not easy to determine which the correct matching group is for the include name.

For example, let's say you would allow this to be configurable from the config file and a user enters the regex from my previous patch (D121370):

```
^[\t\ ]*[@#][\t\ ]*(import|include)([^"/]*("[^"]+")|[^</]*(<[^>]+>)|[\t/\ ]*([^;]+;))
```

How would you determine which is the correct matching group? I first thought about taking the last one matching but then I found that my regex was not able to cope with trailing comments. And even in my case there's more than one matching group for the include name. All in all I really don't see any value in outsourcing this to the config file is not something I see value in.

The only thing valuable would be to have a regex for multiple languages. Sorry if this is what you intended. I just wouldn't want to move this out to the config file for the end-user to play with. 


================
Comment at: clang/lib/Tooling/Inclusions/HeaderIncludes.cpp:407
+    const llvm::SmallVectorImpl<llvm::StringRef> &Matches) {
+  if (Matches.size() >= 3) {
+    return Matches[2];
----------------
MyDeveloperDay wrote:
> ‘>= 2’
@MyDeveloperDay correct me if I'm wrong but if an array has size `3` it has indexes `0`, `1` and `2`. And an array of size `2` only has `0` and `1`.  So the case `=2`, which is implied by your suggested`>=2`, is actually an out of bounds access when going `Matches[2]`.  Because that is effectively accessing the third element. The only valid change would be to check for `Matches.size() > 2` IMHO and that is the same as `Matches.size() >= 3`.

I must admit that I had to look at the regex a few times only to realize that it has two non-optional matching groups `(...)`. The third matching group at index `0` is the whole line. So in theory `>=3` isn't possible with the current regex. I wanted to give my best to have this logic "survive" a change to the regex in which for example something is added after the matching group of the include name; let's say a comment or something.

I hope I haven't made myself a complete fool.  


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134733/new/

https://reviews.llvm.org/D134733



More information about the cfe-commits mailing list