[PATCH] D46180: [clang-format] Refactor #include insertion/deletion functionality into a class.

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Apr 27 05:45:57 PDT 2018


ioeric marked 7 inline comments as done.
ioeric added a comment.

Thanks for reviewing this!

In https://reviews.llvm.org/D46180#1080822, @ilya-biryukov wrote:

> This change LG as an extraction of the helper functionality to be reused in clang, clang-tidy, etc.
>  However, I feel there are potential improvements both to the underlying code and the new APIs that we could make.
>
> I left some comments, trying to focus on interface of the class and keeping the changes that would be required to address them to be purely cosmetic. As chatted offline, any non-trivial changes to the underlying implementation are out of scope of this patch.
>
> > preserver sorted #includes.
>
> A typo in the change description:
>  s/preserver/preserve
>
> > This is best effort - only works when the
> >  block is already sorted.
>
> Could we describe behavior for the unsorted case in a sensible way? E.g. is it added to the end of the include list, added after closely related headers, added to a totally random place (but deterministic) place, etc? 
>  It seems like an important case, and I believe we shouldn't completely ignore it and describe what the "best effort" means in that case.


Sure. I added more comments to the API.

> 
> 
>> When inserting multiple #includes to the end of a file which doesn't
>>  end with a "\n" character, a "\n" will be prepended to each #include.
>>  This is a special and rare case that was previously handled. This is now
>>  relaxed to avoid complexity as it's rare in practice.
> 
> I don't quite get it, could you please elaborate on what has changed here exactly? Maybe give an example?

This is really a corner cases that users might not need to know about. But an example is:
Code: `"#include <a>"` (without \n at the end). After inserting <x>, `#include <a>\n#include <x>\n` (this is good). However, if you insert another <y>, the code would become `"#include <a>\n#include <x>\n\n#include <y>\n"` (note the double newline!).



================
Comment at: lib/Format/Format.cpp:1691
   // 0. Otherwise, returns the priority of the matching category or INT_MAX.
-  int getIncludePriority(StringRef IncludeName, bool CheckMainHeader) {
+  int getIncludePriority(StringRef IncludeName, bool CheckMainHeader) const {
     int Ret = INT_MAX;
----------------
ilya-biryukov wrote:
> I would personally keep the function non-const and not use mutable fields here.
> Even though it's logically const, I would strive towards keeping the things `const` only if there are actually immutable
> One particular problem that could be avoided is accidentally calling the const methods concurrently on different threads.
> 
> But up to you if you actually want to make this change.
Making this non-const would prevent us from making `HeaderIncludes::insert` const, which I do want to keep. Added a mutex to prevent race-condition... 


================
Comment at: lib/Format/Format.cpp:1995
+  /// Inserts an #include directive of \p IncludeName into the code. If \p
+  /// IncludeName is quoted e.g. <...> or "...", it will be #included as is;
+  /// by default,  it is quoted with "".
----------------
ilya-biryukov wrote:
> Maybe make an API more explicit about the style of includes that should be used?
> Make it literally impossible to misuse the API:
> ```
> enum class IncludeStyle { Quoted, Angle };
> /// \p IncludeName is the include path to be inserted, \p Style specifies whether the included path should use quotes or angle brackets.
> Optional<Replacement> insert(StringRef IncludeName, IncludeStyle Style = IncludeStyle::Quoted) const;
> ```
> 
I think the risk is pretty low, so I'm not sure if it'd be worth spending another variable. From experience, the current callers would likely need to either carry another variable for each include or use the following pattern (as the IncludeName is often already quoted):
```
 ... = insert(Include.trime("\"<>"), QuoteStyle);
```


================
Comment at: lib/Format/Format.cpp:2013
+
+  /// Removes all existing #includes of \p Header. If \p Header is not quoted
+  /// (e.g. without <> or ""), #include of \p Header with both quotations will
----------------
ilya-biryukov wrote:
> Have you considered changing the API slightly to allow iterating over all includes in the file and APIs to remove includes pointed by the iterators?
> It would give more flexibility, allowing the clients to implement useful things like:
> - Remove all #includes that start with a prefix, e.g. `clang/Basic/....`.
> - Find and remove all duplicate #include directives.
> - etc, etc.
> 
> The current implementation seems tied to a specific use-case that we have in clang-format, i.e. "preprocess once, remove headers in batches". 
> It feels wasteful to pay overhead of `StringMap<vector<Include>>` sometimes, e.g. if the code wants to insert just a single header or do other things, like removing duplicate headers.
> 
> I.e. I propose to extract the code that parses the source code and produces a list of #include directives with their respective ranges and filenames into a separate function/class.
> This would. arguably, improve readability of the include insertion code.
> 
> That might not be a trivial change, though, since it requires rewriting some of the code used here, while the current patch seems to focus on simply extracting useful functionality from `clang-format` for reuse in clangd, clang-tidy, etc.
> Let me know what you think.
I haven't thought think about these use cases. 

These use cases don't seem to be very common though.  Generally, I try to make the APIs easy to use for the use cases we have instead designing the APIs for all potential uses. An over-generalized API might be awkward to use for major use cases, and designing for the future also has opportunity cost...

That said, I think when the use cases do come up, it wouldn't be hard to add APIs for that (and change implementation if necessary). But I also think the current APIs would still be useful for users who only care about #include insertion/deletion.


================
Comment at: lib/Format/Format.cpp:2016
+  /// be removed.
+  tooling::Replacements remove(llvm::StringRef Header) const;
 
----------------
ilya-biryukov wrote:
> Maybe also add a param of type `Optional<IncludeStyle>` and 
>   - only remove includes with the specified style, if specified the value is specified
>   - remove both styles the value is `llvm::None`.
Again, I'm not quite sure if it's worth another parameter for this.


================
Comment at: lib/Format/Format.cpp:2023
+    // An include header quoted with either <> or "".
+    std::string Name;
+    // The range of the whole line of include directive including any eading
----------------
ilya-biryukov wrote:
> Maybe use `StringRef` here?
We need to worry about life-cycle of the underlying string if we use `StringRef`, and I think it's not trivial in this case, so I think `std::string` would be a better fit here.


================
Comment at: lib/Format/Format.cpp:2031
 
-  llvm::Regex IncludeRegex(IncludeRegexPattern);
-  llvm::Regex DefineRegex(R"(^[\t\ ]*#[\t\ ]*define[\t\ ]*[^\\]*$)");
-  SmallVector<StringRef, 4> Matches;
+  StringRef FileName;
+  StringRef Code;
----------------
ilya-biryukov wrote:
> What do we use the `FileName` for?
> On a higher level, it seems we shouldn't need it to implement what `HeaderIncludes` does.
`FileName` is needed to decide whether a header is a main header.


Repository:
  rC Clang

https://reviews.llvm.org/D46180





More information about the cfe-commits mailing list