[PATCH] D46180: [clang-format] Refactor #include insertion/deletion functionality into a class.

Ilya Biryukov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Apr 27 03:46:52 PDT 2018


ilya-biryukov added a comment.

This change LG as an extraction of the helper functionality to be reused in clang, clang-tidy, etc.
However, I feel there are potential improvements both to the underlying code and the new APIs that we could make.

I left some comments, trying to focus on interface of the class and keeping the changes that would be required to address them to be purely cosmetic. As chatted offline, any non-trivial changes to the underlying implementation are out of scope of this patch.

> preserver sorted #includes.

A typo in the change description:
s/preserver/preserve

> This is best effort - only works when the
>  block is already sorted.

Could we describe behavior for the unsorted case in a sensible way? E.g. is it added to the end of the include list, added after closely related headers, added to a totally random place (but deterministic) place, etc? 
It seems like an important case, and I believe we shouldn't completely ignore it and describe what the "best effort" means in that case.

> When inserting multiple #includes to the end of a file which doesn't
>  end with a "\n" character, a "\n" will be prepended to each #include.
>  This is a special and rare case that was previously handled. This is now
>  relaxed to avoid complexity as it's rare in practice.

I don't quite get it, could you please elaborate on what has changed here exactly? Maybe give an example?



================
Comment at: lib/Format/Format.cpp:1691
   // 0. Otherwise, returns the priority of the matching category or INT_MAX.
-  int getIncludePriority(StringRef IncludeName, bool CheckMainHeader) {
+  int getIncludePriority(StringRef IncludeName, bool CheckMainHeader) const {
     int Ret = INT_MAX;
----------------
I would personally keep the function non-const and not use mutable fields here.
Even though it's logically const, I would strive towards keeping the things `const` only if there are actually immutable
One particular problem that could be avoided is accidentally calling the const methods concurrently on different threads.

But up to you if you actually want to make this change.


================
Comment at: lib/Format/Format.cpp:1988
 
-bool isDeletedHeader(llvm::StringRef HeaderName,
-                     const std::set<llvm::StringRef> &HeadersToDelete) {
-  return HeadersToDelete.count(HeaderName) ||
-         HeadersToDelete.count(HeaderName.trim("\"<>"));
-}
+/// Generates replacements for inserting or deleting #include headers in a file.
+class HeaderIncludes {
----------------
Maybe s/#include headers/#include directives/?
This is how they called in terminology of preprocessing.


================
Comment at: lib/Format/Format.cpp:1995
+  /// Inserts an #include directive of \p IncludeName into the code. If \p
+  /// IncludeName is quoted e.g. <...> or "...", it will be #included as is;
+  /// by default,  it is quoted with "".
----------------
Maybe make an API more explicit about the style of includes that should be used?
Make it literally impossible to misuse the API:
```
enum class IncludeStyle { Quoted, Angle };
/// \p IncludeName is the include path to be inserted, \p Style specifies whether the included path should use quotes or angle brackets.
Optional<Replacement> insert(StringRef IncludeName, IncludeStyle Style = IncludeStyle::Quoted) const;
```



================
Comment at: lib/Format/Format.cpp:2013
+
+  /// Removes all existing #includes of \p Header. If \p Header is not quoted
+  /// (e.g. without <> or ""), #include of \p Header with both quotations will
----------------
Have you considered changing the API slightly to allow iterating over all includes in the file and APIs to remove includes pointed by the iterators?
It would give more flexibility, allowing the clients to implement useful things like:
- Remove all #includes that start with a prefix, e.g. `clang/Basic/....`.
- Find and remove all duplicate #include directives.
- etc, etc.

The current implementation seems tied to a specific use-case that we have in clang-format, i.e. "preprocess once, remove headers in batches". 
It feels wasteful to pay overhead of `StringMap<vector<Include>>` sometimes, e.g. if the code wants to insert just a single header or do other things, like removing duplicate headers.

I.e. I propose to extract the code that parses the source code and produces a list of #include directives with their respective ranges and filenames into a separate function/class.
This would. arguably, improve readability of the include insertion code.

That might not be a trivial change, though, since it requires rewriting some of the code used here, while the current patch seems to focus on simply extracting useful functionality from `clang-format` for reuse in clangd, clang-tidy, etc.
Let me know what you think.


================
Comment at: lib/Format/Format.cpp:2015
+  /// (e.g. without <> or ""), #include of \p Header with both quotations will
+  /// be removed.
+  tooling::Replacements remove(llvm::StringRef Header) const;
----------------
We should mention that remove only deletes exact matches and won't try to resolve filenames, patch up path separators, etc.


================
Comment at: lib/Format/Format.cpp:2016
+  /// be removed.
+  tooling::Replacements remove(llvm::StringRef Header) const;
 
----------------
Maybe also add a param of type `Optional<IncludeStyle>` and 
  - only remove includes with the specified style, if specified the value is specified
  - remove both styles the value is `llvm::None`.


================
Comment at: lib/Format/Format.cpp:2023
+    // An include header quoted with either <> or "".
+    std::string Name;
+    // The range of the whole line of include directive including any eading
----------------
Maybe use `StringRef` here?


================
Comment at: lib/Format/Format.cpp:2031
 
-  llvm::Regex IncludeRegex(IncludeRegexPattern);
-  llvm::Regex DefineRegex(R"(^[\t\ ]*#[\t\ ]*define[\t\ ]*[^\\]*$)");
-  SmallVector<StringRef, 4> Matches;
+  StringRef FileName;
+  StringRef Code;
----------------
What do we use the `FileName` for?
On a higher level, it seems we shouldn't need it to implement what `HeaderIncludes` does.


================
Comment at: lib/Format/Format.cpp:2040
+  std::unordered_map<int, llvm::SmallVector<const Include *, 8>>
+      IncludesByPriority;
 
----------------
Could you please add a comment explaining what are those priorities and what this map is used for?


================
Comment at: lib/Format/Format.cpp:2046
+  // Code with the first MinInsertOffset characters trimmed.
+  StringRef TrimmedCode;
+  // Max insertion offset in the original code.
----------------
Is this field always equal to `Code.drop_front(MinInsertOffset)`?
If so, maybe compute it when needed and don't store explicitly?


================
Comment at: lib/Format/Format.cpp:2048
+  // Max insertion offset in the original code.
+  unsigned MaxInsertOffset;
+  IncludeCategoryManager Categories;
----------------
Could you add a comment explaining how min and max offsets are calculated? I assume minoffset goes after the file comment and maxoffset is where non-preprocessor code starts, but it would be nice to have a short comment to know for sure.


Repository:
  rC Clang

https://reviews.llvm.org/D46180





More information about the cfe-commits mailing list