[PATCH] D46180: [clang-format] Refactor #include insertion/deletion functionality into a class.

Ilya Biryukov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 2 03:12:50 PDT 2018


ilya-biryukov added inline comments.


================
Comment at: lib/Format/Format.cpp:1691
   // 0. Otherwise, returns the priority of the matching category or INT_MAX.
-  int getIncludePriority(StringRef IncludeName, bool CheckMainHeader) {
+  int getIncludePriority(StringRef IncludeName, bool CheckMainHeader) const {
     int Ret = INT_MAX;
----------------
ioeric wrote:
> ilya-biryukov wrote:
> > I would personally keep the function non-const and not use mutable fields here.
> > Even though it's logically const, I would strive towards keeping the things `const` only if there are actually immutable
> > One particular problem that could be avoided is accidentally calling the const methods concurrently on different threads.
> > 
> > But up to you if you actually want to make this change.
> Making this non-const would prevent us from making `HeaderIncludes::insert` const, which I do want to keep. Added a mutex to prevent race-condition... 
Actually keeping things const and mentioning in the class comments that the class is not thread-safe because it uses regexes seems like a better choice.
Most of the clients shouldn't pay for the mutex operations for the sake of the rare multi-threaded clients.
WDYT?


================
Comment at: lib/Format/Format.cpp:1995
+  /// Inserts an #include directive of \p IncludeName into the code. If \p
+  /// IncludeName is quoted e.g. <...> or "...", it will be #included as is;
+  /// by default,  it is quoted with "".
----------------
ioeric wrote:
> ilya-biryukov wrote:
> > Maybe make an API more explicit about the style of includes that should be used?
> > Make it literally impossible to misuse the API:
> > ```
> > enum class IncludeStyle { Quoted, Angle };
> > /// \p IncludeName is the include path to be inserted, \p Style specifies whether the included path should use quotes or angle brackets.
> > Optional<Replacement> insert(StringRef IncludeName, IncludeStyle Style = IncludeStyle::Quoted) const;
> > ```
> > 
> I think the risk is pretty low, so I'm not sure if it'd be worth spending another variable. From experience, the current callers would likely need to either carry another variable for each include or use the following pattern (as the IncludeName is often already quoted):
> ```
>  ... = insert(Include.trime("\"<>"), QuoteStyle);
> ```
I still don't feel encoding things in the strings is the right approach here.
It raises some questions that we should handle/assert: what if users don't quote the strings at all? what is the leading and trailing quotes don't match? and so on and so forth.
Being explicit about such things seems like a better choice.

> From experience, the current callers would likely need to either carry another variable
It's not surprising, since the class is literally extracted from the middle of the include insertion implementation in clang-format.


================
Comment at: lib/Format/Format.cpp:2013
+
+  /// Removes all existing #includes of \p Header. If \p Header is not quoted
+  /// (e.g. without <> or ""), #include of \p Header with both quotations will
----------------
ioeric wrote:
> ilya-biryukov wrote:
> > Have you considered changing the API slightly to allow iterating over all includes in the file and APIs to remove includes pointed by the iterators?
> > It would give more flexibility, allowing the clients to implement useful things like:
> > - Remove all #includes that start with a prefix, e.g. `clang/Basic/....`.
> > - Find and remove all duplicate #include directives.
> > - etc, etc.
> > 
> > The current implementation seems tied to a specific use-case that we have in clang-format, i.e. "preprocess once, remove headers in batches". 
> > It feels wasteful to pay overhead of `StringMap<vector<Include>>` sometimes, e.g. if the code wants to insert just a single header or do other things, like removing duplicate headers.
> > 
> > I.e. I propose to extract the code that parses the source code and produces a list of #include directives with their respective ranges and filenames into a separate function/class.
> > This would. arguably, improve readability of the include insertion code.
> > 
> > That might not be a trivial change, though, since it requires rewriting some of the code used here, while the current patch seems to focus on simply extracting useful functionality from `clang-format` for reuse in clangd, clang-tidy, etc.
> > Let me know what you think.
> I haven't thought think about these use cases. 
> 
> These use cases don't seem to be very common though.  Generally, I try to make the APIs easy to use for the use cases we have instead designing the APIs for all potential uses. An over-generalized API might be awkward to use for major use cases, and designing for the future also has opportunity cost...
> 
> That said, I think when the use cases do come up, it wouldn't be hard to add APIs for that (and change implementation if necessary). But I also think the current APIs would still be useful for users who only care about #include insertion/deletion.
Ok, I'm happy with leaving the current API as is, even though I do feel some refactoring would help both the readability and allow us to reuse the code for more use-cases.
But does seem out of scope of this patch, so we could wait for an actual use-case to pop up.


================
Comment at: lib/Format/Format.cpp:2016
+  /// be removed.
+  tooling::Replacements remove(llvm::StringRef Header) const;
 
----------------
ioeric wrote:
> ilya-biryukov wrote:
> > Maybe also add a param of type `Optional<IncludeStyle>` and 
> >   - only remove includes with the specified style, if specified the value is specified
> >   - remove both styles the value is `llvm::None`.
> Again, I'm not quite sure if it's worth another parameter for this.
The parameter in this case seems to be even more useful, since we have two different behaviors for quoted vs unquoted strings.
So it's really easy to misuse the API and make a mistake, thinking it does the right thing for you.


================
Comment at: lib/Format/Format.cpp:2023
+    // An include header quoted with either <> or "".
+    std::string Name;
+    // The range of the whole line of include directive including any eading
----------------
ioeric wrote:
> ilya-biryukov wrote:
> > Maybe use `StringRef` here?
> We need to worry about life-cycle of the underlying string if we use `StringRef`, and I think it's not trivial in this case, so I think `std::string` would be a better fit here.
We used `StringRef`for `Code`, `Filename`, etc before.
Are they different? Is the lifetime of `Include` more complicated? When would `Include` point outside `Code`?


================
Comment at: lib/Format/Format.cpp:2031
 
-  llvm::Regex IncludeRegex(IncludeRegexPattern);
-  llvm::Regex DefineRegex(R"(^[\t\ ]*#[\t\ ]*define[\t\ ]*[^\\]*$)");
-  SmallVector<StringRef, 4> Matches;
+  StringRef FileName;
+  StringRef Code;
----------------
ioeric wrote:
> ilya-biryukov wrote:
> > What do we use the `FileName` for?
> > On a higher level, it seems we shouldn't need it to implement what `HeaderIncludes` does.
> `FileName` is needed to decide whether a header is a main header.
Can we pass `bool IsMainHeader` instead? Can we not store it as a field? It seems to be only needed in the constructor, right?


Repository:
  rC Clang

https://reviews.llvm.org/D46180





More information about the cfe-commits mailing list