[PATCH] Initial clang-tidy architecture

Wed Jul 3 01:46:37 PDT 2013

  LGTM, I think this should go on in because I want to see most of my comments addressed in follow-up commits. This is largely a discussion of design and API which will certainly iterate and change over the next several months. Many of the comments I have now I suspect will ultimately revolve around the move toward Clang's diagnostic library being factored appropriately, and being capable if interoperating with tooling concepts such as replacements, etc., etc.... None of this makes sense to try to handle right now and before code gets committed when the feature is nicely isolated from the core libraries in a clear development area.

  The only things that I think would make some sense to fix before committing:

  1) Add FIXME comments where particularly helpful based on comments below. Probably only a few of these, most talking about a FIXME comment.
  2) Remove  dead code bits (weirdly public interface, dead context member, dead typedef)
  3) Fix the naming inconsistency, or other minor issues.

================
Comment at: clang-tidy/ClangTidy.h:29
@@ +28,3 @@
+///
+/// This is used as an intermediate format to transport Diagnostics without a
+/// dependency on a SourceManager. The goal is to make Diagnostics flexible
----------------
Manuel Klimek wrote:
> FIXME
Yea, please make this a FIXME.

================
Comment at: clang-tidy/ClangTidy.h:46
@@ +45,3 @@
+
+typedef SmallVector<ClangTidyError, 16> ClangTidyErrors;
+
----------------
This is still here...

================
Comment at: clang-tidy/ClangTidy.h:55
@@ +54,3 @@
+  /// This is still under heavy development and will likely change towards using
+  /// tablegen'd diagnostic IDs.
+  DiagnosticBuilder Diag(SourceLocation Loc, StringRef Message) {
----------------
Manuel Klimek wrote:
> Not sure how we'd want to manage ID spaces...
See Doug's comments about the organization of the checks.

I think we should plan for 90% of the check logic to be in a common place inside this tool, and then have conventions compose and configure them in particular ways. Even for deeply specific checking logic, I'm fine using naming conventions etc to manage them.

Then I think we should plan for the possibility of someone to mix a tree of potentially internal checkers which use getCustomDiagID or similar to deal with the lack of fixed IDs for their checks.

================
Comment at: clang-tidy/ClangTidy.h:61-64
@@ +60,6 @@
+
+  /// \brief Store a \c ClangTidyError.
+  ///
+  /// This should not be used directly.
+  void storeError(const ClangTidyError &Error) { Errors.push_back(Error); }
+
----------------
Why is this public?

================
Comment at: clang-tidy/ClangTidy.h:103-107
@@ +102,7 @@
+
+// FIXME: Implement confidence levels for displaying/fixing errors.
+//
+/// \brief Displays the found \p Errors to the users. If \p Fix is true, \p
+/// Errors containing fixes are automatically applied.
+void handleErrors(ClangTidyErrors &Errors, bool Fix);
+
----------------
This interface doesn't make much since to me at this stage, but that's probably OK. I'm just commetning that I think there is more that needs to be understood here than just the FIXME you have.

================
Comment at: clang-tidy/ClangTidy.h:48-49
@@ +47,4 @@
+
+/// \brief Every \c ClangTidyCheck reports errors through a \c DiagnosticEngine
+/// provided by this context.
+class ClangTidyContext {
----------------
Essentially every doxygen comment for an interface is this class feels really minimalist to me. Too much so. These interfaces are clearly intended to be used, potentially by clients that don't know what this is and are learning it by looking at these docs. I think we need to do more than just a brief summary. Usage examples and other reasonable documentation should be added to all of these reasonably public interfaces.

Note that ClangTidyError is actually the best documented, despite potentially going away in the long term. ;]

================
Comment at: clang-tidy/ClangTidy.h:82
@@ +81,3 @@
+
+  void SetContext(ClangTidyContext *Ctx) { Context = Ctx; }
+  virtual void registerPPCallbacks(CompilerInstance &Compiler) {}
----------------
setContext

================
Comment at: clang-tidy/ClangTidy.h:80
@@ +79,3 @@
+public:
+  virtual ~ClangTidyCheck() {}
+
----------------
Please provide anchors for these base classes with vtables.

================
Comment at: clang-tidy/ClangTidy.cpp:84
@@ +83,3 @@
+  SmallVectorImpl<NamedCheck> &Checks;
+  ClangTidyContext &Context;
+};
----------------
This member is dead...

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.h:26
@@ +25,3 @@
+
+class ClangTidyDiagnosticConsumer : public DiagnosticConsumer {
+public:
----------------
As I can't imagine any really useful way to re-use this, I wonder about making it a file-local class inside ClangTidy.cpp?

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.h:38-44
@@ +37,9 @@
+                                const Diagnostic &Info) {
+    tooling::Replacements Replacements;
+    SourceManager &SourceMgr = Info.getSourceManager();
+    for (unsigned i = 0, e = Info.getNumFixItHints(); i != e; ++i) {
+      Replacements.insert(tooling::Replacement(
+          SourceMgr, Info.getFixItHint(i).RemoveRange.getBegin(), 0,
+          Info.getFixItHint(i).CodeToInsert));
+    }
+    SmallString<100> Buf;
----------------
Unifying the concept of Replacements and FixItHints seems like a great thing to pull out into its own FIXME and to form one of the obvious steps along the path of moving all of this into a generically more useful diagnostic library.

================
Comment at: clang-tidy/ClangTidyModule.h:21-26
@@ +20,8 @@
+
+struct NamedCheck {
+  NamedCheck(StringRef Name, ClangTidyCheck *Check)
+      : Name(Name), Check(Check) {}
+  std::string Name;
+  ClangTidyCheck *Check;
+};
+
----------------
Wait, no documentation at all? ;] But this is like *the* API that you expect folks to be extending all the time.

So, what is this? How does one use it? Who owns the Check?

================
Comment at: clang-tidy/google/GoogleTidyModule.cpp:29
@@ +28,3 @@
+ExplicitConstructorCheck::registerMatchers(ast_matchers::MatchFinder *Finder) {
+  Finder->addMatcher(constructorDecl().bind("construct"), this);
+}
----------------
Some stuff that I see  how you probably want to do, but feel like would be really nice to include early on in the API documentation so that folks don't have to figure it out and to make sure everyone ends up on the same page:

- What happens if I add multiple matchers? Why might i want to do that? Why might i not want to do that?

- How can I get different callbacks to fire for different matchers? (The answer might be "you can't" rather than adding a way to do this, but then it should be justified.)

- I have several different matched bits of the AST and diagnostics produced for them that are logically and importantly part of the same check. How do I implement that?

- What is the lifetime of the Check? If I have a member that I use to keep track of state using AST node pointers, does that work? How does it break?

================
Comment at: clang-tidy/google/GoogleTidyModule.cpp:48-49
@@ +47,4 @@
+  virtual void addChecks(SmallVectorImpl<NamedCheck> &Checks) {
+    Checks.push_back(NamedCheck("google-explicit-constructor",
+                                new ExplicitConstructorCheck()));
+  }
----------------
Ahh, this is just an associative wrapper.

Any particular reasons to not use an associative container? Or to hide the association from the type system by using thin wrappers around the container that expose methods with two arguments?

================
Comment at: clang-tidy/google/GoogleTidyModule.cpp:49
@@ +48,3 @@
+    Checks.push_back(NamedCheck("google-explicit-constructor",
+                                new ExplicitConstructorCheck()));
+  }
----------------
Here is an interesting question for me long-term: why are checks allocated on the heap once in the module when the module runs over many TUs potentially? Is that the right design?

I can imagine a reasonable number of checks that will want per-TU or per-AST state and other persistent behavior. To me, it would make more sense for the Check object to have a lifetime that starts after we have the AST that we're checking, and ends before we move on to the next AST. Then to have a CheckFactory of some kind that is used to build them. And here, we don't register the check itself, we register the factory.

Finally, for any registry thing like this, I would always prefer value semantics where we store the thing by value rather than by a raw pointer.

================
Comment at: unittests/clang-tidy/ClangTidyTest.cpp:16-17
@@ +15,4 @@
+
+TEST(ClangTidyTest, Basic) {
+}
+
----------------
Without some context, this doesn't make much sense. ;]

================
Comment at: unittests/clang-tidy/GoogleModuleTest.cpp:16-17
@@ +15,4 @@
+TEST_F(ExplicitConstructorCheckTest, Basic) {
+  EXPECT_EQ("class C { explicit C(int i); };",
+            runCheckOn("class C { C(int i); };"));
+}
----------------
Ugh, this testing machinery is ... pretty gross.

We should really break down and pull GoogleMock into Clang and LLVM so we can use matchers instead of burying the logic. Specifically, it would be much nicer to just call the function to run the checks on the code, get the object back, and have the match select whether it tested by applying the fixes or by inspecting the diagnostic messages, etc.

As it is, there is no really good way to test the diagnostic messages produced, only the fixes applied to the source code.

I think eventually you'll need a better testing system than this. I think the ideal testing system would be something like what we use for clang -fixit where we first run the diagnostic verifier which uses one set of comments in the code to validate diagnostics produced, and then we run again and hand the result to FileCheck which uses the CHECK comments to make assertions about the transformed source.

I think the command line tool should already work to write the second kind of test, and it would be equivalent to this but (imo) somewhat more in keeping with the general Clang testing processes. The first should be easy to add to the command line tool when the diagnostic engine is factored out of Clang's innards and made into a stand-alone useful library as part of that will also make the verifier available. Then it's just minor wiring to have a --verify flag.

================
Comment at: unittests/clang-tidy/LLVMModuleTest.cpp:8
@@ +7,3 @@
+
+typedef ClangTidyTest<NamespaceCommentCheck> NamespaceCommentCheckTest;
+
----------------
No tests for the include order stuff yet? Guess that's fine, I'd rather see the better test infrastructure.

http://llvm-reviews.chandlerc.com/D884