[cfe-commits] [PATCH] Implements support to run standalone tools

Wed Mar 14 05:35:01 PDT 2012

Hi Manuel,

I finally found some free cycles to dig into this. 

I love the functionality provided by the tooling infrastructure, and it's really nice to be able to easily create a single process that churns through all of the translation units in a program to perform some analysis/refactoring/whatever. There are also some little gems hidden in here (such as being able to easily syntax-check code supplied by a string).

My major concern with the tooling patch is that it's tailored for the case of building tools that run over the JSON database emitted by CMake. It's great to make that case work smoothly, but the tooling infrastructure as presented does not adequately support users who aren't already using CMake. That immediately eliminates the vast majority of potential users of the tooling infrastructure, because while CMake is popular, it's nowhere near dominant, and the barrier to entry for switching one's project over to CMake is quite possibly higher than the barrier to entry for building a tool w/o the tooling infrastructure that will work on non-CMake projects.

I suggest a more general architecture that separates out the "program build" database from the code that parses code and runs the action on that code. To be a bit more concrete, I'm suggesting splitting ClangTool into two subsystems:

	- An abstract class CompilationDatabase that allows one to iterate over the CompileCommands that went into a particular build (or apply some function object to each CompileCommand), along with:
		- A subclass JSONCompilationDatabase that retrieves CompileCommands from the JSON output produced by CMake
		- A subclass SimpleCompilationDatabase where one programmatically adds CompileCommands
		- Note: it should be possible for us to create additional compilation database kinds, e.g., by parsing makefiles directly
		- A function that makes it easy to take argc/argv an interpret it either as a use of JSONCompilationDatabase or SimpleCompilationDatabase. This should make it easy to create a unified command-line interface to Tooling-based Clang tools, so it's easy for end users to run different tools on their projects
		- A function that writes a CompilationDatabase in JSON format, so that it can be loaded again. (Not immediately critical, but useful nonetheless)

	- A class ("Tooling"? "Tool"?) that encapsulates the environment in which the tool will be executed along with the ability to actually do the execution, e.g.,
		- mapVirtualFiles, as in ClangTool
		- a run function that runs a single CompileCommand with a frontend action factory
		- a run function that runs every CompileCommand in the given program database with a frontend action factory
		- a run function that compiles a given code string (+ command-line options, presumably) with a frontend action factory, like runSyntaxOnlyToolOnCode does

Now, there's no reason why we can't still have two-line syntax checkers with this architecture. It might look like this:

  int main(int argc, char **argv) {
    llvm::OwningPtr<CompilationDatabase> Database(loadCompilationDatabase(argc, argv));
    return Tool().run(newFrontendActionFactory<clang::SyntaxOnlyAction>(), *Database);
  }

Now, to actually make tooling useful to non-CMake users, we'd want to make it possible to create a JSON compilation database without involving CMake at all. I suggest that we add a Clang command-line flag "-fcompilation-database=<filename>" and have the driver update <filename> with each compilation command it is invoked with. That way, one could do a normal project build with, e.g.,

	make CXX=clang++ CXXFLAGS="-fcompilation-database=compile_commands.json"

and then, later, run a tool over compile_commands.json. That way, the only barrier to entry is the need to compile with Clang, but I think that's perfectly acceptable. This last bit isn't strictly required for adding tooling into Clang, but without it, tooling will still only serve a relatively small fraction of its potential user base.

A few more detailed comments:

+/// \brief Interface to generate clang::FrontendActions.
+class FrontendActionFactory {
+public:
+  virtual ~FrontendActionFactory();
+
+  /// \brief Returns a new clang::FrontendAction.
+  ///
+  /// The caller takes ownership of the returned action.
+  virtual clang::FrontendAction *create() = 0;
+};

Do we even need this class? I think I'd rather that the various run() functions take a function object that returns a FrontendAction*, so that users can just pass in a lambda:

	return Tool().run([] { return new clang::SyntaxOnlyAction; }, *Database);

Naturally, we'll have to do some kind of std::function-like trick to avoid making every "run" function into a template, but that's relatively simple.

+std::vector<std::string> unescapeJsonCommandLine(
+    StringRef JsonEscapedCommandLine);

This is essentially just an internal utility function, and probably doesn't belong in Tooling.h.

+CompileCommand findCompileArgsInJsonDatabase(
+    StringRef FileName, StringRef JsonDatabase,
+    std::string &ErrorMessage);

Seems like a good query function for JSONCompilationDatabase.

+  /// \brief Map virtual files to be used while running the tool.
+  ///
+  /// \param FileContents A map from file names to the file content of the
+  /// mapped virtual files.
+  void mapVirtualFiles(const std::map<StringRef, StringRef> &FileContents);

I'm not a big fan of the use of std::map header. Why not just have a function

	void mapVirtualFile(StringRef FileName, StringRef Contents);

and have callers iterate, rather than having callers build a special data structure?

+  /// \brief Returns the file manager used in the tool.
+  ///
+  /// The file manager is shared between all translation units.
+  FileManager &getFiles() { return Files; }

Given that the FileManager is exposed, should mapVirtualFile deal in FileEntry*'s? (Or, at least, have an overload that uses a FileEntry*?)

+  // Create the compilers actual diagnostics engine.
+  Compiler.createDiagnostics(CC1Args.size(),
+                             const_cast<char**>(CC1Args.data()));

Should the tooling infrastructure provide a way to deal with diagnostics?

+using clang::tooling::ClangTool;
+using clang::tooling::newFrontendActionFactory;
+
+int main(int argc, char **argv) {
+  ClangTool Tool(argc, argv);
+  return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
+}

I feel like the tooling infrastructure would benefit from a slightly more involved example, especially one that benefits from running over all files in a project within the same process. Perhaps that isn't easy until we have an AST matching framework?

--- /dev/null
+++ b/unittests/Tooling/ToolingTest.cpp
@@ -0,0 +1,289 @@

THANK YOU!

We should settle on the interfaces first, but there should be a Tooling tutorial on clang.llvm.org to help users get started with the tooling infrastructure.

	- Doug

On Mar 7, 2012, at 7:20 AM, Manuel Klimek wrote:

> On Tue, Mar 6, 2012 at 11:22 PM, Marshall Clow <mclow.lists at gmail.com> wrote:
>> On Feb 13, 2012, at 1:05 AM, Manuel Klimek wrote:
>> 
>>> Ping.
>>> 
>>> Rebased patch + fixed some nits from Nick.
>> 
>> 
>> Two concerns in Tooling.cpp:
>> 
>> 1)
>>>       /// \brief Converts a string vector representing a Command line into a C
>>>       /// string vector representing the Argv (including the trailing NULL).
>>>       std::vector<char*> commandLineToArgv(ArrayRef<std::string> Command) {
>>>         std::vector<char*> Result(Command.size() + 1);
>>>         for (std::vector<char*>::size_type I = 0, E = Command.size(); I != E; ++I) {
>>>           Result[I] = const_cast<char*>(Command[I].c_str());
>>>         }
>>>         Result[Command.size()] = NULL;
>>>         return Result;
>>>       }
>>> 
>> This code does two things that bother me.
>> a) It calls c_str() on a bunch of strings, and then saves the pointers. The pointers that result from calling c_str() are only valid as long as the underlying strings are unchanged.
>> b) Then the code casts away the const-ness of the pointers. If someone attempts to change those strings, that's undefined behavior.
> 
> The main thing that bothered me about that code is that it was
> actually only needed in one place, so I inlined it :)
> 
>> 2)
>>>       bool ToolInvocation::runInvocation(
>>>           const char *BinaryName,
>>>           clang::driver::Compilation *Compilation,
>>>           clang::CompilerInvocation *Invocation,
>>>           const clang::driver::ArgStringList &CC1Args,
>>>           clang::FrontendAction *ToolAction) {
>>> 
>>>         llvm::OwningPtr<clang::FrontendAction> ScopedToolAction(ToolAction);
>>>         // Show the invocation, with -v.
>>>         if (Invocation->getHeaderSearchOpts().Verbose) {
>>>           llvm::errs() << "clang Invocation:\n";
>>>           Compilation->PrintJob(llvm::errs(), Compilation->getJobs(), "\n", true);
>>>           llvm::errs() << "\n";
>>>         }
>>> 
>> 
>> 
>> When I am reading (and writing) code, and I see an object passed as a pointer, I figure that it's optional; it can be NULL - otherwise it would be passed by reference.
> 
> Interesting. When I am reading code and see an object passed as a
> pointer, I see that as a sign that the pointer is retained past the
> call of the function. When using references in that case it's
> unfortunately not visible from the call site that the reference is
> retained.
> 
> Please find an updated patch attached...
> 
> Cheers,
> /Manuel
> 
>> 
>> Other than that, this patch LGTM!
>> 
>> -- Marshall
>> 
>> Marshall Clow     Idio Software   <mailto:mclow.lists at gmail.com>
>> 
>> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
>>        -- Yu Suzuki
>> 
> <tooling.patch>_______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits