[cfe-commits] [PATCH] Implements support to run standalone tools

Wed Mar 21 06:25:01 PDT 2012

So, I've looked through the design ideas and I have a few follow up
questions / remarks.

On Wed, Mar 14, 2012 at 1:35 PM, Douglas Gregor <dgregor at apple.com> wrote:
> I suggest a more general architecture that separates out the "program build" database from the code that parses code and runs the action on that code. To be a bit more concrete, I'm suggesting splitting ClangTool into two subsystems:

Just to reiterate: I think this is a great idea.

>        - An abstract class CompilationDatabase that allows one to iterate over the CompileCommands that went into a particular build (or apply some function object to each CompileCommand), along with:
>                - A subclass JSONCompilationDatabase that retrieves CompileCommands from the JSON output produced by CMake
>                - A subclass SimpleCompilationDatabase where one programmatically adds CompileCommands
>                - Note: it should be possible for us to create additional compilation database kinds, e.g., by parsing makefiles directly
>                - A function that makes it easy to take argc/argv an interpret it either as a use of JSONCompilationDatabase or SimpleCompilationDatabase. This should make it easy to create a unified command-line interface to Tooling-based Clang tools, so it's easy for end users to run different tools on their projects

I'm not sure why we'd need argc/argv here. I'd propose a function like
(stealing from what you wrote further down, slightly changing it):
/// \brief Loads a compilation database from a build directory.
///
/// Looks at the specified 'BuildDirectory' and creates a compilation
database that allows to query compile commands for source files in the
corresponding source tree.
/// Returns NULL if we were not able to build up a compilation
database for the build directory.
/// FIXME: Currently only supports JSON compilation databases, which
are named 'compile_commands.json' in the given directory.
CompilationDatabase *loadCompilationDatabase(StringRef BuildDirectory);

>                - A function that writes a CompilationDatabase in JSON format, so that it can be loaded again. (Not immediately critical, but useful nonetheless)
>
>        - A class ("Tooling"? "Tool"?) that encapsulates the environment in which the tool will be executed along with the ability to actually do the execution, e.g.,
>                - mapVirtualFiles, as in ClangTool
>                - a run function that runs a single CompileCommand with a frontend action factory
>                - a run function that runs every CompileCommand in the given program database with a frontend action factory
>                - a run function that compiles a given code string (+ command-line options, presumably) with a frontend action factory, like runSyntaxOnlyToolOnCode does

Why is ClangTool not exactly that (apart from ClangTool doing too much
in its constructor currently)?

> Now, there's no reason why we can't still have two-line syntax checkers with this architecture. It might look like this:
>
>  int main(int argc, char **argv) {
>    llvm::OwningPtr<CompilationDatabase> Database(loadCompilationDatabase(argc, argv));
>    return Tool().run(newFrontendActionFactory<clang::SyntaxOnlyAction>(), *Database);
>  }

Ah, here we're missing which files to run over, which I think should
not be the responsibility of the CompilationDatabase. I actually also
dislike the current handling of command line parameters by the
ClangTool, as it makes it hard for a tool to add its own special
parameters. I'd want tools in the clang code base to use the llvm
command line handling stuff. In the end I want to write something like
this:

cl::opt<std::string> BuildDirectory(cl::Positional, cl::desc("..."));
cl::list<string> FileNames(cl::Positional, cl::desc("..."), cl::OneOrMore);
int main(int argc, char **argv) {
  cl::ParseCommandLineOptions(argc, argv, ...);
  llvm::OwningPtr<CompilationDatabase>
Database(loadCompilationDatabase(BuildDirectory));
  // Question: how do we do error handling here? For example, nice
output why the compilation database couldn't be loaded?
  ClangTool Tool(Database.get(), FileNames);
  Tool.Run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
}

Thoughts?

> +  /// \brief Returns the file manager used in the tool.
> +  ///
> +  /// The file manager is shared between all translation units.
> +  FileManager &getFiles() { return Files; }
>
> Given that the FileManager is exposed, should mapVirtualFile deal in FileEntry*'s? (Or, at least, have an overload that uses a FileEntry*?)

I looked at why I exposed the FileManager. The reason is that later
we'll have a refactoring framework that I'd like to look something
like this:
/// \brief A tool to run refactorings.
///
/// This is a refactoring specific version of \see ClangTool.
/// All text replacements added to GetReplacements() during the run of the
/// tool will be applied and saved after all translation units have been
/// processed.
class RefactoringTool {
public:
  /// \brief Returns a set of replacements. All replacements added during the
  /// run of the tool will be applied after all translation units have been
  /// processed.
  Replacements &GetReplacements();

  /// \see ClangTool::Run.
  int Run(FrontendActionFactory *ActionFactory);

private:
  ClangTool Tool;
  Replacements Replace;
};

If we're fine with having things like this as a subclass of ClangTool
(I'm fine with that, had it that way, and one of my
inheritance-avoiding coworkers made me change it) we can make GetFiles
protected. The reason we need it is that for refactorings we'll need
to create a SourceManager in the end that's used just for the
refactorings (SourceManagers are otherwise TU specific). If you have
ideas for different strategies how to solve this I'm all ears :)

Thanks,
/Manuel