[cfe-commits] [PATCH] Implements support to run standalone tools

Sun Mar 25 22:17:40 PDT 2012

Sent from my iPhone

On Mar 21, 2012, at 6:25 AM, Manuel Klimek <klimek at google.com> wrote:

> So, I've looked through the design ideas and I have a few follow up
> questions / remarks.
> 
> On Wed, Mar 14, 2012 at 1:35 PM, Douglas Gregor <dgregor at apple.com> wrote:
>> I suggest a more general architecture that separates out the "program build" database from the code that parses code and runs the action on that code. To be a bit more concrete, I'm suggesting splitting ClangTool into two subsystems:
> 
> Just to reiterate: I think this is a great idea.
> 
>>       - An abstract class CompilationDatabase that allows one to iterate over the CompileCommands that went into a particular build (or apply some function object to each CompileCommand), along with:
>>               - A subclass JSONCompilationDatabase that retrieves CompileCommands from the JSON output produced by CMake
>>               - A subclass SimpleCompilationDatabase where one programmatically adds CompileCommands
>>               - Note: it should be possible for us to create additional compilation database kinds, e.g., by parsing makefiles directly
>>               - A function that makes it easy to take argc/argv an interpret it either as a use of JSONCompilationDatabase or SimpleCompilationDatabase. This should make it easy to create a unified command-line interface to Tooling-based Clang tools, so it's easy for end users to run different tools on their projects
> 
> I'm not sure why we'd need argc/argv here. I'd propose a function like
> (stealing from what you wrote further down, slightly changing it):
> /// \brief Loads a compilation database from a build directory.
> ///
> /// Looks at the specified 'BuildDirectory' and creates a compilation
> database that allows to query compile commands for source files in the
> corresponding source tree.
> /// Returns NULL if we were not able to build up a compilation
> database for the build directory.
> /// FIXME: Currently only supports JSON compilation databases, which
> are named 'compile_commands.json' in the given directory.
> CompilationDatabase *loadCompilationDatabase(StringRef BuildDirectory);

Discussed elsewhere, but I agree this is a reasonable interface for JSON compilation databases. For my understanding, why is this a directory rather than a single file for the JSON case?

>>               - A function that writes a CompilationDatabase in JSON format, so that it can be loaded again. (Not immediately critical, but useful nonetheless)
>> 
>>       - A class ("Tooling"? "Tool"?) that encapsulates the environment in which the tool will be executed along with the ability to actually do the execution, e.g.,
>>               - mapVirtualFiles, as in ClangTool
>>               - a run function that runs a single CompileCommand with a frontend action factory
>>               - a run function that runs every CompileCommand in the given program database with a frontend action factory
>>               - a run function that compiles a given code string (+ command-line options, presumably) with a frontend action factory, like runSyntaxOnlyToolOnCode does
> 
> Why is ClangTool not exactly that (apart from ClangTool doing too much
> in its constructor currently)?

I think this is just ClangTool, with the compilation database separated out. 

>> Now, there's no reason why we can't still have two-line syntax checkers with this architecture. It might look like this:
>> 
>> int main(int argc, char **argv) {
>>   llvm::OwningPtr<CompilationDatabase> Database(loadCompilationDatabase(argc, argv));
>>   return Tool().run(newFrontendActionFactory<clang::SyntaxOnlyAction>(), *Database);
>> }
> 
> Ah, here we're missing which files to run over, which I think should
> not be the responsibility of the CompilationDatabase.

Yes, I agree. The database can tell how to "build" each file, and enumerate them, but something else should decide which files matter. 

> I actually also
> dislike the current handling of command line parameters by the
> ClangTool, as it makes it hard for a tool to add its own special
> parameters. I'd want tools in the clang code base to use the llvm
> command line handling stuff. In the end I want to write something like
> this:
> 
> cl::opt<std::string> BuildDirectory(cl::Positional, cl::desc("..."));
> cl::list<string> FileNames(cl::Positional, cl::desc("..."), cl::OneOrMore);
> int main(int argc, char **argv) {
> cl::ParseCommandLineOptions(argc, argv, ...);
> llvm::OwningPtr<CompilationDatabase>
> Database(loadCompilationDatabase(BuildDirectory));
> // Question: how do we do error handling here? For example, nice
> output why the compilation database couldn't be loaded?
> ClangTool Tool(Database.get(), FileNames);
> Tool.Run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
> }
> 
> Thoughts?

Nothing leaps to mind (sorry), but I agree that it's important to make it easy to add these arguments and that it makes sense to use LLVM's facilities. 

>> +  /// \brief Returns the file manager used in the tool.
>> +  ///
>> +  /// The file manager is shared between all translation units.
>> +  FileManager &getFiles() { return Files; }
>> 
>> Given that the FileManager is exposed, should mapVirtualFile deal in FileEntry*'s? (Or, at least, have an overload that uses a FileEntry*?)
> 
> I looked at why I exposed the FileManager. The reason is that later
> we'll have a refactoring framework that I'd like to look something
> like this:
> /// \brief A tool to run refactorings.
> ///
> /// This is a refactoring specific version of \see ClangTool.
> /// All text replacements added to GetReplacements() during the run of the
> /// tool will be applied and saved after all translation units have been
> /// processed.
> class RefactoringTool {
> public:
> /// \brief Returns a set of replacements. All replacements added during the
> /// run of the tool will be applied after all translation units have been
> /// processed.
> Replacements &GetReplacements();
> 
> /// \see ClangTool::Run.
> int Run(FrontendActionFactory *ActionFactory);
> 
> private:
> ClangTool Tool;
> Replacements Replace;
> };
> 
> If we're fine with having things like this as a subclass of ClangTool
> (I'm fine with that, had it that way, and one of my
> inheritance-avoiding coworkers made me change it) we can make GetFiles
> protected. The reason we need it is that for refactorings we'll need
> to create a SourceManager in the end that's used just for the
> refactorings (SourceManagers are otherwise TU specific). If you have
> ideas for different strategies how to solve this I'm all ears :)

Unless users will typically be expected to subclass ClangTool (which doesn't seem like part of your design), I think it's better to just expose the FileManager.