[cfe-commits] [PATCH] Implements support to run standalone tools

Mon Mar 26 01:25:23 PDT 2012

On Mon, Mar 26, 2012 at 7:17 AM, Douglas Gregor <dgregor at apple.com> wrote:
> Sent from my iPhone
>
> On Mar 21, 2012, at 6:25 AM, Manuel Klimek <klimek at google.com> wrote:
>
>> So, I've looked through the design ideas and I have a few follow up
>> questions / remarks.
>>
>> On Wed, Mar 14, 2012 at 1:35 PM, Douglas Gregor <dgregor at apple.com> wrote:
>>> I suggest a more general architecture that separates out the "program build" database from the code that parses code and runs the action on that code. To be a bit more concrete, I'm suggesting splitting ClangTool into two subsystems:
>>
>> Just to reiterate: I think this is a great idea.
>>
>>>       - An abstract class CompilationDatabase that allows one to iterate over the CompileCommands that went into a particular build (or apply some function object to each CompileCommand), along with:
>>>               - A subclass JSONCompilationDatabase that retrieves CompileCommands from the JSON output produced by CMake
>>>               - A subclass SimpleCompilationDatabase where one programmatically adds CompileCommands
>>>               - Note: it should be possible for us to create additional compilation database kinds, e.g., by parsing makefiles directly
>>>               - A function that makes it easy to take argc/argv an interpret it either as a use of JSONCompilationDatabase or SimpleCompilationDatabase. This should make it easy to create a unified command-line interface to Tooling-based Clang tools, so it's easy for end users to run different tools on their projects
>>
>> I'm not sure why we'd need argc/argv here. I'd propose a function like
>> (stealing from what you wrote further down, slightly changing it):
>> /// \brief Loads a compilation database from a build directory.
>> ///
>> /// Looks at the specified 'BuildDirectory' and creates a compilation
>> database that allows to query compile commands for source files in the
>> corresponding source tree.
>> /// Returns NULL if we were not able to build up a compilation
>> database for the build directory.
>> /// FIXME: Currently only supports JSON compilation databases, which
>> are named 'compile_commands.json' in the given directory.
>> CompilationDatabase *loadCompilationDatabase(StringRef BuildDirectory);
>
> Discussed elsewhere, but I agree this is a reasonable interface for JSON compilation databases. For my understanding, why is this a directory rather than a single file for the JSON case?

Hm, I had hoped that my comment would explain that, but apparently
there's something missing. Not sure how to rephrase. The idea is that
when we implement other ways to create compilation databases we'll
automatically detect them from the build directory.

If we're only talking JSON, then yes, I'd expect an interface like
JSONCompilationDatabase *loadJSONCompilationDatabase(StringRef
DatabaseFileName) (with some way of expressing errors).

>>>               - A function that writes a CompilationDatabase in JSON format, so that it can be loaded again. (Not immediately critical, but useful nonetheless)
>>>
>>>       - A class ("Tooling"? "Tool"?) that encapsulates the environment in which the tool will be executed along with the ability to actually do the execution, e.g.,
>>>               - mapVirtualFiles, as in ClangTool
>>>               - a run function that runs a single CompileCommand with a frontend action factory
>>>               - a run function that runs every CompileCommand in the given program database with a frontend action factory
>>>               - a run function that compiles a given code string (+ command-line options, presumably) with a frontend action factory, like runSyntaxOnlyToolOnCode does
>>
>> Why is ClangTool not exactly that (apart from ClangTool doing too much
>> in its constructor currently)?
>
> I think this is just ClangTool, with the compilation database separated out.
>
>>> Now, there's no reason why we can't still have two-line syntax checkers with this architecture. It might look like this:
>>>
>>> int main(int argc, char **argv) {
>>>   llvm::OwningPtr<CompilationDatabase> Database(loadCompilationDatabase(argc, argv));
>>>   return Tool().run(newFrontendActionFactory<clang::SyntaxOnlyAction>(), *Database);
>>> }
>>
>> Ah, here we're missing which files to run over, which I think should
>> not be the responsibility of the CompilationDatabase.
>
> Yes, I agree. The database can tell how to "build" each file, and enumerate them, but something else should decide which files matter.
>
>> I actually also
>> dislike the current handling of command line parameters by the
>> ClangTool, as it makes it hard for a tool to add its own special
>> parameters. I'd want tools in the clang code base to use the llvm
>> command line handling stuff. In the end I want to write something like
>> this:
>>
>> cl::opt<std::string> BuildDirectory(cl::Positional, cl::desc("..."));
>> cl::list<string> FileNames(cl::Positional, cl::desc("..."), cl::OneOrMore);
>> int main(int argc, char **argv) {
>> cl::ParseCommandLineOptions(argc, argv, ...);
>> llvm::OwningPtr<CompilationDatabase>
>> Database(loadCompilationDatabase(BuildDirectory));
>> // Question: how do we do error handling here? For example, nice
>> output why the compilation database couldn't be loaded?
>> ClangTool Tool(Database.get(), FileNames);
>> Tool.Run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
>> }
>>
>> Thoughts?
>
> Nothing leaps to mind (sorry), but I agree that it's important to make it easy to add these arguments and that it makes sense to use LLVM's facilities.
>
>>> +  /// \brief Returns the file manager used in the tool.
>>> +  ///
>>> +  /// The file manager is shared between all translation units.
>>> +  FileManager &getFiles() { return Files; }
>>>
>>> Given that the FileManager is exposed, should mapVirtualFile deal in FileEntry*'s? (Or, at least, have an overload that uses a FileEntry*?)
>>
>> I looked at why I exposed the FileManager. The reason is that later
>> we'll have a refactoring framework that I'd like to look something
>> like this:
>> /// \brief A tool to run refactorings.
>> ///
>> /// This is a refactoring specific version of \see ClangTool.
>> /// All text replacements added to GetReplacements() during the run of the
>> /// tool will be applied and saved after all translation units have been
>> /// processed.
>> class RefactoringTool {
>> public:
>> /// \brief Returns a set of replacements. All replacements added during the
>> /// run of the tool will be applied after all translation units have been
>> /// processed.
>> Replacements &GetReplacements();
>>
>> /// \see ClangTool::Run.
>> int Run(FrontendActionFactory *ActionFactory);
>>
>> private:
>> ClangTool Tool;
>> Replacements Replace;
>> };
>>
>> If we're fine with having things like this as a subclass of ClangTool
>> (I'm fine with that, had it that way, and one of my
>> inheritance-avoiding coworkers made me change it) we can make GetFiles
>> protected. The reason we need it is that for refactorings we'll need
>> to create a SourceManager in the end that's used just for the
>> refactorings (SourceManagers are otherwise TU specific). If you have
>> ideas for different strategies how to solve this I'm all ears :)
>
> Unless users will typically be expected to subclass ClangTool (which doesn't seem like part of your design), I think it's better to just expose the FileManager.

The subclassing wouldn't be so much about users as about extensions of
functionality (like building refactoring infrastructure on top of the
ClangTool, like a RefactoringTool that allows you to add changes which
you can later apply).