[cfe-commits] [PATCH] Implements support to run standalone tools

Wed Mar 14 11:05:50 PDT 2012

On Mar 14, 2012, at 8:35 AM, Manuel Klimek wrote:

> Hi Doug,
> 
> first, a big thank you for finding the time to review. I completely
> agree with everything you write. Especially getting the tooling
> infrastructure into a more non-cmake centric shape is something that
> was already high on the TODO list, but I didn't want to make further
> changes before getting first feedback, because I figured reviewing
> something that still constantly changes is no fun for you ... Which
> brings me to a central question - which of the changes would you want
> to see finished before getting a first version in? Especially making
> the command line handling more orthogonal is something I'd like to do
> in lock-step with our google-internal build system handling (which is
> not cmake), and thus I think would be a great use case to make sure we
> don't miss anything. Of course I totally understand if you're not
> comfortable checking in the current design, and I guess that assuring
> you that getting the tooling infrastructure into a shape that the
> community is happy with is my top priority might not help here ;) On
> the other hand, for various reasons iterating inside the repository
> would make me quite a bit more productive, so I'll still ask :)

I'd like to see ClangTool split into CompilationDatabase and Tool (or whatever they're called) before it goes in, because I don't like committing something with an architecture that we know will change. Anything else (-fcompilation-database, simple compilation database, convenience functions, change from FrontendActionFactory to function object) can go in later.

> Some more comments inline...
> 
> On Wed, Mar 14, 2012 at 1:35 PM, Douglas Gregor <dgregor at apple.com> wrote:
>> Hi Manuel,
>> 
>> I finally found some free cycles to dig into this.
>> 
>> I love the functionality provided by the tooling infrastructure, and it's really nice to be able to easily create a single process that churns through all of the translation units in a program to perform some analysis/refactoring/whatever. There are also some little gems hidden in here (such as being able to easily syntax-check code supplied by a string).
>> 
>> My major concern with the tooling patch is that it's tailored for the case of building tools that run over the JSON database emitted by CMake. It's great to make that case work smoothly, but the tooling infrastructure as presented does not adequately support users who aren't already using CMake. That immediately eliminates the vast majority of potential users of the tooling infrastructure, because while CMake is popular, it's nowhere near dominant, and the barrier to entry for switching one's project over to CMake is quite possibly higher than the barrier to entry for building a tool w/o the tooling infrastructure that will work on non-CMake projects.
>> 
>> I suggest a more general architecture that separates out the "program build" database from the code that parses code and runs the action on that code. To be a bit more concrete, I'm suggesting splitting ClangTool into two subsystems:
>> 
>>        - An abstract class CompilationDatabase that allows one to iterate over the CompileCommands that went into a particular build (or apply some function object to each CompileCommand), along with:
>>                - A subclass JSONCompilationDatabase that retrieves CompileCommands from the JSON output produced by CMake
>>                - A subclass SimpleCompilationDatabase where one programmatically adds CompileCommands
>>                - Note: it should be possible for us to create additional compilation database kinds, e.g., by parsing makefiles directly
>>                - A function that makes it easy to take argc/argv an interpret it either as a use of JSONCompilationDatabase or SimpleCompilationDatabase. This should make it easy to create a unified command-line interface to Tooling-based Clang tools, so it's easy for end users to run different tools on their projects
>>                - A function that writes a CompilationDatabase in JSON format, so that it can be loaded again. (Not immediately critical, but useful nonetheless)
>> 
>>        - A class ("Tooling"? "Tool"?) that encapsulates the environment in which the tool will be executed along with the ability to actually do the execution, e.g.,
>>                - mapVirtualFiles, as in ClangTool
>>                - a run function that runs a single CompileCommand with a frontend action factory
>>                - a run function that runs every CompileCommand in the given program database with a frontend action factory
>>                - a run function that compiles a given code string (+ command-line options, presumably) with a frontend action factory, like runSyntaxOnlyToolOnCode does
> 
> I'm not sure I understand this correctly - compiling a given code
> string is done by mapping it in as a virtual file with a specific name
> and adding some special command default command line arguments.
> Perhaps you are imagining a different use case, but for me
> runSyntaxOnlyToolOnCode is first and foremost a unit testing help.

It's fairly common for someone to ask, "how do I compile some code that I have in a std::string?" runSyntaxOnlyToolOnCode starts to solve that problem. I mentioned command-line arguments simply because one will need to be able to specify (at least) language dialect options for this parse, since we have no file name to go on.

>> Now, there's no reason why we can't still have two-line syntax checkers with this architecture. It might look like this:
>> 
>>  int main(int argc, char **argv) {
>>    llvm::OwningPtr<CompilationDatabase> Database(loadCompilationDatabase(argc, argv));
>>    return Tool().run(newFrontendActionFactory<clang::SyntaxOnlyAction>(), *Database);
>>  }
>> 
>> Now, to actually make tooling useful to non-CMake users, we'd want to make it possible to create a JSON compilation database without involving CMake at all. I suggest that we add a Clang command-line flag "-fcompilation-database=<filename>" and have the driver update <filename> with each compilation command it is invoked with. That way, one could do a normal project build with, e.g.,
>> 
>>        make CXX=clang++ CXXFLAGS="-fcompilation-database=compile_commands.json"
> 
> I actually already have written something similar (well, it's
> currently a python wrapper which might be a bad idea, but it was
> really simple). The design I have is that we output a little JSON
> snippet next to each .o file. The problem with handing the compilation
> database to each compile step is concurrent runs - how would you
> propose to sync on the database? I can't imagine implementing the sync
> would be worth the cost, but I'm happy to be convinced otherwise…

POSIX file locks are fine; as I noted in the conversion with Chandler, most developers have projects small enough that the overhead won't matter to them, but simplicity and extremely important.

>> +  /// \brief Returns the file manager used in the tool.
>> +  ///
>> +  /// The file manager is shared between all translation units.
>> +  FileManager &getFiles() { return Files; }
>> 
>> Given that the FileManager is exposed, should mapVirtualFile deal in FileEntry*'s? (Or, at least, have an overload that uses a FileEntry*?)
> 
> /me wonders why he exposed the FileManager at all - one interesting
> part that's missing from the current CL is one of the next steps that
> allows us to do refactorings (if you're interested, the current
> incarnation is at:
> http://llvm.org/viewvc/llvm-project/cfe/branches/tooling/include/clang/Tooling/Refactoring.h?view=markup
> with an example tool at
> http://llvm.org/viewvc/llvm-project/cfe/branches/tooling/tools/remove-cstr-calls/RemoveCStrCalls.cpp?view=markup)
> 
> I'll investigate. I don't want users to have to deal with
> FileEntry*'s, and rather keep the interface as simple as possible.
> Unless you have a specific use case?

No use case in mind. I'd be just as happy (heck, probably happier) if the FileManager wasn't exposed at all.

>> +  // Create the compilers actual diagnostics engine.
>> +  Compiler.createDiagnostics(CC1Args.size(),
>> +                             const_cast<char**>(CC1Args.data()));
>> 
>> Should the tooling infrastructure provide a way to deal with diagnostics?
> 
> I have not really thought of that. We probably want to be able to
> inject diagnostics at some point (we have some diagnostic mapreduces
> internally when we iterate on implementing new diagnostics).

Okay. 

>> +using clang::tooling::ClangTool;
>> +using clang::tooling::newFrontendActionFactory;
>> +
>> +int main(int argc, char **argv) {
>> +  ClangTool Tool(argc, argv);
>> +  return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
>> +}
>> 
>> I feel like the tooling infrastructure would benefit from a slightly more involved example, especially one that benefits from running over all files in a project within the same process. Perhaps that isn't easy until we have an AST matching framework?
> 
> The AST matching framework will actually not help much here - this
> becomes very interesting when you do real refactorings (see the
> example I cited above), because then we need to deduplicate edits at
> the end of running over all translation units.

I was thinking of something a bit more simple. For example, finding all references to a given global variable within a project.

	- Doug