[cfe-commits] [PATCH] Implements support to run standalone tools

Manuel Klimek klimek at google.com
Wed Mar 14 08:35:53 PDT 2012

Hi Doug,

first, a big thank you for finding the time to review. I completely
agree with everything you write. Especially getting the tooling
infrastructure into a more non-cmake centric shape is something that
was already high on the TODO list, but I didn't want to make further
changes before getting first feedback, because I figured reviewing
something that still constantly changes is no fun for you ... Which
brings me to a central question - which of the changes would you want
to see finished before getting a first version in? Especially making
the command line handling more orthogonal is something I'd like to do
in lock-step with our google-internal build system handling (which is
not cmake), and thus I think would be a great use case to make sure we
don't miss anything. Of course I totally understand if you're not
comfortable checking in the current design, and I guess that assuring
you that getting the tooling infrastructure into a shape that the
community is happy with is my top priority might not help here ;) On
the other hand, for various reasons iterating inside the repository
would make me quite a bit more productive, so I'll still ask :)

Some more comments inline...

On Wed, Mar 14, 2012 at 1:35 PM, Douglas Gregor <dgregor at apple.com> wrote:
> Hi Manuel,
> I finally found some free cycles to dig into this.
> I love the functionality provided by the tooling infrastructure, and it's really nice to be able to easily create a single process that churns through all of the translation units in a program to perform some analysis/refactoring/whatever. There are also some little gems hidden in here (such as being able to easily syntax-check code supplied by a string).
> My major concern with the tooling patch is that it's tailored for the case of building tools that run over the JSON database emitted by CMake. It's great to make that case work smoothly, but the tooling infrastructure as presented does not adequately support users who aren't already using CMake. That immediately eliminates the vast majority of potential users of the tooling infrastructure, because while CMake is popular, it's nowhere near dominant, and the barrier to entry for switching one's project over to CMake is quite possibly higher than the barrier to entry for building a tool w/o the tooling infrastructure that will work on non-CMake projects.
> I suggest a more general architecture that separates out the "program build" database from the code that parses code and runs the action on that code. To be a bit more concrete, I'm suggesting splitting ClangTool into two subsystems:
>        - An abstract class CompilationDatabase that allows one to iterate over the CompileCommands that went into a particular build (or apply some function object to each CompileCommand), along with:
>                - A subclass JSONCompilationDatabase that retrieves CompileCommands from the JSON output produced by CMake
>                - A subclass SimpleCompilationDatabase where one programmatically adds CompileCommands
>                - Note: it should be possible for us to create additional compilation database kinds, e.g., by parsing makefiles directly
>                - A function that makes it easy to take argc/argv an interpret it either as a use of JSONCompilationDatabase or SimpleCompilationDatabase. This should make it easy to create a unified command-line interface to Tooling-based Clang tools, so it's easy for end users to run different tools on their projects
>                - A function that writes a CompilationDatabase in JSON format, so that it can be loaded again. (Not immediately critical, but useful nonetheless)
>        - A class ("Tooling"? "Tool"?) that encapsulates the environment in which the tool will be executed along with the ability to actually do the execution, e.g.,
>                - mapVirtualFiles, as in ClangTool
>                - a run function that runs a single CompileCommand with a frontend action factory
>                - a run function that runs every CompileCommand in the given program database with a frontend action factory
>                - a run function that compiles a given code string (+ command-line options, presumably) with a frontend action factory, like runSyntaxOnlyToolOnCode does

I'm not sure I understand this correctly - compiling a given code
string is done by mapping it in as a virtual file with a specific name
and adding some special command default command line arguments.
Perhaps you are imagining a different use case, but for me
runSyntaxOnlyToolOnCode is first and foremost a unit testing help.

> Now, there's no reason why we can't still have two-line syntax checkers with this architecture. It might look like this:
>  int main(int argc, char **argv) {
>    llvm::OwningPtr<CompilationDatabase> Database(loadCompilationDatabase(argc, argv));
>    return Tool().run(newFrontendActionFactory<clang::SyntaxOnlyAction>(), *Database);
>  }
> Now, to actually make tooling useful to non-CMake users, we'd want to make it possible to create a JSON compilation database without involving CMake at all. I suggest that we add a Clang command-line flag "-fcompilation-database=<filename>" and have the driver update <filename> with each compilation command it is invoked with. That way, one could do a normal project build with, e.g.,
>        make CXX=clang++ CXXFLAGS="-fcompilation-database=compile_commands.json"

I actually already have written something similar (well, it's
currently a python wrapper which might be a bad idea, but it was
really simple). The design I have is that we output a little JSON
snippet next to each .o file. The problem with handing the compilation
database to each compile step is concurrent runs - how would you
propose to sync on the database? I can't imagine implementing the sync
would be worth the cost, but I'm happy to be convinced otherwise...

> and then, later, run a tool over compile_commands.json. That way, the only barrier to entry is the need to compile with Clang, but I think that's perfectly acceptable. This last bit isn't strictly required for adding tooling into Clang, but without it, tooling will still only serve a relatively small fraction of its potential user base.
> A few more detailed comments:
> +/// \brief Interface to generate clang::FrontendActions.
> +class FrontendActionFactory {
> +public:
> +  virtual ~FrontendActionFactory();
> +
> +  /// \brief Returns a new clang::FrontendAction.
> +  ///
> +  /// The caller takes ownership of the returned action.
> +  virtual clang::FrontendAction *create() = 0;
> +};
> Do we even need this class? I think I'd rather that the various run() functions take a function object that returns a FrontendAction*, so that users can just pass in a lambda:
>        return Tool().run([] { return new clang::SyntaxOnlyAction; }, *Database);
> Naturally, we'll have to do some kind of std::function-like trick to avoid making every "run" function into a template, but that's relatively simple.
> +std::vector<std::string> unescapeJsonCommandLine(
> +    StringRef JsonEscapedCommandLine);
> This is essentially just an internal utility function, and probably doesn't belong in Tooling.h.
> +CompileCommand findCompileArgsInJsonDatabase(
> +    StringRef FileName, StringRef JsonDatabase,
> +    std::string &ErrorMessage);
> Seems like a good query function for JSONCompilationDatabase.
> +  /// \brief Map virtual files to be used while running the tool.
> +  ///
> +  /// \param FileContents A map from file names to the file content of the
> +  /// mapped virtual files.
> +  void mapVirtualFiles(const std::map<StringRef, StringRef> &FileContents);
> I'm not a big fan of the use of std::map header. Why not just have a function
>        void mapVirtualFile(StringRef FileName, StringRef Contents);
> and have callers iterate, rather than having callers build a special data structure?
> +  /// \brief Returns the file manager used in the tool.
> +  ///
> +  /// The file manager is shared between all translation units.
> +  FileManager &getFiles() { return Files; }
> Given that the FileManager is exposed, should mapVirtualFile deal in FileEntry*'s? (Or, at least, have an overload that uses a FileEntry*?)

/me wonders why he exposed the FileManager at all - one interesting
part that's missing from the current CL is one of the next steps that
allows us to do refactorings (if you're interested, the current
incarnation is at:
with an example tool at

I'll investigate. I don't want users to have to deal with
FileEntry*'s, and rather keep the interface as simple as possible.
Unless you have a specific use case?

> +  // Create the compilers actual diagnostics engine.
> +  Compiler.createDiagnostics(CC1Args.size(),
> +                             const_cast<char**>(CC1Args.data()));
> Should the tooling infrastructure provide a way to deal with diagnostics?

I have not really thought of that. We probably want to be able to
inject diagnostics at some point (we have some diagnostic mapreduces
internally when we iterate on implementing new diagnostics).

> +using clang::tooling::ClangTool;
> +using clang::tooling::newFrontendActionFactory;
> +
> +int main(int argc, char **argv) {
> +  ClangTool Tool(argc, argv);
> +  return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
> +}
> I feel like the tooling infrastructure would benefit from a slightly more involved example, especially one that benefits from running over all files in a project within the same process. Perhaps that isn't easy until we have an AST matching framework?

The AST matching framework will actually not help much here - this
becomes very interesting when you do real refactorings (see the
example I cited above), because then we need to deduplicate edits at
the end of running over all translation units.

> --- /dev/null
> +++ b/unittests/Tooling/ToolingTest.cpp
> @@ -0,0 +1,289 @@
> We should settle on the interfaces first, but there should be a Tooling tutorial on clang.llvm.org to help users get started with the tooling infrastructure.

I'll be happy to write one!


>        - Doug
> On Mar 7, 2012, at 7:20 AM, Manuel Klimek wrote:
>> On Tue, Mar 6, 2012 at 11:22 PM, Marshall Clow <mclow.lists at gmail.com> wrote:
>>> On Feb 13, 2012, at 1:05 AM, Manuel Klimek wrote:
>>>> Ping.
>>>> Rebased patch + fixed some nits from Nick.
>>> Two concerns in Tooling.cpp:
>>> 1)
>>>>       /// \brief Converts a string vector representing a Command line into a C
>>>>       /// string vector representing the Argv (including the trailing NULL).
>>>>       std::vector<char*> commandLineToArgv(ArrayRef<std::string> Command) {
>>>>         std::vector<char*> Result(Command.size() + 1);
>>>>         for (std::vector<char*>::size_type I = 0, E = Command.size(); I != E; ++I) {
>>>>           Result[I] = const_cast<char*>(Command[I].c_str());
>>>>         }
>>>>         Result[Command.size()] = NULL;
>>>>         return Result;
>>>>       }
>>> This code does two things that bother me.
>>> a) It calls c_str() on a bunch of strings, and then saves the pointers. The pointers that result from calling c_str() are only valid as long as the underlying strings are unchanged.
>>> b) Then the code casts away the const-ness of the pointers. If someone attempts to change those strings, that's undefined behavior.
>> The main thing that bothered me about that code is that it was
>> actually only needed in one place, so I inlined it :)
>>> 2)
>>>>       bool ToolInvocation::runInvocation(
>>>>           const char *BinaryName,
>>>>           clang::driver::Compilation *Compilation,
>>>>           clang::CompilerInvocation *Invocation,
>>>>           const clang::driver::ArgStringList &CC1Args,
>>>>           clang::FrontendAction *ToolAction) {
>>>>         llvm::OwningPtr<clang::FrontendAction> ScopedToolAction(ToolAction);
>>>>         // Show the invocation, with -v.
>>>>         if (Invocation->getHeaderSearchOpts().Verbose) {
>>>>           llvm::errs() << "clang Invocation:\n";
>>>>           Compilation->PrintJob(llvm::errs(), Compilation->getJobs(), "\n", true);
>>>>           llvm::errs() << "\n";
>>>>         }
>>> When I am reading (and writing) code, and I see an object passed as a pointer, I figure that it's optional; it can be NULL - otherwise it would be passed by reference.
>> Interesting. When I am reading code and see an object passed as a
>> pointer, I see that as a sign that the pointer is retained past the
>> call of the function. When using references in that case it's
>> unfortunately not visible from the call site that the reference is
>> retained.
>> Please find an updated patch attached...
>> Cheers,
>> /Manuel
>>> Other than that, this patch LGTM!
>>> -- Marshall
>>> Marshall Clow     Idio Software   <mailto:mclow.lists at gmail.com>
>>> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
>>>        -- Yu Suzuki
>> <tooling.patch>_______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

More information about the cfe-commits mailing list