[cfe-commits] [PATCH] Implement a better plugin API for clang

Jordan Rose jordan_rose at apple.com
Fri Jun 22 10:17:03 PDT 2012


Hi, Joshua. First off, I really like that you submitted a /patch/ for a better plugin API, including all the documentation and example changes that would have otherwise been necessary to put into an e-mail. It shows that your proposal is feasible and well-defined already, and gives us something to work with.

> Why would you need two copies of Basic/Lex/etc.? Clang is already built with all of this stuff, so the plugin would be able to use clang's version of everything here in normal codegen anyways, unless I'm misunderstanding your statement.


What I meant was that if we wanted to run some action instead of CodeGen, we would need to use libTooling instead...but that means building a separate executable with everything in it again. But I forgot about -fsyntax-only, and plugins could require this in their clang_plugin_begin_file check.

(It might be nice if they could check this sooner, though...maybe they can get the CompilerInvocation in clang_plugin_init, possibly even as non-const! -fsyntax-only isn't the only interesting setting.)

On the flip side, it might be desirable for plugins to be able to produce errors and a nonzero result code. Whether or not these plugin diagnostics can halt CodeGen is something to consider as well.

Remaining comments inline (both on the spec/documentation and the implementation).

On Jun 20, 2012, at 2:58 PM, Joshua Cranmer wrote:

> -<p>The main difference from writing normal FrontendActions is that you can
> -handle plugin command line options. The
> -PluginASTAction base class declares a ParseArgs method which you have to
> -implement in your plugin.
> -</p>
> +<p>The first function called when a plugin is loaded is clang_plugin_init. This
> +function returns a boolean indicating whether or not the plugin can run on the
> +code. As arguments, this function is given the arguments of the plugin as well
> +as the version of clang being run. An example looks like this:</p>
> <pre>
> -  bool ParseArgs(const CompilerInstance &CI,
> -                 const std::vector<std::string>& args) {
> -    for (unsigned i = 0, e = args.size(); i != e; ++i) {
> -      if (args[i] == "-some-arg") {
> -        // Handle the command line argument.
> +  bool clang_plugin_init(int argc, plugin::PluginArg *argv,
> +      plugin::Version *version) {
> +    // Check the version of clang
> +    if (version->major != CLANG_VERSION_MAJOR ||
> +        version->minor != CLANG_VERSION_MINOR)
> +      return false;
> +  
> +    for (int i = 0; i < argc; ++i) {
> +      if (!strcmp(argv[i].name, "-some-arg")) {
> +        // Handle the command line argument
>       }
>     }
>     return true;
>   }
> </pre>
> 

Hooray for using a designated entry point instead of a static constructor! This is something that was a problem on Windows, IIRC.

That said, I'd go even further with the version checks, and /require/ that the major and minor versions match, at the very least. I don't think we can trust objects to be layout-compatible or functions to have all the same arguments even between SVN revisions or point fixes, let alone major/minor versions -- only libclang is stable enough for that. I implemented something like this for static analyzer plugins by requiring plugins to export a symbol that just contains the version string they were compiled against. You can see how it works in CheckerRegistry.h and CheckerRegistration.cpp.

Also, you might want to mention here that these entry points are extern "C", in case someone decides to ignore the advice to include Plugin.h.


> +<p>To run a plugin, you merely need to specify the plugin to be loaded using the
> +-fplugin command line options. Additional parameters for the plugins can be passed with -fplugin-arg-<plugin-name>-<arg>=<value> or
> +-fplugin-arg<plugin-name>-<arg>. These options work the same when both run
> +with the normal driver and the cc1 process.</p>

The second one is missing a dash between the literal 'arg' and the plugin name. Also, personally I would switch the order of these two to put the flag-only version first, since it looks simpler.

This is STILL rather unwieldy, but I guess the idea was that if you really want to write your own command line, you're supposed to use libTooling? Or a wrapper script like the one that appeared awhile back.


> diff --git a/examples/PrintFunctionNames/PrintFunctionNames.exports b/examples/PrintFunctionNames/PrintFunctionNames.exports
> --- a/examples/PrintFunctionNames/PrintFunctionNames.exports
> +++ b/examples/PrintFunctionNames/PrintFunctionNames.exports
> @@ -1,1 +1,1 @@
> -_ZN4llvm8Registry*
> +clang_plugin_*

:-)


> +#ifndef LLVM_CLANG_BASIC_PLUGIN_H
> +#define LLVM_CLANG_BASIC_PLUGIN_H
> +
> +#include "llvm/ADT/StringRef.h"
> +
> +namespace llvm {
> +  class StringRef;
> +}

You already imported StringRef, so this is unnecessary. If you'd like to use StringRef without the llvm prefix, import "clang/Basic/LLVM.h".


> +namespace clang {
> +
> +  class ASTConsumer;
> +  class CompilerInstance;
> +  class DiagnosticConsumer;
> +  class PPCallbacks;

We typically don't indent forward-declared classes, just like we don't indent regular classes.


> +namespace plugin {
> +
> +/**
> + * An individual argument for a plugin.
> + *
> + * Plugin arguments are specified as -fplugin-arg-<plugin>-<name>=<value> or
> + * -fplugin-arg-<plugin>-<name> attributes. In the latter case, the value
> + * attribute would be an empty string.
> + */
> +struct PluginArg {
> +  const char *name;
> +  const char *value;
> +};

Any reason why 'const char *' and not StringRef?


> +// The following functions are not implemented within clang itself, but are
> +// prototypes of the various functions that may be implemented by plugins.
> +// Including it in a header file allows these hooks to be documented with
> +// doxygen, and the use of extern "C" on declarations allows people who forget
> +// to define these functions with extern "C" to still have them work.
> +extern "C" {
> +/**
> + * Callback to initialize the plugin.
> + *
> + * \note This is the only function whose ABI compatibility is guaranteed to
> + *       remain stable. All plugins should verify that the version passed into
> + *       this function is the same version that they were compiled with and
> + *       abort if this is not the case. To detect the version compiled with,
> + *       include clang/Basic/Version.h and check CLANG_VERSION_MAJOR and
> + *       CLANG_VERSION_MINOR.
> + *
> + * \arg argc      The number of arguments in the argv array
> + * \arg argv      An array of the arguments passed to this plugin from the
> + *                command line.
> + * \arg version   The version of the compiler being called.
> + * \return Whether or not the plugin initialization succeeded. If initialization
> + *         fails, the compilation process is terminated.
> + */
> +extern bool clang_plugin_init(int argc, clang::plugin::PluginArg *argv,
> +  clang::plugin::Version *version);

Suggestion: replace argc/argv with ArrayRef.

Very nice doc comment. Some coding standards issues here, to match up with the rest of LLVM:

- We prefer C++-style comments even for Doxygen blocks, even for extern "C" blocks (unless the entire file is C-compatible): ///

- Anything that's a noun (types, variables, and parameters) should start with an uppercase letter. The exceptions are types that look like standard library types. Functions and methods should start with lowercase letters. This convention is fairly new and so there are large chunks of the codebase that don't look like this, but we're trying to be consistent going forwards.

- Parameter lists that are too long should line up under the open paren.

Most of these are already in http://llvm.org/docs/CodingStandards.html; we should probably update the indentation bit as well.

I'm on the fence about whether or not the namespaces should be included. On the one hand, we don't explicitly write "clang" /anywhere/ in our header files (exaggerating but only slightly). On the other, this is going to be referenced by plugin writers who might not know to write "using namespace clang". And probably won't be "using namespace clang::plugin".

> +/**
> + * Callback that is called before a file is compiled.
> + *
> + * Before this function is called, the callbacks object will have all of its
> + * pointers set to null. All non-null callbacks will be deleted by the compiler
> + * internally, so one should always return a new pointer for every file.
> + *
> + * \arg fileName  The name of the function being called.
> + * \arg CI        The compiler instance, which contains information about how
> + *                the file is being compiled and other utility classes.
> + * \arg callbacks The pointer to the callbacks that this plugin will define for
> + *                this file.
> + */
> +extern void clang_plugin_begin_file(llvm::StringRef fileName,
> +  const clang::CompilerInstance &CI,
> +  clang::plugin::PluginFileCallbacks *callbacks);

Since the 'callbacks' argument is assured to be present, it should be a reference. (This actually applies to all plugin entry points.)

I know I for one feel a little squeamish about putting reference types in extern "C" functions, but there's nothing in the standard that prohibits it, and you already have "const CompilerInstance &CI". 

Finally, I would suggest changing "file" to "compilation" or "translation unit", since this isn't called for includes and such.


> +/**
> + * Callback that is called after a file has been compiled.
> + *
> + * \arg fileName  The name of the file that was just compiled.
> + */
> +extern void clang_plugin_end_file(llvm::StringRef fileName);

Does this need the file name? It should always be the same as the matching clang_plugin_begin_file.


> diff --git a/include/clang/Driver/Options.td b/include/clang/Driver/Options.td
> --- a/include/clang/Driver/Options.td
> +++ b/include/clang/Driver/Options.td
> @@ -586,16 +586,18 @@ def fpack_struct_EQ : Joined<"-fpack-str
>   HelpText<"Specify the default maximum struct packing alignment">;
> def fpascal_strings : Flag<"-fpascal-strings">, Group<f_Group>, Flags<[CC1Option]>,
>   HelpText<"Recognize and construct Pascal-style string literals">;
> def fpch_preprocess : Flag<"-fpch-preprocess">, Group<f_Group>;
> def fpic : Flag<"-fpic">, Group<f_Group>;
> def fno_pic : Flag<"-fno-pic">, Group<f_Group>;
> def fpie : Flag<"-fpie">, Group<f_Group>;
> def fno_pie : Flag<"-fno-pie">, Group<f_Group>;
> +def fplugin : Joined<"-fplugin=">, Group<f_Group>, Flags<[CC1Option]>;
> +def fplugin_arg : Joined<"-fplugin-arg-">, Group<f_Group>, Flags<[CC1Option]>;

Take a look at the old plugin and plugin_arg flags in CC1Options.td for examples on how to make these look nicer. (In particular, fplugin should probably be fplugin_EQ, and fplugin_arg can be more descriptive.) Also, you'll have to decide whether we want to accept a split "-fplugin Plugin".


> --- a/include/clang/Frontend/FrontendOptions.h
> +++ b/include/clang/Frontend/FrontendOptions.h
> @@ -160,16 +160,23 @@ public:
>   std::vector<std::string> AddPluginActions;
> 
>   /// Args to pass to the additional plugins
>   std::vector<std::vector<std::string> > AddPluginArgs;

Presumably these are deprecated? And would go away eventually?


>  /// The list of plugins to load.
>   std::vector<std::string> Plugins;

But this probably has to stay, since it includes LLVM plugins.


> +  /// The list of compiler plugins to load.
> +  std::vector<std::string> CompilerPlugins;
> +
> +  /// Argments for the above compiler plugins
> +  std::vector<std::vector<std::pair<std::string, std::string> > >
> +    CompilerPluginArgs;

Yuck. If these two vectors are supposed to be kept in parallel, can we just make a struct like this?

struct Plugin {
	std::string Name;
	SmallVector<std::pair<std::string, std::string> >, 4> Args;
};

That may not be how the old way did it, but it's a lot nicer IMHO.

> --- a/lib/Frontend/CompilerInvocation.cpp
> +++ b/lib/Frontend/CompilerInvocation.cpp
> @@ -1476,16 +1476,56 @@ static InputKind ParseFrontendArgs(Front
>   for (int i = 0, e = Opts.AddPluginActions.size(); i != e; ++i) {
>     for (arg_iterator it = Args.filtered_begin(OPT_plugin_arg),
>            end = Args.filtered_end(); it != end; ++it) {
>       if ((*it)->getValue(Args, 0) == Opts.AddPluginActions[i])
>         Opts.AddPluginArgs[i].push_back((*it)->getValue(Args, 1));
>     }
>   }
> 
> +  Opts.CompilerPlugins = Args.getAllArgValues(OPT_fplugin);
> +  Opts.CompilerPluginArgs.resize(Opts.CompilerPlugins.size());
> +
> +  // Parse the plugin arguments
> +  std::vector<std::pair<std::string, std::string> > ParsedPluginArgs;
> +  for (arg_iterator it = Args.filtered_begin(OPT_fplugin_arg),
> +      end = Args.filtered_end(); it != end; ++it) {
> +    std::string value = (*it)->getValue(Args);

This should probably use StringRef instead of std::string, so that we don't copy the value just yet. Also, you can probably just say it->getValue; most of our smart pointers will do the right thing here.


> +  // Add the plugin arguments to a parsedPluginArgs
> +  for (int i = 0, e = Opts.CompilerPlugins.size(); i != e; ++i) {
> +    std::string base = llvm::sys::path::stem(Opts.CompilerPlugins[i]);

Ditto StringRef.

> +    for (std::vector<std::pair<std::string, std::string> >::iterator
> +           it = ParsedPluginArgs.begin(), end = ParsedPluginArgs.end();
> +           it != end; ++it) {
> +      if (it->first == base) {
> +        std::string argpair = it->second;

Ditto StringRef. The only times we use std::string is when we actually need to own the string.

> namespace {
> +  // A class that holds data associated with a single plugin.
> +  class Plugin {

No indents for namespaces...we run out of our 80 chars too fast. :-)


> +    llvm::sys::DynamicLibrary PluginLibrary;
> +    clang::plugin::PluginFileCallbacks callbacks;

Here you definitely should not include the 'clang' namespace. plugin::PluginFileCallbacks is fine. I'd even be okay with "using llvm::sys::DynamicLibrary".


> +  public:
> +    Plugin(llvm::sys::DynamicLibrary library) : PluginLibrary(library) {}
> +    clang::plugin::PluginFileCallbacks *getCallbacks() { return &callbacks; }

Return by reference, please.


> +    intptr_t getFunctionCall(const char *name) {
> +      return (intptr_t)PluginLibrary.getAddressOfSymbol(name);
> +    }

Why is this an intptr_t? That's not really any better than void * for casting to and from function pointers. If we're on a platform where function pointers are bigger than object pointers, getAddressOfSymbol is already going to break.


> +/// Helper function to call a given function in a plugin.
> +#define CALL_PLUGIN_FUNCTION(plugin, fname, args) \
> +    do { \
> +      Plugin::fname func_ = (Plugin::fname)(plugin)->getFunctionCall(#fname); \
> +      if (func_) \
> +        func_ args; \
> +    } while (0)

Technically not a function. :-) More importantly, please show example usage; this macro is not at all straightfoward.

Also, I would like if the macro took a reference rather than a pointer, but that's just me. (That way it could in theory be called for plugin objects on the stack...but that probably never happens, huh.)


> +    // These typedefs must have the same signatures as the ones in
> +    // clang/Basic/Plugin.h, including the same name (it makes the helper macro
> +    // above work properly).
> +    typedef void (*clang_plugin_begin_file)(StringRef, const CompilerInstance &,
> +      clang::plugin::PluginFileCallbacks*);
> +    typedef void (*clang_plugin_end_file)(StringRef);
> +    typedef void (*clang_plugin_destroy)();
> +  };

Why no init? Ah, the boolean flag.


> +  // The wrapper action that manages all plugin invocations. This is what we use
> +  // as the main FrontendAction for the CompilerInstance.
> +  class PluginWrapperAction : public WrapperFrontendAction {
> +    std::vector<Plugin> plugins;
> +    std::string CurrentFile;

Again, I'm in favor of plugins each keeping CurrentFile around if they need it; it's a StringRef that's safe to store, IIRC. The master copy of this information is held in FrontendAction anyway.

> +  public:
> +    PluginWrapperAction(FrontendAction *WrappedAction,
> +      std::vector<Plugin> &plugins)

ArrayRef, since it's free here.


> +  for (std::vector<Plugin>::iterator it = plugins.begin(); it != plugins.end();
> +      ++it) {
> +    // Retrieve the callbacks, and clear all of the values to NULL
> +    clang::plugin::PluginFileCallbacks *callbacks = it->getCallbacks();
> +    memset(callbacks, 0, sizeof(clang::plugin::PluginFileCallbacks));

This is silly. Plugin should just default-initialize the struct when it's created. (Empty parens will do the right thing.)


> +    // Get the callbacks from the plugin
> +    CALL_PLUGIN_FUNCTION(it, clang_plugin_begin_file, (Filename, CI, callbacks));
> +
> +    // ASTConsumer is handled by GetASTConsumer; others are handled now
> +    if (callbacks->ppCallback)
> +      CI.getPreprocessor().addPPCallbacks(callbacks->ppCallback);
> +    if (callbacks->diagnostics)
> +    {
> +      // The front end has already run this, so let's do this now.
> +      callbacks->diagnostics->BeginSourceFile(CI.getLangOpts(),
> +        &CI.getPreprocessor());
> +      CI.getDiagnostics().setClient(new ChainedDiagnosticConsumer(
> +        &CI.getDiagnosticClient(), callbacks->diagnostics));
> +    }
> +  }

If we initialize plugins before calling the underlying action, is this necessary?

> +  return true;

No way for plugins to signal an error?


> +
> +ASTConsumer *PluginWrapperAction::CreateASTConsumer(CompilerInstance &CI,
> +                                                    StringRef InFile) {
> +  std::vector<ASTConsumer*> Consumers(1,
> +    WrapperFrontendAction::CreateASTConsumer(CI, InFile));
> +  for (std::vector<Plugin>::iterator it = plugins.begin(); it != plugins.end();
> +      ++it) {
> +    if (ASTConsumer *consumer = it->getCallbacks()->astConsumer) {
> +      Consumers.push_back(consumer);
> +    }
> +  }
> +  return new MultiplexConsumer(Consumers);
> +}

TinyPtrVector or SmallVector? Most of the time we will not have plugins, and even when we do there are probably very few.

(This is not a performance-sensitive piece of code, I know, but we might as well get it right.)


> +    // Construct the arguments
> +    std::vector<std::pair<std::string, std::string> > &pluginArgs =
> +      Clang->getFrontendOpts().CompilerPluginArgs[i];

ArrayRef...?

> +    plugin::PluginArg *args = new plugin::PluginArg[pluginArgs.size()];

SmallVector<PluginArg, 4> makes sense here -- it's a vector that has a certain number of slots allocated on the stack, but switches to acting like std::vector if it needs to grow. Also, no forgetting to delete[] if this gets refactored some day.

And finally, if you do decide to allow plugins to mess with the command line options, this will have to happen sooner. :-)


Really I think this is a great refactoring. I do have my concerns about the amount of control plugins can leverage over the compilation, but maybe that will turn out to be immaterial. Thanks for working on this.

Jordan




More information about the cfe-commits mailing list