[cfe-commits] [PATCH] Implement a better plugin API for clang

Joshua Cranmer pidgeot18 at gmail.com
Fri Jun 22 13:31:11 PDT 2012


On 6/22/2012 1:17 PM, Jordan Rose wrote:
> Hi, Joshua. First off, I really like that you submitted a /patch/ for a better plugin API, including all the documentation and example changes that would have otherwise been necessary to put into an e-mail. It shows that your proposal is feasible and well-defined already, and gives us something to work with.
>
>> Why would you need two copies of Basic/Lex/etc.? Clang is already built with all of this stuff, so the plugin would be able to use clang's version of everything here in normal codegen anyways, unless I'm misunderstanding your statement.
>
> What I meant was that if we wanted to run some action instead of CodeGen, we would need to use libTooling instead...but that means building a separate executable with everything in it again. But I forgot about -fsyntax-only, and plugins could require this in their clang_plugin_begin_file check.
>
> (It might be nice if they could check this sooner, though...maybe they can get the CompilerInvocation in clang_plugin_init, possibly even as non-const! -fsyntax-only isn't the only interesting setting.)
>
> On the flip side, it might be desirable for plugins to be able to produce errors and a nonzero result code. Whether or not these plugin diagnostics can halt CodeGen is something to consider as well.

I expose CompilerInstance, from which you can get the DiagnosticsEngine 
[1] and emit diagnostics. The diagnostics engine even makes it easy to 
create custom diagnostics quickly, unlike, say, attributes.
> Hooray for using a designated entry point instead of a static constructor! This is something that was a problem on Windows, IIRC.
>
> That said, I'd go even further with the version checks, and /require/ that the major and minor versions match, at the very least. I don't think we can trust objects to be layout-compatible or functions to have all the same arguments even between SVN revisions or point fixes, let alone major/minor versions -- only libclang is stable enough for that. I implemented something like this for static analyzer plugins by requiring plugins to export a symbol that just contains the version string they were compiled against. You can see how it works in CheckerRegistry.h and CheckerRegistration.cpp.
The intent is that all plugins do the check, but this allows the really 
clever people who absolutely need multiple version support to be able to 
do so. Somewhere, I have documentation that emphasizes that the check 
out to be done.
> Also, you might want to mention here that these entry points are extern "C", in case someone decides to ignore the advice to include Plugin.h.
I can do that. I expect it ought to be obvious to most people since:
1. Plugins will fail to load if it doesn't work very quickly.
2. C++ name mangling is known to be problematically different.
3. You need to include Plugin.h to be able to use plugin::PluginArg and 
plugin::Version in the first place.

>> +<p>To run a plugin, you merely need to specify the plugin to be loaded using the
>> +-fplugin command line options. Additional parameters for the plugins can be passed with -fplugin-arg-<plugin-name>-<arg>=<value> or
>> +-fplugin-arg<plugin-name>-<arg>. These options work the same when both run
>> +with the normal driver and the cc1 process.</p>
> The second one is missing a dash between the literal 'arg' and the plugin name. Also, personally I would switch the order of these two to put the flag-only version first, since it looks simpler.
>
> This is STILL rather unwieldy, but I guess the idea was that if you really want to write your own command line, you're supposed to use libTooling? Or a wrapper script like the one that appeared awhile back.

The argument syntax I derived from gcc (see 
<http://gcc.gnu.org/onlinedocs/gccint/Plugins.html>). Given clang's 
argument parsing structure, I figured that an argument syntax similar to 
this was the best way to pass through arguments to plugins easily.
>> +namespace plugin {
>> +
>> +/**
>> + * An individual argument for a plugin.
>> + *
>> + * Plugin arguments are specified as -fplugin-arg-<plugin>-<name>=<value> or
>> + * -fplugin-arg-<plugin>-<name> attributes. In the latter case, the value
>> + * attribute would be an empty string.
>> + */
>> +struct PluginArg {
>> +  const char *name;
>> +  const char *value;
>> +};
> Any reason why 'const char *' and not StringRef?

Since clang_plugin_init is the function which decides whether or not the 
plugin is usable, I wanted to make sure that it would be as ABI-stable 
as possible. Using const char * here means that this works even if LLVM 
decides to later modify the StringRef structure layout. In all other 
functions, I'm much less concerned with ABI and API stability. It also 
allows us to reuse this for a C plugin API if we later decide to make 
such an option available.
>> +// The following functions are not implemented within clang itself, but are
>> +// prototypes of the various functions that may be implemented by plugins.
>> +// Including it in a header file allows these hooks to be documented with
>> +// doxygen, and the use of extern "C" on declarations allows people who forget
>> +// to define these functions with extern "C" to still have them work.
>> +extern "C" {
>> +/**
>> + * Callback to initialize the plugin.
>> + *
>> + * \note This is the only function whose ABI compatibility is guaranteed to
>> + *       remain stable. All plugins should verify that the version passed into
>> + *       this function is the same version that they were compiled with and
>> + *       abort if this is not the case. To detect the version compiled with,
>> + *       include clang/Basic/Version.h and check CLANG_VERSION_MAJOR and
>> + *       CLANG_VERSION_MINOR.
>> + *
>> + * \arg argc      The number of arguments in the argv array
>> + * \arg argv      An array of the arguments passed to this plugin from the
>> + *                command line.
>> + * \arg version   The version of the compiler being called.
>> + * \return Whether or not the plugin initialization succeeded. If initialization
>> + *         fails, the compilation process is terminated.
>> + */
>> +extern bool clang_plugin_init(int argc, clang::plugin::PluginArg *argv,
>> +  clang::plugin::Version *version);
> Suggestion: replace argc/argv with ArrayRef.
>
> Very nice doc comment. Some coding standards issues here, to match up with the rest of LLVM:
>
> - We prefer C++-style comments even for Doxygen blocks, even for extern "C" blocks (unless the entire file is C-compatible): ///
>
> - Anything that's a noun (types, variables, and parameters) should start with an uppercase letter. The exceptions are types that look like standard library types. Functions and methods should start with lowercase letters. This convention is fairly new and so there are large chunks of the codebase that don't look like this, but we're trying to be consistent going forwards.
>
> - Parameter lists that are too long should line up under the open paren.
>
> Most of these are already in http://llvm.org/docs/CodingStandards.html; we should probably update the indentation bit as well.
>
> I'm on the fence about whether or not the namespaces should be included. On the one hand, we don't explicitly write "clang" /anywhere/ in our header files (exaggerating but only slightly). On the other, this is going to be referenced by plugin writers who might not know to write "using namespace clang". And probably won't be "using namespace clang::plugin".
In order to make the 'work even if you forget extern "C"' function, I 
need to put this is in the global namespace. I originally had these in 
the clang namespace, at which point i made plugin::PluginArg and 
plugin::Version.
>
>> +/**
>> + * Callback that is called before a file is compiled.
>> + *
>> + * Before this function is called, the callbacks object will have all of its
>> + * pointers set to null. All non-null callbacks will be deleted by the compiler
>> + * internally, so one should always return a new pointer for every file.
>> + *
>> + * \arg fileName  The name of the function being called.
>> + * \arg CI        The compiler instance, which contains information about how
>> + *                the file is being compiled and other utility classes.
>> + * \arg callbacks The pointer to the callbacks that this plugin will define for
>> + *                this file.
>> + */
>> +extern void clang_plugin_begin_file(llvm::StringRef fileName,
>> +  const clang::CompilerInstance &CI,
>> +  clang::plugin::PluginFileCallbacks *callbacks);
> Since the 'callbacks' argument is assured to be present, it should be a reference. (This actually applies to all plugin entry points.)
>
> I know I for one feel a little squeamish about putting reference types in extern "C" functions, but there's nothing in the standard that prohibits it, and you already have "const CompilerInstance &CI".
>
> Finally, I would suggest changing "file" to "compilation" or "translation unit", since this isn't called for includes and such.

clang_plugin_begin_file is already getting a bit unwieldy for a function 
name, so maybe clang_plugin_begin_tu would work better?
>> +/**
>> + * Callback that is called after a file has been compiled.
>> + *
>> + * \arg fileName  The name of the file that was just compiled.
>> + */
>> +extern void clang_plugin_end_file(llvm::StringRef fileName);
> Does this need the file name? It should always be the same as the matching clang_plugin_begin_file.

I honestly can't remember why I added the fileName argument here...
>
>> diff --git a/include/clang/Driver/Options.td b/include/clang/Driver/Options.td
>> --- a/include/clang/Driver/Options.td
>> +++ b/include/clang/Driver/Options.td
>> @@ -586,16 +586,18 @@ def fpack_struct_EQ : Joined<"-fpack-str
>>    HelpText<"Specify the default maximum struct packing alignment">;
>> def fpascal_strings : Flag<"-fpascal-strings">, Group<f_Group>, Flags<[CC1Option]>,
>>    HelpText<"Recognize and construct Pascal-style string literals">;
>> def fpch_preprocess : Flag<"-fpch-preprocess">, Group<f_Group>;
>> def fpic : Flag<"-fpic">, Group<f_Group>;
>> def fno_pic : Flag<"-fno-pic">, Group<f_Group>;
>> def fpie : Flag<"-fpie">, Group<f_Group>;
>> def fno_pie : Flag<"-fno-pie">, Group<f_Group>;
>> +def fplugin : Joined<"-fplugin=">, Group<f_Group>, Flags<[CC1Option]>;
>> +def fplugin_arg : Joined<"-fplugin-arg-">, Group<f_Group>, Flags<[CC1Option]>;
> Take a look at the old plugin and plugin_arg flags in CC1Options.td for examples on how to make these look nicer. (In particular, fplugin should probably be fplugin_EQ, and fplugin_arg can be more descriptive.) Also, you'll have to decide whether we want to accept a split "-fplugin Plugin".

The argument handling here is pretty much cargo-culted from 
plugin/plugin_arg stuff, although I originally wrote this patch a fair 
amount of time ago, so some things may have changed without me realizing 
it.
>> --- a/include/clang/Frontend/FrontendOptions.h
>> +++ b/include/clang/Frontend/FrontendOptions.h
>> @@ -160,16 +160,23 @@ public:
>>    std::vector<std::string> AddPluginActions;
>>
>>    /// Args to pass to the additional plugins
>>    std::vector<std::vector<std::string> > AddPluginArgs;
> Presumably these are deprecated? And would go away eventually?

My intent is to kill the old plugin architecture ASAP, and it sounds 
like I have some agreement with dgregor in this regard, so yes.
>>   /// The list of plugins to load.
>>    std::vector<std::string> Plugins;
> But this probably has to stay, since it includes LLVM plugins.

I haven't tested this (yet), but it should still be possible to use LLVM 
plugins with the -fplugin= structure, since static initializers will 
still run.
>> +    llvm::sys::DynamicLibrary PluginLibrary;
>> +    clang::plugin::PluginFileCallbacks callbacks;
> Here you definitely should not include the 'clang' namespace. plugin::PluginFileCallbacks is fine. I'd even be okay with "using llvm::sys::DynamicLibrary".
I wasn't sure if the style desired was to always do the full namespace 
or do partial namespaces.
>> +    intptr_t getFunctionCall(const char *name) {
>> +      return (intptr_t)PluginLibrary.getAddressOfSymbol(name);
>> +    }
> Why is this an intptr_t? That's not really any better than void * for casting to and from function pointers. If we're on a platform where function pointers are bigger than object pointers, getAddressOfSymbol is already going to break.

IIRC, going from void* to void(*)() causes some clang warning. In 
interest of shutting up warnings, a round-trip through intptr_t stops it 
from warning.
>
>
>> +/// Helper function to call a given function in a plugin.
>> +#define CALL_PLUGIN_FUNCTION(plugin, fname, args) \
>> +    do { \
>> +      Plugin::fname func_ = (Plugin::fname)(plugin)->getFunctionCall(#fname); \
>> +      if (func_) \
>> +        func_ args; \
>> +    } while (0)
> Technically not a function. :-) More importantly, please show example usage; this macro is not at all straightfoward.
>
> Also, I would like if the macro took a reference rather than a pointer, but that's just me. (That way it could in theory be called for plugin objects on the stack...but that probably never happens, huh.)

I expect that anyone who would use this would call it from an iterator, 
where "acting" like a pointer is better.
>> +    // These typedefs must have the same signatures as the ones in
>> +    // clang/Basic/Plugin.h, including the same name (it makes the helper macro
>> +    // above work properly).
>> +    typedef void (*clang_plugin_begin_file)(StringRef, const CompilerInstance &,
>> +      clang::plugin::PluginFileCallbacks*);
>> +    typedef void (*clang_plugin_end_file)(StringRef);
>> +    typedef void (*clang_plugin_destroy)();
>> +  };
> Why no init? Ah, the boolean flag.

init gets called before this object is created.
>> +  public:
>> +    PluginWrapperAction(FrontendAction *WrappedAction,
>> +      std::vector<Plugin> &plugins)
> ArrayRef, since it's free here.

I probably wrote this before llvm got ArrayRef.
>
>
>> +  for (std::vector<Plugin>::iterator it = plugins.begin(); it != plugins.end();
>> +      ++it) {
>> +    // Retrieve the callbacks, and clear all of the values to NULL
>> +    clang::plugin::PluginFileCallbacks *callbacks = it->getCallbacks();
>> +    memset(callbacks, 0, sizeof(clang::plugin::PluginFileCallbacks));
> This is silly. Plugin should just default-initialize the struct when it's created. (Empty parens will do the right thing.)

Some notes:
1. This works even if a plugin doesn't define the method (e.g., a plugin 
just wants to do some LLVM hooking)
2. This has some footgun-prevention effects for plugin writers
>
>
>> +    // Get the callbacks from the plugin
>> +    CALL_PLUGIN_FUNCTION(it, clang_plugin_begin_file, (Filename, CI, callbacks));
>> +
>> +    // ASTConsumer is handled by GetASTConsumer; others are handled now
>> +    if (callbacks->ppCallback)
>> +      CI.getPreprocessor().addPPCallbacks(callbacks->ppCallback);
>> +    if (callbacks->diagnostics)
>> +    {
>> +      // The front end has already run this, so let's do this now.
>> +      callbacks->diagnostics->BeginSourceFile(CI.getLangOpts(),
>> +        &CI.getPreprocessor());
>> +      CI.getDiagnostics().setClient(new ChainedDiagnosticConsumer(
>> +        &CI.getDiagnosticClient(), callbacks->diagnostics));
>> +    }
>> +  }
> If we initialize plugins before calling the underlying action, is this necessary?

This is necessary if we do plugins via an ASTAction-wrapper approach. If 
plugins get attached to a CompilerInvocation or CompilerInstance 
instead, it wouldn't be necessary.
>> +  return true;
> No way for plugins to signal an error?

This doesn't really signal an error, it just tells the compiler to not 
process the file. Real errors probably need to go through the 
DiagnosticsEngine instead.
>
>
>> +
>> +ASTConsumer *PluginWrapperAction::CreateASTConsumer(CompilerInstance &CI,
>> +                                                    StringRef InFile) {
>> +  std::vector<ASTConsumer*> Consumers(1,
>> +    WrapperFrontendAction::CreateASTConsumer(CI, InFile));
>> +  for (std::vector<Plugin>::iterator it = plugins.begin(); it != plugins.end();
>> +      ++it) {
>> +    if (ASTConsumer *consumer = it->getCallbacks()->astConsumer) {
>> +      Consumers.push_back(consumer);
>> +    }
>> +  }
>> +  return new MultiplexConsumer(Consumers);
>> +}
> TinyPtrVector or SmallVector? Most of the time we will not have plugins, and even when we do there are probably very few.
>
> (This is not a performance-sensitive piece of code, I know, but we might as well get it right.)
>
>
>> +    // Construct the arguments
>> +    std::vector<std::pair<std::string, std::string> > &pluginArgs =
>> +      Clang->getFrontendOpts().CompilerPluginArgs[i];
> ArrayRef...?
>
>> +    plugin::PluginArg *args = new plugin::PluginArg[pluginArgs.size()];
> SmallVector<PluginArg, 4> makes sense here -- it's a vector that has a certain number of slots allocated on the stack, but switches to acting like std::vector if it needs to grow. Also, no forgetting to delete[] if this gets refactored some day.
>
> And finally, if you do decide to allow plugins to mess with the command line options, this will have to happen sooner. :-)

To me, the basic concept of plugins is that they are things which attach 
to the normal compilation process, so that saying CXX="clang++ 
-fplugin=/path/to/plugin.so" wouldn't cause any build to fail (modulo 
the plugin adding in strict warnings/errors, of course) that would have 
succeeded with clang in the first place. In that vein, letting plugins 
mess with command line options is not the right thing to do. Now 
granted, there are probably specific things normally associated with the 
command line that are likely desirable (adding libraries to link and 
messing with LLVM parameters both come to mind), but I suspect that most 
of those desires could be satisfied with more explicit and less powerful 
hooks.

[1] ARRGGH. Diagnostics has got to be the word I have the least chance 
of spelling correctly on my first try :-(

-- 
Joshua Cranmer
News submodule owner
DXR coauthor




More information about the cfe-commits mailing list