[cfe-commits] [PATCH] Implement a better plugin API for clang, v3

Sean Silva silvas at purdue.edu
Mon Jul 16 22:10:04 PDT 2012


> committed. This API is useful enough to be committed on its own (IMO)
> without adding extra plugin-AST matcher goodies.

I apologize if I came off as suggesting that you should put any effort
into "extra plugin-AST matcher goodies". My mention of the AST
matchers was mostly to justify the belief that it is a feasible (IMO
desirable) reality that your target audience cannot be assumed to be
people who primarily hack on clang and know the ins and outs of it,
how to build it, etc.

> Just apt-get install clang won't be enough IMO--note that Debian uses
> gcc-plugin-dev for GCC's plugin API headers (and clang-dev for Clang).

Sorry, I didn't communicate myself well. The important part there was
"proverbial `sudo
apt-get install ...`". That is, that these interfaces can be gotten
with the package manager and used with the system's regular clang
installation, rather than needing to have a full LLVM/Clang checkout,
build it yourself, mess with a bunch of compiler flags to point your
compiler to your build directory, etc.

> I was aiming for minimal modification of the clang plugins directory here. It's very easy to
> discuss running it over a hypothetical source file named "example.cpp"

please do.

> The name/value are reusing the names of the fields in the struct.

Ah, ok that makes sense. Nonetheless, I did miss that while reading,
so it's probably worth being explicit and saying "where <name>
corresponds to the `name` member and <value> corresponds to the
`value` member". Alternatively, if doxygen has a way to mark up member
names so that <name> and <value> get linked to the corresponding
member, that would work too.

> I wanted to make sure that the first function called was ABI/API-frozen. Reusing
> VersionTuple.h exposes plugins to the risk that developers decide to modify the
> VersionTuple ABI and API.

Ok, I guess the doxygen comment does hint at that. Are there other
"accessors that clang exposes" then, besides VersionTuple?

> It's documenting a group of functions; I'm not entirely sure how to do this in doxygen.

Then please ask on the list, doxygen IRC, etc. In particular I would
CC James Dennett <jdennett at google.com> if you mail the list since he
has been actively working on the doxygen docs. There's nothing worse
than misleading documentation.

> Now for the API comments. To put it simply, I'm still struggling to come up
> with a decent API. One problem I discovered after writing the patch is that
> the design of the driver architecture means that if your command line is:
> $ clang a.c b.c
> the plugin will be loaded twice, and the start/end file methods only called
> once per "global" instantiation, instead of being initialized once and
> having the start/end methods be called once. On the other hand, I think
> there are good use cases for letting plugins modify in particular the linker
> invocation and possibly add passes in the LTO plugin, which means that the
> same plugin might be loaded by three distinct processes. This is definitely
> counterintuitive to how most people would expect things to work if they
> don't realize that the command line actually invokes several processes, but
> I'm not sure the best way to remedy the situation. Ripping out the begin
> file/end file methods might be for the best.

Ah, that is a sticky situation.

> I've also found that growing the monolithic PluginCallbacks struct seems to
> work best in the long run. One feature that I think is necessary for plugins
> is to require plugins to only have to do things for the callbacks they want
> to use: if we assume that projects will be building plugins to run on their
> codebase, then we need some moderate API stability, since tying projects to
> a single version of Clang is not exactly tenable. For that reason, I'm
> disinclined to use a class with virtual methods.
>
> Thoughts?

Correct me if I'm wrong about this, but to my understanding, adding
virtual methods (at the end, and with a stub implementation) of an
interface class is API/ABI stable (also, just ran into this
<http://techbase.kde.org/Policies/Binary_Compatibility_Issues_With_C++>,
which seems to agree). Regardless, the design that I'm about to sketch
out can be done with an "ops struct" as well. I'll describe it in
terms of virtual functions though.

The design is the following:
Each class inside clang that wants to export callbacks available to
plugins (and I guess other clients) can have a FooCallbacks object,
similar to how PPCallbacks works now (including how the "chaining" is
done). This gives a uniform and clear point of extensibility for
making new things available to plugins.

This pattern already seems be present, e.g.
+    if (Callbacks.ppCallback)
+      CI.getPreprocessor().addPPCallbacks(Callbacks.ppCallback);
+    if (Callbacks.diagnostics)
+      CI.getDiagnostics().setClient(new ChainedDiagnosticConsumer(
+        &CI.getDiagnosticClient(), Callbacks.diagnostics));
+  }
internally, addPPCallbacks is doing exactly the same kind of chaining
that you are doing here with ChainedDiagnosticConsumer. ASTConsumer is
similar, except MultiplexConsumer uses an array rather than forming a
"linked list" (which doesn't really matter since all "iteration" we're
doing for the multiplexing is "forward iterator" iteration). Maybe it
would pay off to first refactor all of these classes to expose their
commonality, and then document this pattern so that we can soon see
"SemaCallbacks/SemaConsumer", "ParserCallbacks/ParserConsumer" etc.

The plugin interface then consists of a single `extern "C"` "factory
function" returning a pointer to an other interface, say
`PluginCallbacks`, which has virtual functions "MakePPCallbacks",
"MakeDriverCallbacks", etc each of which is itself a factory function
(taking appropriate arguments) that creates the appropriate callbacks
object. Each factory function can then be documented to indicate under
what circumstances it is called (e.g. in what process), so that users
have a clear model of when their code is being executed.

Actually, now that I think of it, this is highly reminiscent of the
"problem" solved by "signal/slots" in GUI frameworks (among other
places). As such, it might pay off to look at existing implementations
of signal/slots for inspiration on how best to structure this.
libsigc++ <http://libsigc.sourceforge.net/> is the one used by GNOME's
C++ bindings, Qt has its own
<http://qt-project.org/doc/qt-5.0/signalsandslots.html>; GNOME's
native C ones use a completely different C-based API that is probably
worth at least a cursory glance
<http://developer.gnome.org/gobject/stable/signal.html>. These are all
battle-hardened, production-quality solutions to the problem of
dynamically plugging in functionality which fires in response to
certain events (and doing so in a binary compatible way). Boost also
appears to have an implementation
<http://www.boost.org/doc/libs/1_50_0/doc/html/signals.html>. While we
don't need anything as full-blown as any of these, they should serve
as good examples of where this line of thinking ends; in particular,
they should provide strong hints regarding how to solve this problem
in a coherent, unified way (hence, being easy to document and apply to
other places in the clang codebase).

--Sean Silva

On Mon, Jul 16, 2012 at 7:50 PM, Joshua Cranmer <pidgeot18 at gmail.com> wrote:
> On 7/16/2012 7:25 PM, Sean Silva wrote:
>>
>> Sorry this is rather long.
>>
>> First off, are you assuming that people who write plugins are "clang
>> hackers" (e.g., they are definitely on the mailing list, probably have
>> committed to clang, etc.)? I think this is too stringent. I think
>> plugins would make a good "first brush" with clang. With the addition
>> of the AST matchers, it is pretty reasonable to be able to do useful
>> stuff without being a "clang hacker". In fact, the tooling docs
>> <http://clang.llvm.org/docs/Tooling.html> basically flat-out say that
>> plugins are targeted at users who are almost by definition *not* clang
>> hackers; e.g. one of the "canonical examples" it gives is "special
>> lint-style warnings or errors for **your** project [emphasis
>> mine]"---the assumption is that the clients of plugins are primarily
>> hackers on some other project. Hence, I think the plugin API should be
>> targeted at being able to be useful with just a proverbial `sudo
>> apt-get install clang`; if this is technically possible, then it
>> should be done, as it increases the usefulness/audience of clang
>> plugins by _at least_ an order of magnitude.
>
>
> I definitely want plugins to be on the easier side to implement. I also
> don't want the perfect to be the enemy of the good here--I've been working
> on versions of this patch for almost a year.  I think using AST matchers is
> desirable here, but remember that I wrote this patch before they were
> committed. This API is useful enough to be committed on its own (IMO)
> without adding extra plugin-AST matcher goodies.
>
> Just apt-get install clang won't be enough IMO--note that Debian uses
> gcc-plugin-dev for GCC's plugin API headers (and clang-dev for Clang).
>
>> +  // It is dangerous to use plugins with versions that it wasn't compiled
>> for.
>> +  // This causes us to bail out before we can do things like crash clang.
>> +  if (version->major != CLANG_VERSION_MAJOR ||
>> +      version->minor != CLANG_VERSION_MINOR)
>> +    return false;
>>
>> Maybe this checking can be turned upside down, where the plugin
>> exports the version it is compiled for, and clang does the checking.
>> That would allow for nice clang-style diagnostics and would guarantee
>> that the check is always performed. Regardless, there has to be a way
>> to guarantee that a diagnostic is emitted instead of just silently not
>> working.
>
>
> The plugin loading code emits a diagnostic if loading the plugin (including
> failing the version check) fails.
>
>> At a different level, this check doesn't seem meaningful since the
>> plugin is presumably going to use other APIs inside clang, and hence
>> the interfaces are only stable within a single revision number. Maybe
>> this is intended as just a "sanity check"? If so, it should be
>> documented as such. I think it would be wise to check against the
>> exact revision number in order to guarantee that the plugin won't
>> spontaneously crash or misbehave.
>
>
> I copied the version stuff from GCC's plugin API. There definitely needs to
> be a check to guard against hard ABI incompatibility. The advantage of
> having the plugin do the check itself is that there might be some valid use
> cases where a single plugin is ABI-compatible with multiple versions of
> Clang, though I agree that (for now, at least) these are very marginal use
> cases and probably not worth supporting.
>
>> +passed with -fplugin-arg-<plugin-name>-<arg>=<value> or
>> +-fplugin-arg-<plugin-name>-<arg>. These options work the same when
>> run
>>
>> I don't think you have yet defined what exactly the "plugin name" is.
>> Without that, this description is not too meaningful.
>>
>> +-fplugin-arg-<plugin-name>-<arg>. These options work the same when
>> run
>> +both with the normal driver and the cc1 process.</p>
>>
>> I suspect that a lot of people who might want to write plugins aren't
>> aware of the distinction between the `driver` and the `cc1 process`.
>> This seems irrelevant to the new plugin API anyways since it "just
>> works" with the driver (does anybody ever actually directly invoke
>> cc1?). Maybe note it as a comparison with the previous API.
>
>
> The reworded text now looks like:
> <p>To run a plugin, you merely need to specify the plugin to be loaded using
> the
> -fplugin command line options. Additional parameters for the plugins can be
>
> passed with -fplugin-arg-<plugin-name>-<arg>=<value> or
> -fplugin-arg-<plugin-name>-<arg>. The plugin name is the basename of
> the
> file, so for a plugin whose library is <tt>libplugin.so</tt>, the name to
> use
> would be <tt>libplugin</tt>. These options do not require the use of the
> <tt>-cc1</tt> or <tt>-Xclang</tt> options to work properly.</p>
>
>
>>     $ clang++ -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS \
>>           -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D_GNU_SOURCE \
>>           -I$BD/tools/clang/include -Itools/clang/include -I$BD/include
>> -Iinclude \
>>           tools/clang/tools/clang-check/ClangCheck.cpp -fsyntax-only \
>>
>> I think the need for this complexity seems to be that you are running
>> it over one of the source file in the LLVM/Clang tree.
>> Maybe you could just use Foo.cpp? Or show a simple file to them (e.g.
>> "put this into a file Foo.cpp"). Regardless, I don't think that a
>> developer should have to have LLVM/Clang checked out in order to write
>> or use plugins, so there shouldn't be a reference to the LLVM/Clang
>> source tree in the documentation.
>
>
> I was aiming for minimal modification of the clang plugins directory here.
> It's very easy to discuss running it over a hypothetical source file named
> "example.cpp"
>
>>
>> + * Plugin arguments are specified as -fplugin-arg-<plugin>-<name>=<value>
>> or
>> + * -fplugin-arg-<plugin>-<name> attributes. In the latter case, the value
>> + * attribute would be an empty string.
>>
>> I don't think it is entirely clear from context what <plugin> and
>> <name> should be here.
>
> The name/value are reusing the names of the fields in the struct.
>
>
>>
>> +/**
>> + * A struct representing the version of clang.
>> + *
>> + * This version is different from other accessors that clang exposes
>> in order to
>> + * retain ABI compatibility.
>> + */
>> +struct Version {
>> +  /// The major version of clang (corresponds to CLANG_VERSION_MAJOR)
>> +  unsigned major;
>> +  /// The minor version of clang (corresponds to CLANG_VERSION_MINOR)
>> +  unsigned minor;
>> +  /// The patch level version of clang, if present. This value shouldn't
>> affect
>> +  /// API or ABI-level compatibility with clang for most uses. This
>> corresponds
>> +  /// to CLANG_VERSION_PATCHLEVEL if it is defined.
>> +  unsigned subminor;
>> +};
>>
>> what is the relationship between this and `include/clang/VersionTuple.h`?
>
>
> I wanted to make sure that the first function called was ABI/API-frozen.
> Reusing VersionTuple.h exposes plugins to the risk that developers decide to
> modify the VersionTuple ABI and API.
>
>> +// The following functions are not implemented within clang itself, but
>> are
>> +// prototypes of the various functions that may be implemented by
>> plugins.
>> +// Including it in a header file allows these hooks to be documented with
>> +// doxygen, and the use of extern "C" on declarations allows people who
>> forget
>> +// to define these functions with extern "C" to still have them work.
>>
>> Maybe this should somehow show up in the doxygen documentation?
>> Particularly "The following functions are not implemented within clang
>> itself"; things like this that violate users assumptions ("if it's in
>> the doxygen docs, then it is implemented by clang") need to be clearly
>> marked as such.
>
> It's documenting a group of functions; I'm not entirely sure how to do this
> in doxygen.
>
>>
>> +/// Before this function is called, the callbacks object will have all of
>> its
>>
>> Here you assume that the reader knows that "callbacks object" is
>> `PluginFileCallbacks`; is this a valid assumption? I had to "figure it
>> out". Save the reader the hassle and just say "the passed-in
>> PluginFileCallbacks object".
>
>
> I originally had the argument name of callbacks lower-case; I forgot to
> change the documentation after renormalizing the capitalization.
>
> Now for the API comments. To put it simply, I'm still struggling to come up
> with a decent API. One problem I discovered after writing the patch is that
> the design of the driver architecture means that if your command line is:
> $ clang a.c b.c
> the plugin will be loaded twice, and the start/end file methods only called
> once per "global" instantiation, instead of being initialized once and
> having the start/end methods be called once. On the other hand, I think
> there are good use cases for letting plugins modify in particular the linker
> invocation and possibly add passes in the LTO plugin, which means that the
> same plugin might be loaded by three distinct processes. This is definitely
> counterintuitive to how most people would expect things to work if they
> don't realize that the command line actually invokes several processes, but
> I'm not sure the best way to remedy the situation. Ripping out the begin
> file/end file methods might be for the best.
>
> I've also found that growing the monolithic PluginCallbacks struct seems to
> work best in the long run. One feature that I think is necessary for plugins
> is to require plugins to only have to do things for the callbacks they want
> to use: if we assume that projects will be building plugins to run on their
> codebase, then we need some moderate API stability, since tying projects to
> a single version of Clang is not exactly tenable. For that reason, I'm
> disinclined to use a class with virtual methods.
>
> Thoughts?
>
> --
> Joshua Cranmer
> News submodule owner
> DXR coauthor
>



More information about the cfe-commits mailing list