[LLVMdev] [RFC] Removing static initializers for command line options

Tue Aug 26 18:18:56 PDT 2014

I count 15 occurrences of "-backend-option" in clang/lib/Driver/Tools.cpp and most (all? didn't look at them real hard) of those aren't debugging options.  Presumably those can all be solved by adding more plumbing, but it would be *really* nice to make that plumbing easier to do.
--paulr

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chandler Carruth
Sent: Tuesday, August 26, 2014 10:42 AM
To: Chris Bieneman
Cc: LLVM Dev
Subject: Re: [LLVMdev] [RFC] Removing static initializers for command line options

A bunch of IRC discussions etc have taken place, and I wanted to surface some of the points from them here. Also, I wanted to give a more fully thought-through set of my own perspectives.

First, I'd like to re-state the goals as I think those have gotten muddied.

1) Remove most of LLVM's static initializers (that is, dynamic initialization code for static storage duration objects) by moving away from global objects with non-trivial constructors of 'llvm::cl::opt' types in the LLVM *libraries* (they remain fine in the tools of course).

2) Support separate LLVM users in a single address space having different values for the debugging knobs currently managed through the 'llvm::cl::opt' command line flags mechanism.

3) (related to #2) Support some library API for setting the debugging knobs currently managed through the 'llvm::cl::opt' command line flags mechanism.

Now, I want to change my position on one point because after talking to Chris, Jim, and others I understand the problem significantly better. I am now convinced that it is impossible to fix even #1 without making a change for every single option in LLVM. Why? Because we have to *register* all of the flags, and this registration inherently cannot be done from a global's constructor without introducing the undesirable static initializers[1].

I also think that in order to address #2 we would need to make some change to essentially every option in LLVM as we would have to connect it to non-global storage.

Given that, I think it does make sense to try to address #1 and #2 at the same time -- trying to limit the change to just one or the other would very likely result in a significantly different API for creating, registering, and using these options, and I would rather not make all of the LLVM developers re-learn how to do this 2 times if we can instead do it 1 time.

However, doing #2 only really makes sense if we also do #3 to let people *use* this new freedom. =] Owen convinced me that there really is a good reason to support #3 as part of the LLVM libraries, however I wanted to raise my primary concern about this on the mailing list because it hasn't really been discussed.

These "debugging knobs" or library level options which *aren't* exposed in the library's APIs are very risky to expose in a side-channel API: they are often completely unsupported or unsupportable options that were never intended for "production" usage. Once we expose these options via some API to library users, we run a very serious risk of getting locked into supporting behaviors or debugging misbehavior that were never expected or planned for...

However, the debugging use case is real and important. So I want to propose that we guard many of the paths to setting these options with NDEBUG so that in non-asserts builds, library users can't override these options. This would mean the C++ API for setting this in LLVMContext (and the C API if we ever expose it there) would only support setting these in asserts builds. We could even sink the checking into the registry. The name should probably have "debug" in it somewhere. It would then have some InternalUnsupportedScaryBad API which is not actually in the header files but is used in LLVM's tools to maintain their current behavior without documenting a public API. It's not bulletproof, but it seemed that it would be sufficient for the debugging use cases and is a strong statement of "unsupported" IMO.

The only other observation I would like to make is that if we're going to do this, we really should take the time to try to sort out a really *good* API. Hopefully we'll be slightly less wrong this time, and maybe the next transition can be more incremental.

Ok, on to the more low-level parts of this email, here some specific thoughts for how to design the API for further discussion:

- I really like the idea of using a 'void*' derived from some static tag as a way to identify uniquely each option. I also think we can use some minimal "reflection" techniques to significantly simplify this.

- The need for a global registry is... distasteful, but I agree inevitable.

- I'd like to avoid a separate "option store". My current suggestion is to use the LLVMContext, and to add a context to the layers which don't have an LLVMContext and where these kinds of options make sense.

- I would actually use a simple 'void*' -> string mapping in the "store" of the context. (Maybe we can actually use StringMap?) I would still provide the type to the registry and use that to validate options when parsing. When getting the value out of the "store" in the context, we can parse it into the data-type and assert fail on parse errors. I don't feel strongly about this, but without "std::any" it seems simpler than rolling our own with a void* or a union.

- I would provide a simple API on the LLVMContext to get an "option" string for a given key. This API could be exposed by any context, and we can write the option value extraction library to accept a generic T which supports such a query method. That way there is *very* loose coupling between these and it is easy to support any context object at any layer.

- The registry should essentially provide an interface to parse an argv string list into such a map from 'void*' -> string. The LLVMContext (or any other context) can then be populated by the result of this parse. Each library can parse whatever argv it wants.

- I don't think we should keep piling on the interface design of the existing cl::opt stuff. I would make a clean break to a new header / library, and just build some code to map these into the cl::opt parsing code from the registry (we already do stuff like this for pass names). This will also help the two systems co-exist but stay separate during the (inevitably long) transition period.

- As a specific example of what I would change from the cl::opt stuff: I think we should make the required arguments just be required and positional, and the optional arguments use a fluent-style interface (yea, I know, but I think it works well here) rather than the weird variadic-ish thing...

Once there is a patch that seems like a reasonable start at this, I'd actually like to get the new library added and just one pass (the one you're using seems great) converted. I'd then like to try a few experiments to see if we can improve some of the nitty-gritty details of the API before we roll it out widely.

Seem reasonable? I'm actually happy to help with some of the API experiments and the rollout here.
-Chandler

[1]: I'd actually like to understand why on some platforms the explicit global initialization routines are so much better than global constructors... But that's probably a topic for on off-list or IRC conversation. I don't want to derail this thread with that.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140827/5c9234e4/attachment.html>