[PATCH] D143418: [libclang] Add API to override preamble storage path
Igor Kushnir via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Mar 1 10:10:00 PST 2023
vedgy marked 2 inline comments as done.
vedgy added inline comments.
================
Comment at: clang/include/clang-c/Index.h:329
+ * CXIndexOptions Opts = { sizeof(CXIndexOptions),
+ * clang_getDefaultGlobalOptions() };
+ * \endcode
----------------
When I almost finished the requested changes, I remembered that the return value of `clang_getDefaultGlobalOptions()` depends on environment variables, and thus `0` is not necessarily the default. Adjusted the changes and updated this revision.
Does the extra requirement to non-zero initialize this second member sway your opinion on the usefulness of the helper function `inline CXIndexOptions clang_getDefaultIndexOptions()`? Note that there may be same (environment) or other important reasons why future new options couldn't be zeroes by default.
================
Comment at: clang/tools/c-index-test/c-index-test.c:79
+ Opts.PreambleStoragePath = NULL;
+ Opts.InvocationEmissionPath = getenv("CINDEXTEST_INVOCATION_EMISSION_PATH");
+
----------------
aaron.ballman wrote:
> vedgy wrote:
> > aaron.ballman wrote:
> > > vedgy wrote:
> > > > aaron.ballman wrote:
> > > > > vedgy wrote:
> > > > > > When a libclang user needs to override a single option in `CXIndexOptions`, [s]he has to set every member of the struct explicitly. When new options are added, each libclang user needs to update the code that sets the options under `CINDEX_VERSION_MINOR` `#if`s. Accidentally omitting even one member assignment risks undefined or wrong behavior. How about adding an `inline` helper function `CXIndexOptions clang_getDefaultIndexOptions()`, which assigns default values to all struct members? Libclang users can then call this function to obtain the default configuration, then tweak only the members they want to override.
> > > > > >
> > > > > > If this suggestion is to be implemented, how to deal with [[https://stackoverflow.com/questions/68004269/differences-of-the-inline-keyword-in-c-and-c|the differences of the inline keyword in C and C++]]?
> > > > > By default, `0` should give you the most reasonable default behavior for most of the existing options (and new options should follow the same pattern). Ideally, users should be able to do:
> > > > > ```
> > > > > CXIndexOptions Opts;
> > > > > memset(&Opts, 0, sizeof(Opts));
> > > > > Opts.Size = sizeof(Opts);
> > > > > Opts.Whatever = 12;
> > > > > CXIndex Idx = clang_createIndexWithOptions(&Opts);
> > > > > ```
> > > > > Global options defaulting to 0 is fine (uses regular thread priorities), we don't think want to default to excluding declarations from PCH, and we want to use the default preamble and invocation emission paths (if any). The only option that nonzero as a default *might* make sense for is displaying diagnostics, but even that seems reasonable to expect the developer to manually enable.
> > > > >
> > > > > So I don't know that we need a function to get us default indexing options as `0` should be a reasonable default for all of the options. As we add new options, we need to be careful to add them in backwards compatible ways where `0` means "do the most likely thing".
> > > > >
> > > > > WDYT?
> > > > The disadvantages of committing to defaulting to `0`:
> > > > 1. The usage you propose is still more verbose and error-prone than
> > > > ```
> > > > CXIndexOptions Opts = clang_getDefaultIndexOptions();
> > > > Opts.Whatever = 12;
> > > > CXIndex Idx = clang_createIndexWithOptions(&Opts);
> > > > ```
> > > > 2. The `memset` would look very unclean in modern C++ code.
> > > > 3. The `0` commitment may force unnatural naming of a future option to invert its meaning.
> > > >
> > > > The advantages:
> > > > 1. No need to implement the inline function now.
> > > > 2. Faster, but I am sure this performance difference doesn't matter. Even a non-inline function call itself (even if it did nothing) to `clang_createIndexWithOptions()` should take longer than assigning a few values to built-in-typed members.
> > > >
> > > > Another advantage of not having to remember to update the inline function's implementation when new options are added is counterbalanced by the need to be careful to add new options in backwards compatible way where `0` is the default.
> > > >
> > > > Any other advantages of the `0` that I miss? Maybe there are some advantages for C users, but I suspect most libclang users are C++.
> > > >The usage you propose is still more verbose and error-prone than
> > > > ```
> > > > CXIndexOptions Opts = clang_getDefaultIndexOptions();
> > > > Opts.Whatever = 12;
> > > > CXIndex Idx = clang_createIndexWithOptions(&Opts);
> > > > ```
> > >
> > > I see it as being roughly the same amount of verbosity and proneness to error, but maybe that's just me.
> > >
> > > > The memset would look very unclean in modern C++ code.
> > >
> > > I don't see this as a problem. 1) `std::memset` gets plenty of usage in modern C++ still, 2) you can also initialize with ` = { sizeof(CXIndexOptions) };` and rely on the zero init for subsequent members after the first (personally, I think that's too clever, but reasonable people may disagree), 3) this is a C API, folks using C++ may wish to wrap it in a better interface for C++ anyway.
> > >
> > > > The 0 commitment may force unnatural naming of a future option to invert its meaning.
> > >
> > > I do worry about this one, especially given there's no way to predict what future options we'll need. But it also forces us to think about what the correct default behavior should be, whereas if we push it off to a helper function, we make it harder for everyone who didn't know to use that helper function for whatever reason.
> > >
> > > > Any other advantages of the 0 that I miss? Maybe there are some advantages for C users, but I suspect most libclang users are C++.
> > >
> > > Familiarity for those who are used to dealing with existing C APIs that behave this way (like Win32 APIs), but that can probably be argued either way. My guess is that all libclang users are having to work through the C interface, and some decently large amount of them do so from languages other than C. But whether that means most are C++ users or not is questionable -- I know I've seen uses from Python and Java as well as C++.
> > >
> > > I'm not strongly opposed to having an API to get the default values, but I don't see a lot of value in it either. It's something we could always add later if we find there's a need for it in practice.
> > > I see it as being roughly the same amount of verbosity and proneness to error, but maybe that's just me.
> > Calling the default-value function is one requirement. Assigning `sizeof` and calling `memset` - 2. Plus the user has to pass the correct arguments to `memset`.
> >
> > > 1) `std::memset` gets plenty of usage in modern C++ still
> > Yes, there is usage, but modern C++ guidelines recommend replacing such usages with standard algorithms whenever possible. `memset`-ting an object generally risks undefined behavior.
> >
> > > whereas if we push it off to a helper function, we make it harder for everyone who didn't know to use that helper function for whatever reason.
> > If a user doesn't read the documentation, [s]he would likely not guess that `memset` needs to be called either. So I don't see a disadvantage of the helper function here.
> >
> > > It's something we could always add later if we find there's a need for it in practice.
> > We could add it later. But if we advise using `memset` in the first version, introducing a struct member with non-zero default value would risk breaking backward compatibility for users that don't track libclang API changes closely. It would also force all users that **do** notice the change to update their code with `#if CINDEX_VERSION_MINOR >= XY` to use either `memset` when compiling against libclang versions that don't yet have the helper function, or the helper function.
> >
> > Introducing the helper function from the beginning affords more flexibility and space for future libclang API changes with less risk of breaking backward compatibility. If someone decides to ignore the helper function and `memset` the struct instead, breaking such usage should not be an obstacle/concern to future libclang changes.
> >> I see it as being roughly the same amount of verbosity and proneness to error, but maybe that's just me.
> > Calling the default-value function is one requirement. Assigning sizeof and calling memset - 2. Plus the user has to pass the correct arguments to memset.
>
> Both of which can be done in a single one-liner if brevity is that important for your coding style.
>
> > Yes, there is usage, but modern C++ guidelines recommend replacing such usages with standard algorithms whenever possible. memset-ting an object generally risks undefined behavior.
>
> I don't see how that can be much of a real risk in a C interface.
>
> > If a user doesn't read the documentation, [s]he would likely not guess that memset needs to be called either. So I don't see a disadvantage of the helper function here.
>
> Agreed.
>
> > We could add it later. But if we advise using memset in the first version, introducing a struct member with non-zero default value would risk breaking backward compatibility ...
>
> My recommendation is that we don't add the API now and we leave a comment in the options struct reminding folks about the importance of new options being expressed such that `0` gives the correct default behavior. As you say, it is a risk that we break the model in the future, but it seems like a somewhat low risk.
>
>
Added and modified documentation, adapted usage in tests.
================
Comment at: clang/tools/libclang/CIndex.cpp:3701-3702
+CXIndex clang_createIndexWithOptions(const CXIndexOptions *options) {
+ if (options->Size != sizeof(CXIndexOptions))
+ return NULL;
+ CIndexer *CIdxr = clang_createIndex_Impl(options->ExcludeDeclarationsFromPCH,
----------------
vedgy wrote:
> aaron.ballman wrote:
> > I think we want this to be `>` and not `!=`, maybe.
> >
> > If the sizes are equal, we're on the happy path.
> >
> > If the options from the caller are smaller than the options we know about, that should be okay because we won't attempt read the options not provided and instead rely on the default behavior being reasonable.
> >
> > If the options from the caller are larger than the options we know about, we have to assume the user set an option we can't handle, and thus failing the request is reasonable.
> >
> > So the way I'm envisioning us reading the options is:
> > ```
> > if (options->Size >= offsetof(CXIndexOptions, FieldWeCareAbout))
> > do_something(options->FieldWeCareAbout);
> > ```
> You are probably right looking forward.
>
> I have opted for `!=` because this is the initial struct version. So right now a smaller struct size means something very unexpected or a failure to assign correct size on the user side. The advantage of `!=` is then more reliable catching of bugs in freshly written user code that uses the struct and an added flexibility of being able to //reduce// the struct's size in the next revision.
Added your recommended way to add new options in comments.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D143418/new/
https://reviews.llvm.org/D143418
More information about the cfe-commits
mailing list