[PATCH] D143418: [libclang] Add API to override preamble storage path

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Feb 28 10:37:15 PST 2023


aaron.ballman added inline comments.


================
Comment at: clang/tools/c-index-test/c-index-test.c:79
+    Opts.PreambleStoragePath = NULL;
+    Opts.InvocationEmissionPath = getenv("CINDEXTEST_INVOCATION_EMISSION_PATH");
+
----------------
vedgy wrote:
> aaron.ballman wrote:
> > vedgy wrote:
> > > aaron.ballman wrote:
> > > > vedgy wrote:
> > > > > When a libclang user needs to override a single option in `CXIndexOptions`, [s]he has to set every member of the struct explicitly. When new options are added, each libclang user needs to update the code that sets the options under `CINDEX_VERSION_MINOR` `#if`s. Accidentally omitting even one member assignment risks undefined or wrong behavior. How about adding an `inline` helper function `CXIndexOptions clang_getDefaultIndexOptions()`, which assigns default values to all struct members? Libclang users can then call this function to obtain the default configuration, then tweak only the members they want to override.
> > > > > 
> > > > > If this suggestion is to be implemented, how to deal with [[https://stackoverflow.com/questions/68004269/differences-of-the-inline-keyword-in-c-and-c|the differences of the inline keyword in C and C++]]?
> > > > By default, `0` should give you the most reasonable default behavior for most of the existing options (and new options should follow the same pattern). Ideally, users should be able to do:
> > > > ```
> > > > CXIndexOptions Opts;
> > > > memset(&Opts, 0, sizeof(Opts));
> > > > Opts.Size = sizeof(Opts);
> > > > Opts.Whatever = 12;
> > > > CXIndex Idx = clang_createIndexWithOptions(&Opts);
> > > > ```
> > > > Global options defaulting to 0 is fine (uses regular thread priorities), we don't think want to default to excluding declarations from PCH, and we want to use the default preamble and invocation emission paths (if any). The only option that nonzero as a default *might* make sense for is displaying diagnostics, but even that seems reasonable to expect the developer to manually enable.
> > > > 
> > > > So I don't know that we need a function to get us default indexing options as `0` should be a reasonable default for all of the options. As we add new options, we need to be careful to add them in backwards compatible ways where `0` means "do the most likely thing".
> > > > 
> > > > WDYT?
> > > The disadvantages of committing to defaulting to `0`:
> > > 1. The usage you propose is still more verbose and error-prone than
> > > ```
> > > CXIndexOptions Opts = clang_getDefaultIndexOptions();
> > > Opts.Whatever = 12;
> > > CXIndex Idx = clang_createIndexWithOptions(&Opts);
> > > ```
> > > 2. The `memset` would look very unclean in modern C++ code.
> > > 3. The `0` commitment may force unnatural naming of a future option to invert its meaning.
> > > 
> > > The advantages:
> > > 1. No need to implement the inline function now.
> > > 2. Faster, but I am sure this performance difference doesn't matter. Even a non-inline function call itself (even if it did nothing) to `clang_createIndexWithOptions()` should take longer than assigning a few values to built-in-typed members.
> > > 
> > > Another advantage of not having to remember to update the inline function's implementation when new options are added is counterbalanced by the need to be careful to add new options in backwards compatible way where `0` is the default.
> > > 
> > > Any other advantages of the `0` that I miss? Maybe there are some advantages for C users, but I suspect most libclang users are C++.
> > >The usage you propose is still more verbose and error-prone than
> > > ```
> > > CXIndexOptions Opts = clang_getDefaultIndexOptions();
> > > Opts.Whatever = 12;
> > > CXIndex Idx = clang_createIndexWithOptions(&Opts);
> > > ```
> > 
> > I see it as being roughly the same amount of verbosity and proneness to error, but maybe that's just me.
> > 
> > > The memset would look very unclean in modern C++ code.
> > 
> > I don't see this as a problem. 1) `std::memset` gets plenty of usage in modern C++ still, 2) you can also initialize with ` = { sizeof(CXIndexOptions) };` and rely on the zero init for subsequent members after the first (personally, I think that's too clever, but reasonable people may disagree), 3) this is a C API, folks using C++ may wish to wrap it in a better interface for C++ anyway.
> >  
> > > The 0 commitment may force unnatural naming of a future option to invert its meaning.
> > 
> > I do worry about this one, especially given there's no way to predict what future options we'll need. But it also forces us to think about what the correct default behavior should be, whereas if we push it off to a helper function, we make it harder for everyone who didn't know to use that helper function for whatever reason.
> > 
> > > Any other advantages of the 0 that I miss? Maybe there are some advantages for C users, but I suspect most libclang users are C++.
> > 
> > Familiarity for those who are used to dealing with existing C APIs that behave this way (like Win32 APIs), but that can probably be argued either way. My guess is that all libclang users are having to work through the C interface, and some decently large amount of them do so from languages other than C. But whether that means most are C++ users or not is questionable -- I know I've seen uses from Python and Java as well as C++.
> > 
> > I'm not strongly opposed to having an API to get the default values, but I don't see a lot of value in it either. It's something we could always add later if we find there's a need for it in practice.
> > I see it as being roughly the same amount of verbosity and proneness to error, but maybe that's just me.
> Calling the default-value function is one requirement. Assigning `sizeof` and calling `memset` - 2. Plus the user has to pass the correct arguments to `memset`.
> 
> > 1) `std::memset` gets plenty of usage in modern C++ still
> Yes, there is usage, but modern C++ guidelines recommend replacing such usages with standard algorithms whenever possible. `memset`-ting an object generally risks undefined behavior.
> 
> > whereas if we push it off to a helper function, we make it harder for everyone who didn't know to use that helper function for whatever reason.
> If a user doesn't read the documentation, [s]he would likely not guess that `memset` needs to be called either. So I don't see a disadvantage of the helper function here.
> 
> > It's something we could always add later if we find there's a need for it in practice.
> We could add it later. But if we advise using `memset` in the first version, introducing a struct member with non-zero default value would risk breaking backward compatibility for users that don't track libclang API changes closely. It would also force all users that **do** notice the change to update their code with `#if CINDEX_VERSION_MINOR >= XY` to use either `memset` when compiling against libclang versions that don't yet have the helper function, or the helper function.
> 
> Introducing the helper function from the beginning affords more flexibility and space for future libclang API changes with less risk of breaking backward compatibility. If someone decides to ignore the helper function and `memset` the struct instead, breaking such usage should not be an obstacle/concern to future libclang changes.
>> I see it as being roughly the same amount of verbosity and proneness to error, but maybe that's just me.
> Calling the default-value function is one requirement. Assigning sizeof and calling memset - 2. Plus the user has to pass the correct arguments to memset.

Both of which can be done in a single one-liner if brevity is that important for your coding style.

> Yes, there is usage, but modern C++ guidelines recommend replacing such usages with standard algorithms whenever possible. memset-ting an object generally risks undefined behavior.

I don't see how that can be much of a real risk in a C interface.

> If a user doesn't read the documentation, [s]he would likely not guess that memset needs to be called either. So I don't see a disadvantage of the helper function here.

Agreed.

> We could add it later. But if we advise using memset in the first version, introducing a struct member with non-zero default value would risk breaking backward compatibility ...

My recommendation is that we don't add the API now and we leave a comment in the options struct reminding folks about the importance of new options being expressed such that `0` gives the correct default behavior. As you say, it is a risk that we break the model in the future, but it seems like a somewhat low risk.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143418/new/

https://reviews.llvm.org/D143418



More information about the cfe-commits mailing list