[PATCH] D139774: [libclang] Add API to set temporary directory location

Igor Kushnir via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Feb 2 00:11:34 PST 2023


vedgy added a comment.

Another disadvantage of using `CINDEX_VERSION_MINOR` instead of the `sizeof` is that an older libclang version won't know about compatibility of newer versions with itself. But this reasoning brought me to a different thought: when a program is compiled against a libclang version X, it should not be expected to be runtime-compatible with a libclang version older than X. For example, KDevelop uses `CINDEX_VERSION_MINOR` to decide whether or not certain libclang API is available.

I suspect modifying the exposed struct's size will violate ODR in C++ programs that use libclang. Quote from the cppreference ODR page <https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule>:

> Note: in C, there is no program-wide ODR for types, and even extern declarations of the same variable in different translation units may have different types as long as they are compatible. In C++, the source-code tokens used in declarations of the same type must be the same as described above: if one .cpp file defines `struct S { int x; };` and the other .cpp file defines `struct S { int y; };`, the behavior of the program that links them together is undefined. This is usually resolved with unnamed namespaces.

I think the answer to this StackOverflow question <https://stackoverflow.com/questions/5140893/struct-defined-differently-for-c-and-c-is-it-safe-pc-lint-warns> confirms my suspicion.

In order to avoid the undefined behavior, libclang can use the same approach as for other structs/classes: make the struct opaque and provide functions `clang_createIndexOptions()`,  `clang_disposeIndexOptions()`, `clang_setTempDirPath(CXIndexOptions options, const char *path)`.

`int excludeDeclarationsFromPCH` and `int displayDiagnostics` currently passed to `clang_createIndex()` can also be included in the struct and acquire their own setter functions. Then only a single argument will be passed to `clang_createIndexWithOptions()`: `CXIndexOptions`.

--------------------------

In D139774#4094619 <https://reviews.llvm.org/D139774#4094619>, @aaron.ballman wrote:

> In D139774#4092553 <https://reviews.llvm.org/D139774#4092553>, @vedgy wrote:
>
>> In D139774#4091631 <https://reviews.llvm.org/D139774#4091631>, @aaron.ballman wrote:
>>
>>> My preference is still for specific API names. If we want to do something general to all temp files, I think `FileSystemOptions` is the way to go.
>>>
>>> My concern with not using a constructor is that, because this is the C API, we will be supporting the functionality for a long time even if we do switch to using a global override in `FileSystemOptions` where you would need to set the information up before executing the compiler. To me, these paths are something that should not be possible to change once the compiler has started executing. (That also solves the multithreading problem where multiple threads are fighting over what the temp directory should be and stepping on each other's toes.)
>>
>> As I understand, this leaves two choices:
>>
>> 1. Specific preamble API names, two separate non-constructor setters; the option values are stored in a separate struct (or even as two separate data members), not inside `FileSystemOptions`.
>> 2. General temporary-storage arguments (possibly combined in a struct) to a new overloaded constructor function; the option values are stored inside the `FileSystemOptions` struct.
>>
>> The second alternative is likely more difficult to implement, more risky and less convenient to use (the store-in-memory bool option cannot be modified at any time). Perhaps it should be delayed until (and if) we learn of other temporary files libclang creates? A downside of implementing the first option now is that the specific API would have to be supported for a long time, even after the general temporary-file API is implemented.
>
> I still think the general solution is where we ultimately want to get to and that makes me hesitant to go with specific preamble API names despite that being the direction you prefer. If this wasn't the C APIs, I'd be less concerned because we make no stability guarantees about our C++ interfaces. But we try really hard not to break the C APIs, so adding the specific names is basically committing to not only the APIs but their semantics. I think that makes implementing the general solution slightly more awkward because these are weird special cases that barely get tested in-tree, so they'd be very easy to overlook and accidentally break.
>
> Is there a middle ground where, instead of #2 for general temporary storage, we went with #2 but with compiler-specific directories instead of system directories. e.g., don't let the caller set the temp directory, but do let the caller set the preamble directory (which defaults to the temp directory) so long as it's set before invoking the compiler? This still won't let you change options mid-run, but it also seems like it should have less risk of affecting other components while still solving the thread safety issues. I'm not certain if it's any easier to implement, but I think it does save you from modifying `FileSystemOptions`. As a separate item, we could then consider adding a new C API to let you toggle the in-memory vs on-disk functionality after exploring that it won't cause other problems because nobody considered the possibility that it's not a stable value for the compiler invocation.

After thinking some more, I'd like to revisit this decision. You agree that the general solution `setTempDirPath()` is an ultimate goal. But the rejection of tampering with the return value of `system_temp_directory()` means that overriding the temporary path will likely never be perfectly reliable. So how about a more sincere general solution: `setPreferredTempDirPath()`? The documentation would say that libclang tries its best to place temporary files into the specified directory, but some might end up in the system temporary directory instead.

This non-strict `Preferred` contract opens a path to design simplification. [correct me if I am wrong] `clang_createIndex()` does not need and is not likely to need a temporary directory path itself. So the documentation can recommend calling a non-constructor `setPreferredTempDirPath(CXIndex, const char *path)` function right after the call to `clang_createIndex()`. And possibly caution that calling `setPreferredTempDirPath()` later can result in fewer temporary files ending up in the overridden directory. If `clang_createIndex()`**does** need the temporary directory path in the future (unlikely I think), then the discussed overload of the `CXIndex` constructor function can be added.

The `FileSystemOptions` struct is an implementation detail not exposed in libclang. So the temporary directory path can be stored outside of it for now, if it simplifies the implementation. If and when the temporary directory path becomes more widely used and has to be passed to more classes, it could be moved inside the `FileSystemOptions` struct without modifying the libclang API.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774



More information about the cfe-commits mailing list