[llvm-dev] Supporting LLVM_BUILD_LLVM_DYLIB on Windows

Saleem Abdulrasool via llvm-dev llvm-dev at lists.llvm.org
Tue Sep 14 13:09:49 PDT 2021


On Mon, Sep 13, 2021 at 11:44 AM Chris Bieneman <chris.bieneman at me.com>
wrote:

> I had a thought and two questions related to this.
>
> The thought:
>
> We informally have three types of APIs in LLVM components. We have
> stable-ish C APIs, unstable C++ APIs that are expected to be used outside
> the component and unstable C++ APIs that are internal to the component.
>
> In some (but not all) cases we make a technical distinction between
> internal to the component APIs by putting the headers for those APIs in the
> lib folder beside the implementation.
>
> If we’re talking about annotating APIs for symbol export control (which
> I’m 100% in favor of), should we also consider a more formal designation
> for library-internal APIs.
>
> One thought I had was should we adopt a policy where all APIs in the
> `llvm` namespace are required to be annotated for export, but APIs in the
> `llvm::internal` namespace are not?
>

I think that this is likely far more heavy-handed, but it would absolutely
help with automation.  I think that if we adopt this and extend clang’s
compilation database, the attributes could be added by a post-commit hook.

That said, I’m not sure that predicating the improvement for Windows on
such a large change to LLVM policy is entirely reasonable.  I think that
doing this over time and revisiting the implementation subsequently is
fine though.
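
As a rough illustration of what that namespace policy could look like in a
header (a sketch only: `EXAMPLE_ABI` and `EXAMPLE_BUILDING_DLL` are made-up
names standing in for something like LLVM_SUPPORT_ABI, the function names
are hypothetical, and none of this is taken from an actual patch):

```cpp
// Sketch under stated assumptions: EXAMPLE_ABI and EXAMPLE_BUILDING_DLL are
// hypothetical names, not existing LLVM macros.

// Normally the build system would define this only while compiling the
// library's own sources; it is defined here so the sketch is one file.
#define EXAMPLE_BUILDING_DLL 1

#if defined(_WIN32)
#  if defined(EXAMPLE_BUILDING_DLL)
#    define EXAMPLE_ABI __declspec(dllexport) // building the module
#  else
#    define EXAMPLE_ABI __declspec(dllimport) // using the module
#  endif
#else
// GCC/Clang: mark the symbol as visible even under -fvisibility=hidden.
#  define EXAMPLE_ABI __attribute__((__visibility__("default")))
#endif

namespace llvm {
// Public API: under the proposed policy this must carry the annotation.
EXAMPLE_ABI int publicEntryPoint(int X);

namespace internal {
// Library-internal API: deliberately left unannotated, so it would not be
// exported from the shared library.
int helper(int X);
} // namespace internal
} // namespace llvm

int llvm::internal::helper(int X) { return X * 2; }
int llvm::publicEntryPoint(int X) { return internal::helper(X) + 1; }
```

With something like this, a tool could mechanically flag any declaration
directly in `llvm` that lacks the annotation, which is the part that would
help to automate.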


> And the open questions:
>
> (1) Are there changes to the MSVC or Clang-CL toolchains that we could
> push for/make ourselves that would make this easier to maintain?
>

One change that clang could make to help with this is to record the
linker invocations in compile_commands.json.  Additionally, we would need
the module name for the output at compile time.

> (2) Can we implement a clang-tidy check for however we want this to be
> done, and enable it as part of the LLVM clang-tidy configuration? (Surely
> the technical answer here is yes, it is just some amount of work)
>

I don’t think that there’s a good way to do this in practice.  The problem
is that you do not have an automated way to determine what is public and
what is not.  That said, I do have https://github.com/compnerd/ids to at
least help with the annotation.  It is not complete and would still require
some further refinement.


> -Chris
>
> On Sep 9, 2021, at 5:15 PM, Peter Collingbourne via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Thu, Sep 9, 2021 at 11:51 AM Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>>
>>
>> On Thu, Sep 9, 2021 at 9:38 AM Saleem Abdulrasool <compnerd at compnerd.org>
>> wrote:
>>
>>> On Wed, Sep 8, 2021 at 7:09 PM Peter Collingbourne <peter at pcc.me.uk>
>>> wrote:
>>>
>>>> Hi Saleem,
>>>>
>>>> I am concerned that your change will increase the maintenance burden
>>>> for those of us who would prefer to develop without shared libraries. Since
>>>> it is unclear a priori where the macros will be required, developers will
>>>> need to build both with and without shared libraries in order to verify
>>>> that they aren't breaking the build for shared library users -- in effect
>>>> slowing down the development for folks who prefer to develop without shared
>>>> libraries.
>>>>
>>>
>>> Failure to annotate the API wouldn’t break the build; it would mean that
>>> the API is not available.  If there are no users of the API outside of the
>>> module, everything would continue to work.  It is only if there are users
>>> of the API outside of the module that it matters.  However, that implicitly
>>> tells you what needs to be annotated a priori.
>>>
>>
>> It will break the build if I add code to a tool that calls an API that
>> isn't exported. Because of inlining etc it may not be obvious that a
>> particular API needs to be exported. Hence the need for two builds to check
>> for these problems.
>>
>>
>>>> I think your goal should be achievable without littering the code with
>>>> macros.
>>>>
>>>
>>> In order to support that, we would need a secondary source of truth: a
>>> text file with the decorated names of any exported function.  Such a model
>>> IMO is far worse.  The name decoration scheme is not universal, and not in
>>> llvm’s control (Microsoft’s scheme is owned by Microsoft and is subject to
>>> change).  But yes, theoretically, a secondary source of truth could
>>> achieve this.
>>>
>>
>> This was not my proposal. The only exports would be:
>>
>> <tool name 1>_main
>> <tool name 2>_main
>> <tool name 3>_main
>> etc.
>>
>> And that can be very easily managed simply by exporting the *_main
>> functions, e.g. via dllexport.
>>
>>
>>>
>>>> Perhaps on Windows you can achieve your goal with a variant of Leonard
>>>> Chan's "busybox" proposal [1] with some adjustments to account for a lack
>>>> of symlink support on Windows. Perhaps something like:
>>>>
>>>
>>> I’d like to be able to link this into server processes and tools with
>>> potential for dynamic loading.
>>>
>>
>> That seems a little too open ended, and at least has a higher
>> cost/benefit ratio than just solving the problem of 2GiB of bloat from
>> tools, which can be solved in a much less intrusive way than the export
>> macros.
>>
>>
>>>   Additionally, this would make execution of the tools significantly
>>> more expensive (which is also why I’m interested in a dual library
>>> approach).
>>>
>>
>> As long as the only exports are the *_main functions, the code in the
>> .dll would be basically the same as in the .exe, so I don't see how it
>> would be more expensive.
>>
>>> If I’m mistaken about the multicall binary approach, perhaps we should be
>>> looking at removing the library options and replacing them with the
>>> multicall binary?
>>>
>>
>> Naively making it a multicall binary on Windows would hit the problem of
>> lack of reliable symlink support, hence the proposal to make the tools stub
>> .exes that just call into a .dll.
>>
>
> I discussed this with Saleem offline. Although I still think there is
> scope for exploring alternative approaches as described above, it seems
> neither of us is willing or has time to pursue it, so I won't stand in the
> way here.
>
> My concern remains that the rules for updating the annotations may be
> non-obvious at times. If the burden for updating the annotations were
> placed on those who care about the shared library builds, that may make
> things easier for day-to-day development. Perhaps one way to do that would
> be for the annotations to be considered "peripheral tier" in terms of our
> support policy, so that they aren't tracked by normal CI and only those who
> care about the shared library build are responsible for updating them.
>
> Peter
>
>>
>> Peter
>>
>>>
>>>
>>>> - Create a <tool name>_main() entry point for each tool that does not
>>>> use llvm::cl to parse options.
>>>> - Create a llvm.dll in the bin directory that links together all the
>>>> <tool name>_main() entry points.
>>>> - Each tool <tool name>.exe consists of:
>>>> int main() {
>>>>   <tool name>_main();
>>>> }
>>>> - Tools that use llvm::cl will need to be linked with all of their code
>>>> in the .exe for now. However, they can be incrementally switched away from
>>>> llvm::cl and moved into llvm.dll.
>>>>
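
For reference, the stub arrangement described in the list above could be
sketched roughly as follows.  This is only an illustration under some
assumptions: `example_tool`, `EXAMPLE_TOOL_EXPORT`, and `stubMain` are
made-up names, both halves are shown in one file, and forwarding argc/argv
is my addition to the shape given above.

```cpp
#include <cstdio>

// Hypothetical export macro; in this scheme only the *_main entry points
// would carry it.  (The stub .exe would see the declaration as dllimport.)
#if defined(_WIN32)
#  define EXAMPLE_TOOL_EXPORT __declspec(dllexport)
#else
#  define EXAMPLE_TOOL_EXPORT __attribute__((__visibility__("default")))
#endif

// --- would live in llvm.dll ---
// "example_tool" is a made-up tool name.
extern "C" EXAMPLE_TOOL_EXPORT int example_tool_main(int argc, char **argv) {
  // A real tool would parse argv here; this sketch only reports the count.
  (void)argv;
  std::printf("example_tool: %d argument(s)\n", argc - 1);
  return 0;
}

// --- would be the entire body of example_tool.exe ---
// Named stubMain rather than main so the sketch can be called from a test;
// in the real stub this would be main().
int stubMain(int argc, char **argv) { return example_tool_main(argc, argv); }
```

The only symbol crossing the module boundary here is the `*_main` function,
which is why no other annotations would be needed in that model.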
>>>> Peter
>>>>
>>>> [1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151321.html
>>>>
>>>> On Wed, Sep 8, 2021 at 3:52 PM Saleem Abdulrasool via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hello llvm-dev,
>>>>>
>>>>> One of the current limitations on LLVM on Windows is that you cannot
>>>>> use LLVM_BUILD_LLVM_DYLIB:
>>>>> https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-shlib/CMakeLists.txt#L14-L16
>>>>>   I am interested in trying to see if we can lift this limitation.
>>>>> There are others in the community that also seem to be interested in seeing
>>>>> LLVM being possible to use as a DLL on Windows and the topic does come up
>>>>> on the mailing lists every so often.
>>>>>
>>>>> When you build a distribution of a LLVM based toolchain currently, the
>>>>> result on Windows is ~2GiB for a trimmed down toolset.  This is largely due
>>>>> to the static linking used for all the tools.  I would like to be able to
>>>>> use the shared LLVM build for building a toolset on Windows.
>>>>>
>>>>> Unlike Unix platforms, the default on Windows is that all symbols are
>>>>> treated as `dso_local` (that is, as with `-fvisibility=hidden`).  Symbols
>>>>> which are meant to participate in dynamic linking are to be attributed as
>>>>> `__declspec(dllexport)` in the module and `__declspec(dllimport)` external
>>>>> to the module.  This is similar to Unix platforms where
>>>>> `__attribute__((__visibility__(...)))` controls the same type of behaviour
>>>>> with `-fvisibility=hidden`.
>>>>>
>>>>> For the case of distributions, it would remain valuable to minimize
>>>>> the number of shared objects to reduce the files that need to be shipped
>>>>> but also to minimize the number of cross-module calls which are not
>>>>> entirely free (i.e. PLT+GOT or IAT costs).  At the same time, the number of
>>>>> possible labels which can be exposed from a single module on Windows is
>>>>> limited to 64K.  Experience from MSys2 indicates that LLVM with all the
>>>>> backends is likely to exceed this count (with a subset of targets, the
>>>>> number already is close to 60K).  This means that it may be that we would
>>>>> need two libraries on Windows.
>>>>>
>>>>> With the LLVM community being diverse, people often build on different
>>>>> platforms with different configurations, and I am concerned that adding
>>>>> more differences in how we build libraries complicates how maintainable
>>>>> LLVM is.  I would suggest that we actually change the behavior of the Unix
>>>>> builds to match that of Windows by building with
>>>>> `-fvisibility=hidden`.  Although this is a change, it is not
>>>>> without value.  By explicitly marking the interfaces which are vended by a
>>>>> library and making everything else internal, it does enable some potential
>>>>> optimization options for the compiler and linker (to be clear, I am not
>>>>> suggesting that this will have a guaranteed benefit, just that it can
>>>>> potentially enable additional opportunities for optimizations and size
>>>>> reductions).  This should incidentally help static linking.
>>>>>
>>>>> In order to achieve this, we would need to have a module specific
>>>>> annotation to indicate what symbols are meant to be used outside of the
>>>>> module when built in a shared configuration.  The same annotation would
>>>>> apply to all targets and is expected to be applied uniformly.  This of
>>>>> course has a cost associated with it: the public interfaces would need to
>>>>> be decorated appropriately.  However, by having the same behaviour on all
>>>>> the platforms, developers would not be impacted by the platform differences
>>>>> in their day-to-day development.  The only time that developers would need
>>>>> to be aware of this is when they are working on the module boundary, that
>>>>> is, changes which do not change the API surface of LLVM would not need to
>>>>> consider the annotations.
>>>>>
>>>>> Concretely, what I believe is required to enable building with
>>>>> LLVM_BUILD_LLVM_DYLIB on Windows is:
>>>>> - introduce module specific decoration (e.g. LLVM_SUPPORT_ABI, ...) to
>>>>> mark public interfaces of shared library modules
>>>>> - decorate all the public interfaces of the shared library modules
>>>>> with the new decoration
>>>>> - switch the builds to use `-fvisibility=hidden` by default
>>>>>
>>>>> I believe that these can be done mostly independently and staged in
>>>>> the order specified.  Until the last phase, it would have no actual impact
>>>>> on the builds.  However, by staging it, we could allow others to experiment
>>>>> with the option while it is under development, and allows for an easier
>>>>> path for switching the builds over.
>>>>>
>>>>> Although this would enable LLVM_BUILD_LLVM_DYLIB on Windows, give us
>>>>> better uniformity between Windows and non-Windows platforms, potentially
>>>>> enable additional optimization benefits, improve binary sizes for a
>>>>> distribution of the toolchain (though less on Linux where distributors are
>>>>> already using the build configuration ignoring the official suggestions in
>>>>> the LLVM guides), and help with runtime costs of the toolchain (by making
>>>>> the core of the tools a shared library, the backing pages can now be shared
>>>>> across multiple instances), it is not entirely without downsides.  The
>>>>> primary downsides that I see are:
>>>>> - it becomes less enticing to support both LLVM_BUILD_LLVM_DYLIB and
>>>>> BUILD_SHARED_LIBS: while technically possible, interfaces will need to be
>>>>> decorated for both forms of the build
>>>>> - LLVM_DYLIB_COMPONENTS becomes less tractable: in theory it is
>>>>> possible to apply enough CPP magic to determine where a symbol is homed,
>>>>> but allowing a symbol to be homed in a shared or static library is
>>>>> significantly more complex
>>>>> - BUILD_SHARED_LIBS becomes more expensive to maintain: the decoration
>>>>> is per-module, which requires that we would need to decorate the symbols of
>>>>> each module with module specific annotations as well
>>>>>
>>>>> One argument that people make for BUILD_SHARED_LIBS is that it reduces
>>>>> the overall time of the build-test cycle.  With the combination of lld,
>>>>> DWARF Fission, and LLVM_BUILD_LLVM_DYLIB, I believe that most of the
>>>>> benefits still can be had.  The cost of linking all the tools is amortized
>>>>> across the link of a single library, which, while not as small as a
>>>>> singular library, is offset by the following:
>>>>> - The LLVM_BUILD_LLVM_DYLIB would not require the re-linking of all
>>>>> the libraries for each tool.
>>>>> - DWARF Fission would avoid the need to relink all of the DWARF
>>>>> information.
>>>>> - lld is faster than the gold and bfd linkers
>>>>>
>>>>> Header changes would still ripple through the system as before,
>>>>> requiring rebuilding the transitive closure.  Source file changes do not
>>>>> have the same impact of course.
>>>>>
>>>>> For those who would like a more concrete example of what a change like
>>>>> this may shape up into: https://reviews.llvm.org/D109192 contains
>>>>> `LLVMSupportExports.h` which has the expected structure for declaring the
>>>>> decoration macros with the rest of the change primarily being focused on
>>>>> applying the decoration.  Please ignore the CMake changes as they are there
>>>>> to ensure that the CI validates this without changing the configuration and
>>>>> not intended to be part of the final version of the change.
>>>>>
>>>>> --
>>>>> Saleem Abdulrasool
>>>>> compnerd (at) compnerd (dot) org
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Peter
>>>>
>>> --
>>> Saleem Abdulrasool
>>> compnerd (at) compnerd (dot) org
>>>
>>
>>
>> --
>> --
>> Peter
>>
>
>
> --
> --
> Peter
>
>
> --
Saleem Abdulrasool
compnerd (at) compnerd (dot) org