[llvm-dev] Supporting LLVM_BUILD_LLVM_DYLIB on Windows

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 9 11:51:01 PDT 2021


On Thu, Sep 9, 2021 at 9:38 AM Saleem Abdulrasool <compnerd at compnerd.org>
wrote:

> On Wed, Sep 8, 2021 at 7:09 PM Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>> Hi Saleem,
>>
>> I am concerned that your change will increase the maintenance burden for
>> those of us who would prefer to develop without shared libraries. Since it
>> is unclear a priori where the macros will be required, developers will need
>> to build both with and without shared libraries in order to verify that
>> they aren't breaking the build for shared library users -- in effect
>> slowing down the development for folks who prefer to develop without shared
>> libraries.
>>
>
> Failure to annotate the API wouldn’t break the build; it would mean that
> the API is not available.  If there are no users of the API outside of the
> module, everything would continue to work.  It only matters if there are
> users of the API outside of the module.  However, that implicitly tells
> you what needs to be annotated a priori.
>

It will break the build if I add code to a tool that calls an API that
isn't exported. Because of inlining etc it may not be obvious that a
particular API needs to be exported. Hence the need for two builds to check
for these problems.
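
A minimal sketch of how this can happen, using a hypothetical `Counter` class (not an LLVM API): the tool only calls the inline `next()`, but because that body is compiled into the tool, the tool ends up referencing the out-of-line `bump()`. If `bump()` is not exported from the DLL, the link fails only in the shared configuration. Everything is kept in one translation unit here, so the sketch shows the shape of the dependency rather than the link failure itself.

```cpp
#include <cassert>

// Hypothetical library class; imagine this declaration lives in a library header.
class Counter {
public:
  // Defined inline in the header, so its body is compiled into every caller,
  // including a tool that lives outside the shared library.
  int next() { return bump(1); }

private:
  // Defined out of line inside the library. A tool never names bump()
  // directly, yet the inlined next() makes the tool reference it.
  int bump(int Delta);
  int Value = 0;
};

// Imagine this definition lives in a library .cpp file.
int Counter::bump(int Delta) {
  Value += Delta;
  return Value;
}

// Stand-in for tool code: it only ever spells out next().
int useCounterTwice() {
  Counter C;
  int First = C.next();
  assert(First == 1);
  return C.next();
}
```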


>> I think your goal should be achievable without littering the code with
>> macros.
>>
>
> In order to support that, we would need a secondary source of truth: a
> text file with the decorated names of any exported function.  Such a model
> IMO is far worse.  The name decoration scheme is not universal, and not in
> llvm’s control (Microsoft’s scheme is owned by Microsoft and is subject to
> change).  But yes, theoretically, a secondary source of truth could
> achieve this.
>

This was not my proposal. The only exports would be:

<tool name 1>_main
<tool name 2>_main
<tool name 3>_main
etc.

And that can be very easily managed simply by exporting the *_main
functions, e.g. via dllexport.
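
As a rough sketch of what exporting only an entry point could look like: `TOOL_EXPORT` and `mytool_main` are hypothetical names made up for this example, and the non-Windows branch assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cassert>

// TOOL_EXPORT is a hypothetical macro name, not an existing LLVM macro.
#if defined(_WIN32)
#define TOOL_EXPORT __declspec(dllexport)
#else
#define TOOL_EXPORT __attribute__((visibility("default")))
#endif

// Internal helper. It is not exported, so only code inside the DLL can
// call it, and it can change freely without affecting the export list.
static bool hasProgramName(int Argc, char **Argv) {
  return Argc >= 1 && Argv != nullptr && Argv[0] != nullptr;
}

// The single exported symbol for a hypothetical tool named "mytool".
// extern "C" keeps the exported name free of C++ name decoration.
extern "C" TOOL_EXPORT int mytool_main(int Argc, char **Argv) {
  if (!hasProgramName(Argc, Argv))
    return 1; // malformed invocation
  // A real tool would parse its options and do its work here.
  return 0;
}
```

With this shape the export list grows by one name per tool, regardless of how much library code the tool uses internally.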


>
>> Perhaps on Windows you can achieve your goal with a variant of Leonard
>> Chan's "busybox" proposal [1] with some adjustments to account for a lack
>> of symlink support on Windows. Perhaps something like:
>>
>
> I’d like to be able to link this into server processes and tools with
> potential for dynamic loading.
>

That seems a little too open ended, and at least has a higher cost/benefit
ratio than just solving the problem of 2GiB of bloat from tools, which can
be solved in a much less intrusive way than the export macros.


> Additionally, this would make execution of the tools significantly more
> expensive (which is also why I’m interested in a dual library approach).
>

As long as the only exports are the *_main functions, the code in the .dll
would be basically the same as in the .exe, so I don't see how it would be
more expensive.

> If I’m mistaken about the multicall binary approach, perhaps we should be
> looking at removing the library options and replacing them with the
> multicall binary?
>

Naively making it a multicall binary on Windows would hit the problem of
lack of reliable symlink support, hence the proposal to make the tools stub
.exes that just call into a .dll.
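
To make the distinction concrete, here is a rough sketch with hypothetical tool names `foo` and `bar`. It assumes a multicall dispatcher selects the tool from the name in `argv[0]`, which is the usual technique; it is not code from the busybox proposal or from LLVM.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical tool entry points. In the proposal these would live in the
// shared .dll; the distinct return values only make the dispatch observable.
int foo_main(int, char **) { return 10; }
int bar_main(int, char **) { return 20; }

// Multicall style: pick the tool from the basename of argv[0]. This only
// works if one binary can be invoked under several names, which on Unix is
// usually arranged with symlinks.
int multicallDispatch(int Argc, char **Argv) {
  assert(Argv != nullptr);
  if (Argc < 1 || Argv[0] == nullptr)
    return 1;
  const char *Name = Argv[0];
  for (const char *P = Argv[0]; *P != '\0'; ++P)
    if (*P == '/' || *P == '\\')
      Name = P + 1; // keep only the part after the last path separator
  if (std::strcmp(Name, "foo") == 0)
    return foo_main(Argc, Argv);
  if (std::strcmp(Name, "bar") == 0)
    return bar_main(Argc, Argv);
  return 1; // unknown tool name
}

// Stub style: each tiny .exe hard-codes its entry point, so neither a
// symlink nor a name lookup is needed. In a real stub this body would be
// the stub's main().
int fooStubEntry(int Argc, char **Argv) { return foo_main(Argc, Argv); }
```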

Peter

>
>
>> - Create a <tool name>_main() entry point for each tool that does not use
>> llvm::cl to parse options.
>> - Create a llvm.dll in the bin directory that links together all the
>> <tool name>_main() entry points.
>> - Each tool <tool name>.exe consists of:
>> int main() {
>>   <tool name>_main();
>> }
>> - Tools that use llvm::cl will need to be linked with all of their code
>> in the .exe for now. However, they can be incrementally switched away from
>> llvm::cl and moved into llvm.dll.
>>
>> Peter
>>
>> [1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151321.html
>>
>> On Wed, Sep 8, 2021 at 3:52 PM Saleem Abdulrasool via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hello llvm-dev,
>>>
>>> One of the current limitations on LLVM on Windows is that you cannot use
>>> LLVM_BUILD_LLVM_DYLIB:
>>> https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-shlib/CMakeLists.txt#L14-L16
>>> I am interested in trying to see if we can lift this limitation.  There
>>> are others in the community who also seem to be interested in being able
>>> to use LLVM as a DLL on Windows, and the topic does come up on the
>>> mailing lists every so often.
>>>
>>> When you build a distribution of an LLVM-based toolchain currently, the
>>> result on Windows is ~2GiB for a trimmed down toolset.  This is largely due
>>> to the static linking used for all the tools.  I would like to be able to
>>> use the shared LLVM build for building a toolset on Windows.
>>>
>>> Unlike Unix platforms, the default on Windows is that all symbols are
>>> treated as `dso_local` (that is, `-fvisibility=hidden`).  Symbols
>>> which are meant to participate in dynamic linking are to be attributed as
>>> `__declspec(dllexport)` in the module and `__declspec(dllimport)` external
>>> to the module.  This is similar to Unix platforms where
>>> `__attribute__((__visibility__(...)))` controls the same type of behaviour
>>> with `-fvisibility=hidden`.
>>>
>>> For the case of distributions, it would remain valuable to minimize the
>>> number of shared objects, both to reduce the number of files that need to
>>> be shipped and to minimize the number of cross-module calls, which are
>>> not entirely free (i.e. PLT+GOT or IAT costs).  At the same time, the
>>> number of symbols which can be exported from a single module on Windows
>>> is limited to 64K.  Experience from MSYS2 indicates that LLVM with all
>>> the backends is likely to exceed this count (with a subset of targets,
>>> the number already is close to 60K).  This means that we may need two
>>> libraries on Windows.
>>>
>>> With the LLVM community being diverse, people often build on different
>>> platforms with different configurations, and I am concerned that adding
>>> more differences in how we build libraries complicates how maintainable
>>> LLVM is.  I would suggest that we actually change the behavior of the Unix
>>> builds to match that of Windows by building with
>>> `-fvisibility=hidden`.  Although this is a change, it is not
>>> without value.  By explicitly marking the interfaces which are vended by a
>>> library and making everything else internal, it does enable some potential
>>> optimization options for the compiler and linker (to be clear, I am not
>>> suggesting that this will have a guaranteed benefit, just that it can
>>> potentially enable additional opportunities for optimizations and size
>>> reductions).  This should incidentally help static linking.
>>>
>>> In order to achieve this, we would need to have a module specific
>>> annotation to indicate what symbols are meant to be used outside of the
>>> module when built in a shared configuration.  The same annotation would
>>> apply to all targets and is expected to be applied uniformly.  This of
>>> course has a cost associated with it: the public interfaces would need to
>>> be decorated appropriately.  However, by having the same behaviour on all
>>> the platforms, developers would not be impacted by the platform differences
>>> in their day-to-day development.  The only time that developers would need
>>> to be aware of this is when they are working on the module boundary, that
>>> is, changes which do not change the API surface of LLVM would not need to
>>> consider the annotations.
>>>
>>> Concretely, what I believe is required to enable building with
>>> LLVM_BUILD_LLVM_DYLIB on Windows is:
>>> - introduce module specific decoration (e.g. LLVM_SUPPORT_ABI, ...) to
>>> mark public interfaces of shared library modules
>>> - decorate all the public interfaces of the shared library modules with
>>> the new decoration
>>> - switch the builds to use `-fvisibility=hidden` by default
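
As a rough illustration of the first two steps, a module-specific decoration macro could look something like the following. `MYLIB_ABI`, `MYLIB_BUILD_SHARED`, and `MYLIB_EXPORTS` are hypothetical names chosen for this sketch; it is not the content of `LLVMSupportExports.h` from D109192, and the non-Windows branch assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cassert>

// MYLIB_ABI, MYLIB_BUILD_SHARED and MYLIB_EXPORTS are hypothetical names.
#if !defined(MYLIB_BUILD_SHARED)
// Static build: no decoration is needed.
#define MYLIB_ABI
#elif defined(_WIN32)
#if defined(MYLIB_EXPORTS)
// Compiling the library itself: export the symbol from the DLL.
#define MYLIB_ABI __declspec(dllexport)
#else
// Compiling a client of the library: import the symbol from the DLL.
#define MYLIB_ABI __declspec(dllimport)
#endif
#else
// ELF and Mach-O: opt the symbol back in when the default is hidden.
#define MYLIB_ABI __attribute__((visibility("default")))
#endif

// Public interface: decorated, so it is usable from outside the module.
MYLIB_ABI int mylibPublicAnswer();

// Internal helper: left undecorated, so in a shared build with hidden
// default visibility it stays private to the module.
int mylibInternalHelper() { return 41; }

int mylibPublicAnswer() { return mylibInternalHelper() + 1; }
```

Because the macro expands to nothing in a static build, applying the decoration can be staged ahead of any change to the default visibility.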
>>>
>>> I believe that these can be done mostly independently and staged in the
>>> order specified.  Until the last phase, it would have no actual impact on
>>> the builds.  However, by staging it, we could allow others to experiment
>>> with the option while it is under development, and allow for an easier
>>> path for switching the builds over.
>>>
>>> Although this would enable LLVM_BUILD_LLVM_DYLIB on Windows, give us
>>> better uniformity between Windows and non-Windows platforms, potentially
>>> enable additional optimization benefits, improve binary sizes for a
>>> distribution of the toolchain (though less on Linux where distributors are
>>> already using the build configuration ignoring the official suggestions in
>>> the LLVM guides), and help with runtime costs of the toolchain (by making
>>> the core of the tools a shared library, the backing pages can now be shared
>>> across multiple instances), it is not entirely without downsides.  The
>>> primary downsides that I see are:
>>> - it becomes less enticing to support both LLVM_BUILD_LLVM_DYLIB and
>>> BUILD_SHARED_LIBS: while technically possible, interfaces will need to be
>>> decorated for both forms of the build
>>> - LLVM_DYLIB_COMPONENTS becomes less tractable: in theory it is possible
>>> to apply enough CPP magic to determine where a symbol is homed, but
>>> allowing a symbol to be homed in a shared or static library is
>>> significantly more complex
>>> - BUILD_SHARED_LIBS becomes more expensive to maintain: the decoration
>>> is per-module, which requires that we would need to decorate the symbols of
>>> each module with module specific annotations as well
>>>
>>> One argument that people make for BUILD_SHARED_LIBS is that it reduces
>>> the overall build-test cycle time.  With the combination of lld, DWARF
>>> Fission, and LLVM_BUILD_LLVM_DYLIB, I believe that most of the benefits
>>> still can be had.  The cost of linking all the tools is amortized across
>>> the link of a single library, which, while not as small as a singular
>>> library, is offset by the following:
>>> - The LLVM_BUILD_LLVM_DYLIB would not require the re-linking of all the
>>> libraries for each tool.
>>> - DWARF Fission would avoid the need to relink all of the DWARF
>>> information.
>>> - lld is faster than the gold and bfd linkers
>>>
>>> Header changes would still ripple through the system as before,
>>> requiring rebuilding the transitive closure.  Source file changes do not
>>> have the same impact of course.
>>>
>>> For those who would like a more concrete example of what a change like this
>>> may shape up into: https://reviews.llvm.org/D109192 contains
>>> `LLVMSupportExports.h` which has the expected structure for declaring the
>>> decoration macros with the rest of the change primarily being focused on
>>> applying the decoration.  Please ignore the CMake changes as they are there
>>> to ensure that the CI validates this without changing the configuration and
>>> not intended to be part of the final version of the change.
>>>
>>> --
>>> Saleem Abdulrasool
>>> compnerd (at) compnerd (dot) org
>>>
>>
>>
>> --
>> --
>> Peter
>>
> --
> Saleem Abdulrasool
> compnerd (at) compnerd (dot) org
>


-- 
-- 
Peter