[llvm-dev] Supporting LLVM_BUILD_LLVM_DYLIB on Windows

Tue Sep 14 14:16:16 PDT 2021

On Tue, Sep 14, 2021 at 1:44 PM Chris Bieneman <chris.bieneman at me.com>
wrote:

>
> On Sep 14, 2021, at 3:10 PM, Saleem Abdulrasool <compnerd at compnerd.org>
> wrote:
>
> 
> On Mon, Sep 13, 2021 at 11:44 AM Chris Bieneman <chris.bieneman at me.com>
> wrote:
>
>> I had a thought and two questions related to this.
>>
>> The thought:
>>
>> We informally have three types of APIs in LLVM components. We have
>> stable-ish C APIs, unstable C++ APIs that are expected to be used outside
>> the component and unstable C++ APIs that are internal to the component.
>>
>> In some (but not all) cases we make a technical distinction for
>> component-internal APIs by putting the headers for those APIs in the
>> lib folder beside the implementation.
>>
>> If we’re talking about annotating APIs for symbol export control (which
>> I’m 100% in favor of), should we also consider a more formal designation
>> for library-internal APIs?
>>
>> One thought I had was should we adopt a policy where all APIs in the
>> `llvm` namespace are required to be annotated for export, but APIs in the
>> `llvm::internal` namespace are not?
>>
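A minimal sketch of the `llvm` / `llvm::internal` split suggested above. It assumes a hypothetical `LLVM_EXAMPLE_ABI` export macro and hypothetical function names (none of these exist in LLVM). Only declarations in `llvm` carry the annotation, while `llvm::internal` is left unannotated and so stays local to the component once builds default to hidden visibility.

```cpp
// Hypothetical export macro, for illustration only. On Windows it exports the
// symbol from the DLL being built; on ELF/Mach-O it restores default
// visibility when the library is compiled with -fvisibility=hidden.
#if defined(_WIN32)
#define LLVM_EXAMPLE_ABI __declspec(dllexport)
#else
#define LLVM_EXAMPLE_ABI __attribute__((__visibility__("default")))
#endif

namespace llvm {

// Public API of the component: under the suggested policy every declaration
// in the llvm namespace must carry the export annotation.
LLVM_EXAMPLE_ABI int computeChecksum(int Value);

namespace internal {
// Implementation detail shared between source files of the same component:
// deliberately left unannotated, so it is not exported.
int mixBits(int Value);
} // namespace internal

} // namespace llvm

int llvm::internal::mixBits(int Value) { return Value * 31 + 7; }

int llvm::computeChecksum(int Value) { return internal::mixBits(Value) + 1; }
```

A rule of this shape is also easy to check mechanically: any declaration in `llvm` outside `internal` that lacks the macro is a candidate diagnostic.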
>
> I think that this is likely far more heavy-handed, but it would absolutely
> help to automate.  I think that if we adopt this and extend clang’s
> compilation database, the attributes could be added as a post-commit hook.
>
> That said, I’m not sure that predicating the improvement for Windows on
> such a large change to LLVM policy is entirely reasonable.  I think that
> having this happen over time and revisiting the implementation
> subsequently is fine though.
>
>
> Fair enough.
>
>
>
>> And the open questions:
>>
>> (1) Are there changes to the MSVC or Clang-CL toolchains that we could
>> push for/make ourselves that would make this easier to maintain?
>>
>
> Some changes that clang could do to help with this is to introduce the
> linker invocations into compile_commands.json.  Additionally, we would need
> the module name for the output at compile time.
>
>
> That would be a very interesting enhancement.
>
>
>> (2) Can we implement a clang-tidy check for however we want this to be
>> done, and enable it as part of the LLVM clang-tidy configuration? (Surely
>> the technical answer here is yes, it is just some amount of work)
>>
>
> I don’t think that there’s a good way to do this in reality.  The problem
> is that you do not have an automated way to determine what is public and
> what is not.  That said, I do have https://github.com/compnerd/ids to at
> least help with the annotation.  It’s not complete and still would require
> some further refinement.
>
>
> I actually disagree here. We do have an automated way to determine what is
> currently public and what is not, although we may have an overly broad
> definition of public. Today, any symbol declared under the `include`
> directory for a project is public. Whether or not it _should_ be public is
> a different issue. We currently treat all of those symbols as public
> exports from component libraries.
>

Then we don’t really disagree :). We can automate the annotation; it just
assumes everything is public.

That is not a very good approach, but it is what I am currently proposing.
This maintains the status quo with the other platforms.  I simply see it as
a means to getting something working rather than as the desired state.  That
is, it gets us to an intermediate stage from which we should further refine
the implementation.

> -Chris
>
>
>
>> -Chris
>>
>> On Sep 9, 2021, at 5:15 PM, Peter Collingbourne via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> On Thu, Sep 9, 2021 at 11:51 AM Peter Collingbourne <peter at pcc.me.uk>
>> wrote:
>>
>>>
>>>
>>> On Thu, Sep 9, 2021 at 9:38 AM Saleem Abdulrasool <compnerd at compnerd.org>
>>> wrote:
>>>
>>>> On Wed, Sep 8, 2021 at 7:09 PM Peter Collingbourne <peter at pcc.me.uk>
>>>> wrote:
>>>>
>>>>> Hi Saleem,
>>>>>
>>>>> I am concerned that your change will increase the maintenance burden
>>>>> for those of us who would prefer to develop without shared libraries. Since
>>>>> it is unclear a priori where the macros will be required, developers will
>>>>> need to build both with and without shared libraries in order to verify
>>>>> that they aren't breaking the build for shared library users -- in effect
>>>>> slowing down the development for folks who prefer to develop without shared
>>>>> libraries.
>>>>>
>>>>
>>>> Failure to annotate the API wouldn’t break the build; it would mean
>>>> that the API is not available.  If there are no users of the API outside
>>>> of the module, everything would continue to work.  It only matters if
>>>> there are users of the API outside of the module.  However, that
>>>> implicitly tells you what needs to be annotated a priori.
>>>>
>>>
>>> It will break the build if I add code to a tool that calls an API that
>>> isn't exported. Because of inlining etc it may not be obvious that a
>>> particular API needs to be exported. Hence the need for two builds to check
>>> for these problems.
>>>
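To illustrate the point about inlining: an out-of-line helper can be reached from another module through an inline function in a header, so the need to export it is not visible from the helper's own declaration. This is a minimal sketch with a hypothetical `LLVM_EXAMPLE_ABI` macro and hypothetical function names. Compiled as a single file it simply runs; the comments describe what would happen across a DLL (or hidden-visibility shared object) boundary.

```cpp
// Hypothetical export macro (Windows DLL or ELF/Mach-O hidden-visibility build).
#if defined(_WIN32)
#define LLVM_EXAMPLE_ABI __declspec(dllexport)
#else
#define LLVM_EXAMPLE_ABI __attribute__((__visibility__("default")))
#endif

namespace llvm {

// Out-of-line helper defined inside the library. It looks like an
// implementation detail, but the inline function below calls it, so every
// tool that uses roundUpToAlignment() references this symbol directly.
// Without LLVM_EXAMPLE_ABI such a tool would still compile, but it would fail
// to link against the shared library with an unresolved alignmentMask.
LLVM_EXAMPLE_ABI unsigned alignmentMask(unsigned Align);

// Header-only API: its body is compiled into each caller, not the library.
// Assumes Align is a non-zero power of two.
inline unsigned roundUpToAlignment(unsigned Value, unsigned Align) {
  return (Value + Align - 1) & alignmentMask(Align);
}

} // namespace llvm

unsigned llvm::alignmentMask(unsigned Align) { return ~(Align - 1); }
```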
>>>
>>>> I think your goal should be achievable without littering the code with
>>>>> macros.
>>>>>
>>>>
>>>> In order to support that, we would need a secondary source of truth: a
>>>> text file with the decorated names of any exported function.  Such a model
>>>> IMO is far worse.  The name decoration scheme is not universal, and not in
>>>> llvm’s control (Microsoft’s scheme is owned by Microsoft and is subject to
>>>> change).  But yes, theoretically, a secondary source of truth could
>>>> achieve this.
>>>>
>>>
>>> This was not my proposal. The only exports would be:
>>>
>>> <tool name 1>_main
>>> <tool name 2>_main
>>> <tool name 3>_main
>>> etc.
>>>
>>> And that can be very easily managed simply by exporting the *_main
>>> functions, e.g. via dllexport.
>>>
>>>
>>>>
>>>> Perhaps on Windows you can achieve your goal with a variant of Leonard
>>>>> Chan's "busybox" proposal [1] with some adjustments to account for a lack
>>>>> of symlink support on Windows. Perhaps something like:
>>>>>
>>>>
>>>> I’d like to be able to link this into server processes and tools with
>>>> potential for dynamic loading.
>>>>
>>>
>>> That seems a little too open ended, and at least has a higher
>>> cost/benefit ratio than just solving the problem of 2GiB of bloat from
>>> tools, which can be solved in a much less intrusive way than the export
>>> macros.
>>>
>>>
>>>>   Additionally, this would make execution of the tools significantly
>>>> more expensive (which is also why I’m interested in a dual library
>>>> approach).
>>>>
>>>
>>> As long as the only exports are the *_main functions, the code in the
>>> .dll would be basically the same as in the .exe, so I don't see how it
>>> would be more expensive.
>>>
>>>> If I’m mistaken about the multicall binary approach, perhaps we should
>>>> be looking at removing the library options and replacing them with the
>>>> multicall binary?
>>>>
>>>
>>> Naively making it a multicall binary on Windows would hit the problem of
>>> lack of reliable symlink support, hence the proposal to make the tools stub
>>> .exes that just call into a .dll.
>>>
>>
>> I discussed this with Saleem offline. Although I still think there is
>> scope for exploring alternative approaches as described above, it seems
>> neither of us is willing or has the time to pursue them, so I won't stand
>> in the way here.
>>
>> My concern remains that the rules for updating the annotations may be
>> non-obvious at times. If the burden for updating the annotations were
>> placed on those who care about the shared library builds, that may make
>> things easier for day-to-day development. Perhaps one way to do that would
>> be for the annotations to be considered "peripheral tier" in terms of our
>> support policy, so that they aren't tracked by normal CI and only those who
>> care about the shared library build are responsible for updating them.
>>
>> Peter
>>
>>>
>>> Peter
>>>
>>>>
>>>>
>>>>> - Create a <tool name>_main() entry point for each tool that does not
>>>>> use llvm::cl to parse options.
>>>>> - Create an llvm.dll in the bin directory that links together all the
>>>>> <tool name>_main() entry points.
>>>>> - Each tool <tool name>.exe consists of:
>>>>> int main(int argc, char **argv) {
>>>>>   return <tool name>_main(argc, argv);
>>>>> }
>>>>> - Tools that use llvm::cl will need to be linked with all of
>>>>> their code in the .exe for now. However, they can be incrementally switched
>>>>> away from llvm::cl and moved into llvm.dll.
>>>>>
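A sketch of the llvm.dll side of this layout, assuming hypothetical tool entry points `llvm_foo_main` and `llvm_bar_main` and a hypothetical `LLVM_TOOL_ENTRY` macro. Only the entry points are exported, which is the appeal of the proposal: no other LLVM API needs an annotation. The stub executable is shown in a comment because it would be a separate translation unit.

```cpp
// This translation unit is part of llvm.dll, so the entry points are
// exported. Normally the build system would define this; it is hard-coded
// here so the example is self-contained.
#define BUILDING_LLVM_DLL

#if defined(_WIN32)
#if defined(BUILDING_LLVM_DLL)
#define LLVM_TOOL_ENTRY extern "C" __declspec(dllexport)
#else
#define LLVM_TOOL_ENTRY extern "C" __declspec(dllimport)
#endif
#else
#define LLVM_TOOL_ENTRY extern "C" __attribute__((__visibility__("default")))
#endif

// Hypothetical entry points; in practice each would wrap the body of a
// tool's former main().
LLVM_TOOL_ENTRY int llvm_foo_main(int argc, char **argv) {
  (void)argv;
  // Succeed only if at least one argument follows the program name.
  return argc > 1 ? 0 : 1;
}

LLVM_TOOL_ENTRY int llvm_bar_main(int argc, char **argv) {
  (void)argv;
  return argc - 1; // number of arguments after the program name
}

// The matching stub, llvm-foo.exe, would be a separate file linked against
// the llvm.dll import library:
//
//   extern "C" __declspec(dllimport) int llvm_foo_main(int argc, char **argv);
//   int main(int argc, char **argv) { return llvm_foo_main(argc, argv); }
```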
>>>>> Peter
>>>>>
>>>>> [1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151321.html
>>>>>
>>>>> On Wed, Sep 8, 2021 at 3:52 PM Saleem Abdulrasool via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hello llvm-dev,
>>>>>>
>>>>>> One of the current limitations on LLVM on Windows is that you cannot
>>>>>> use LLVM_BUILD_LLVM_DYLIB:
>>>>>> https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-shlib/CMakeLists.txt#L14-L16
>>>>>>   I am interested in trying to see if we can lift this limitation.
>>>>>> There are others in the community that also seem to be interested in seeing
>>>>>> LLVM being possible to use as a DLL on Windows and the topic does come up
>>>>>> on the mailing lists every so often.
>>>>>>
>>>>>> When you build a distribution of an LLVM-based toolchain currently,
>>>>>> the result on Windows is ~2GiB for a trimmed down toolset.  This is largely
>>>>>> due to the static linking used for all the tools.  I would like to be able
>>>>>> to use the shared LLVM build for building a toolset on Windows.
>>>>>>
>>>>>> Unlike Unix platforms, the default on Windows is that all symbols are
>>>>>> treated as `dso_local` (comparable to building with `-fvisibility=hidden`).
>>>>>> Symbols which are meant to participate in dynamic linking must be
>>>>>> attributed as `__declspec(dllexport)` in the module and
>>>>>> `__declspec(dllimport)` outside of the module.  This is similar to Unix
>>>>>> platforms, where `__attribute__((__visibility__(...)))` controls the same
>>>>>> type of behaviour when building with `-fvisibility=hidden`.
>>>>>>
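A minimal sketch of such a decoration for one module, assuming a hypothetical `LLVM_SUPPORT_ABI` macro. The name mirrors the example given later in this proposal, but the exact spelling and the `LLVM_SUPPORT_STATIC` switch are assumptions, not the contents of D109192. It relies on CMake's default of defining `<target>_EXPORTS` (here `LLVMSupport_EXPORTS`) while compiling a SHARED library target.

```cpp
// For this single-file example, pretend we are compiling LLVMSupport itself
// as a shared library; normally CMake defines this for the SHARED target.
#ifndef LLVMSupport_EXPORTS
#define LLVMSupport_EXPORTS
#endif

#if defined(LLVM_SUPPORT_STATIC)
// Static build: no decoration needed.
#define LLVM_SUPPORT_ABI
#elif defined(_WIN32)
#if defined(LLVMSupport_EXPORTS)
// Compiling the DLL itself: export the symbol.
#define LLVM_SUPPORT_ABI __declspec(dllexport)
#else
// Compiling a client of the DLL: import the symbol through the IAT.
#define LLVM_SUPPORT_ABI __declspec(dllimport)
#endif
#else
// ELF/Mach-O built with -fvisibility=hidden: make annotated symbols visible.
#define LLVM_SUPPORT_ABI __attribute__((__visibility__("default")))
#endif

namespace llvm {
// Part of the module's public surface, so it is decorated (hypothetical API).
LLVM_SUPPORT_ABI unsigned countLeadingSpaces(const char *Str);

// Not decorated: dso_local on Windows, hidden under -fvisibility=hidden.
unsigned countLeadingSpacesImpl(const char *Str);
} // namespace llvm

unsigned llvm::countLeadingSpacesImpl(const char *Str) {
  unsigned N = 0;
  while (Str[N] == ' ')
    ++N;
  return N;
}

unsigned llvm::countLeadingSpaces(const char *Str) {
  return countLeadingSpacesImpl(Str);
}
```

Because the macro keys off a per-target define, every module needs its own copy (LLVMCore would need a separate macro keyed on its own `_EXPORTS` define), which is the per-module cost mentioned for BUILD_SHARED_LIBS.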
>>>>>> For the case of distributions, it would remain valuable to minimize
>>>>>> the number of shared objects, both to reduce the number of files that
>>>>>> need to be shipped and to minimize the number of cross-module calls, which
>>>>>> are not entirely free (i.e. PLT+GOT or IAT costs).  At the same time, the
>>>>>> number of symbols which can be exported from a single module on Windows is
>>>>>> limited to 64K.  Experience from MSYS2 indicates that LLVM with all the
>>>>>> backends is likely to exceed this count (with a subset of targets, the
>>>>>> number is already close to 60K).  This means that we may need two
>>>>>> libraries on Windows.
>>>>>>
>>>>>> With the LLVM community being diverse, people often build on
>>>>>> different platforms with different configurations, and I am concerned that
>>>>>> adding more differences in how we build libraries complicates how
>>>>>> maintainable LLVM is.  I would suggest that we actually change the behavior
>>>>>> of the Unix builds to match that of Windows by building with
>>>>>> `-fvisibility=hidden`.  Although this is a change, it is not
>>>>>> without value.  By explicitly marking the interfaces which are vended by a
>>>>>> library and making everything else internal, it does enable some potential
>>>>>> optimization options for the compiler and linker (to be clear, I am not
>>>>>> suggesting that this will have a guaranteed benefit, just that it can
>>>>>> potentially enable additional opportunities for optimizations and size
>>>>>> reductions).  This should incidentally help static linking.
>>>>>>
>>>>>> In order to achieve this, we would need to have a module-specific
>>>>>> annotation to indicate what symbols are meant to be used outside of the
>>>>>> module when built in a shared configuration.  The same annotation would
>>>>>> apply to all targets and is expected to be applied uniformly.  This of
>>>>>> course has a cost associated with it: the public interfaces would need to
>>>>>> be decorated appropriately.  However, by having the same behaviour on all
>>>>>> the platforms, developers would not be impacted by the platform differences
>>>>>> in their day-to-day development.  The only time that developers would need
>>>>>> to be aware of this is when they are working on the module boundary, that
>>>>>> is, changes which do not change the API surface of LLVM would not need to
>>>>>> consider the annotations.
>>>>>>
>>>>>> Concretely, what I believe is required to enable building with
>>>>>> LLVM_BUILD_LLVM_DYLIB on Windows is:
>>>>>> - introduce a module-specific decoration (e.g. LLVM_SUPPORT_ABI, ...)
>>>>>> to mark public interfaces of shared library modules
>>>>>> - decorate all the public interfaces of the shared library modules
>>>>>> with the new decoration
>>>>>> - switch the builds to use `-fvisibility=hidden` by default
>>>>>>
>>>>>> I believe that these can be done mostly independently and staged in
>>>>>> the order specified.  Until the last phase, it would have no actual impact
>>>>>> on the builds.  However, by staging it, we could allow others to experiment
>>>>>> with the option while it is under development, and it would ease the path
>>>>>> for switching the builds over.
>>>>>>
>>>>>> Although this would enable LLVM_BUILD_LLVM_DYLIB on Windows, give us
>>>>>> better uniformity between Windows and non-Windows platforms, potentially
>>>>>> enable additional optimization benefits, improve binary sizes for a
>>>>>> distribution of the toolchain (though less on Linux where distributors are
>>>>>> already using the build configuration ignoring the official suggestions in
>>>>>> the LLVM guides), and help with runtime costs of the toolchain (by making
>>>>>> the core of the tools a shared library, the backing pages can now be shared
>>>>>> across multiple instances), it is not entirely without downsides.  The
>>>>>> primary downsides that I see are:
>>>>>> - it becomes less enticing to support both LLVM_BUILD_LLVM_DYLIB and
>>>>>> BUILD_SHARED_LIBS: while technically possible, interfaces will need to be
>>>>>> decorated for both forms of the build
>>>>>> - LLVM_DYLIB_COMPONENTS becomes less tractable: in theory it is
>>>>>> possible to apply enough CPP magic to determine where a symbol is homed,
>>>>>> but allowing a symbol to be homed in a shared or static library is
>>>>>> significantly more complex
>>>>>> - BUILD_SHARED_LIBS becomes more expensive to maintain: the
>>>>>> decoration is per-module, which requires that we would need to decorate the
>>>>>> symbols of each module with module specific annotations as well
>>>>>>
>>>>>> One argument that people make for BUILD_SHARED_LIBS is that it
>>>>>> reduces the overall time of the build-test cycle.  With the combination of
>>>>>> lld, DWARF Fission, and LLVM_BUILD_LLVM_DYLIB, I believe that most of the
>>>>>> benefits can still be had.  The cost of linking all the tools is amortized
>>>>>> across the link of a single library, which, while not as cheap as linking
>>>>>> an individual component library, is offset by the following:
>>>>>> - The LLVM_BUILD_LLVM_DYLIB would not require the re-linking of all
>>>>>> the libraries for each tool.
>>>>>> - DWARF Fission would avoid the need to relink all of the DWARF
>>>>>> information.
>>>>>> - lld is faster than the gold and bfd linkers
>>>>>>
>>>>>> Header changes would still ripple through the system as before,
>>>>>> requiring rebuilding the transitive closure.  Source file changes do not
>>>>>> have the same impact of course.
>>>>>>
>>>>>> For those who would like a more concrete example of what a change like
>>>>>> this may shape up into: https://reviews.llvm.org/D109192 contains
>>>>>> `LLVMSupportExports.h` which has the expected structure for declaring the
>>>>>> decoration macros with the rest of the change primarily being focused on
>>>>>> applying the decoration.  Please ignore the CMake changes as they are there
>>>>>> to ensure that the CI validates this without changing the configuration and
>>>>>> not intended to be part of the final version of the change.
>>>>>>
>>>>>> --
>>>>>> Saleem Abdulrasool
>>>>>> compnerd (at) compnerd (dot) org
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Peter
>>>>>
>>>> --
>>>> Saleem Abdulrasool
>>>> compnerd (at) compnerd (dot) org
>>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>>
>>
>>
>> --
>> --
>> Peter
>>
>>
>> --
> Saleem Abdulrasool
> compnerd (at) compnerd (dot) org
>
> --
Saleem Abdulrasool
compnerd (at) compnerd (dot) org