[llvm-dev] RFC: Revisiting LLD-as-a-library design

Alexandre Ganea via llvm-dev llvm-dev at lists.llvm.org
Sat Jun 12 10:28:10 PDT 2021


Hello,

(David Blaikie)
> Reid - if you have any particular use case of your own in mind, or links to other discussions/users who are having friction with the current state of affairs, it would be handy to have.

The topic came up last Thursday in the Windows call, please see notes in https://docs.google.com/document/d/1A-W0Sas_oHWTEl_x_djZYoRtzAdTONMW_6l1BH9G6Bo/

Speaking for Ubisoft, there’s a short term practical usage for us in llvm-buildozer<https://reviews.llvm.org/D86351>. Neil Henning from Unity 3D raised a similar need for the Burst compiler.

Since I’ve already gone through these topics before, here is a list of practical steps to achieve the LLD-as-a-lib goal:


  1.  The cl::opt data needs to live in a stack context, LLVMContext maybe.

     a.  One pain point is uses of cl::location.

Folks use that kind of pattern so that the global state can be referenced from other TUs, without having to reference the cl::opt global directly.
For example:

bool polly::ModelReadOnlyScalars;

static cl::opt<bool, true> XModelReadOnlyScalars(
    "polly-analyze-read-only-scalars",
    cl::desc("Model read-only scalar values in the scop description"),
    cl::location(ModelReadOnlyScalars), cl::Hidden, cl::ZeroOrMore,
    cl::init(true), cl::cat(PollyCategory));

     b.  Make the CommandLineParser’s cl::opt_storage data live in a stack- or heap-based context.

This is similar to point a. above, except that this applies to the implicit cl::opt state (when cl::location is omitted).
There’s a PoC in https://reviews.llvm.org/D86351, lib/Support/CommandLine.cpp, L556


The rationale here is to have the ability to call LLD-as-a-lib (or Clang-as-a-lib, for instance, or any other LLVM tool) in the same way as we do on the command-line. Essentially, this means calling into LLD’s main(), but in-process. As mentioned in https://reviews.llvm.org/D86351, one of our objectives is to pass a CDB .json to a tool (llvm-buildozer) and build in-process.


  2.  The targets registry takes some time to initialize, and it wouldn’t be desirable to do it every time we call into LLD-as-a-lib.

With a few modifications the initialization can be made thread-safe, which is a step towards making the LLD-as-a-lib entry point thread-safe.
See PoC in https://reviews.llvm.org/D86351, lib/Support/TargetRegistry.cpp

  3.  Move LLD global variables into an (LLVM? LLD?) context

A first iteration of this can be done manually, as in https://reviews.llvm.org/D86351, lld/COFF/*; see the changes marked LLVM_THREAD_LOCAL, which are obviously just a PoC. A better implementation would move these variables into an “LLD context”.
A more advanced version would automate this somehow, or at least warn developers when a new global is introduced.

  4.  How do we handle memory allocations?

It seems the most sensible thing to do in the short term (when using LLD-as-a-lib) is to run with exitEarly = false, i.e.: https://github.com/llvm/llvm-project/blob/d480f968ad8b56d3ee4a6b6df5532d485b0ad01e/lld/include/lld/Common/ErrorHandler.h#L101

Later, a smarter way to get full performance back would be to put all allocations into a BumpPtrAllocator-like arena for each LLD-as-a-lib call -- which would restore the effect of exitEarly = true.

  5.  Externalizing ThreadPool

One thing that wasn’t mentioned in Reid’s initial post is that we won’t be able to spawn ThreadPools inside LLD anymore, at least when it is used as-a-lib. For example, if we call several instances of LLD-as-a-lib from multiple client threads, a ThreadPool needs to be provided externally by the client application. Jobs spawned internally by each LLD instance would then be queued on the client’s ThreadPool.
I don’t think this should be very hard to do, but a first iteration could (temporarily) disable multi-threading in LLD when calling it as-a-lib.

  6.  Returning Errors

I didn’t have any issues with that in the COFF driver, but the ELF driver, for example, has at least one place that calls fatal() in a constructor. This leaves a half-initialized (pooled) object on the heap, which is later destroyed along with the SpecificAlloc<>, which in turn can corrupt the heap. See the description in https://reviews.llvm.org/rG45b8a741fbbf271e0fb71294cb7cdce3ad4b9bf3
This is the primary reason why “canRunAgain” exists along with “safeLldMain”, https://github.com/llvm/llvm-project/blob/03769d9308fee79aa97149561bdbb6e3263789bd/lld/tools/lld/lld.cpp#L185

I suppose we would need to come up with a way to bubble up errors in the driver, and only return them later when the LLD-as-a-lib call completes.

  7.  Implicit Windows kernel state

One difficulty is that Windows stores an implicit CWD (current working directory) state for each process. When issuing Win32 API calls with relative paths, the NT kernel concatenates them with the internal CWD. Essentially, that means we can no longer pass relative paths to Win32 APIs; all paths have to be made absolute beforehand, and we have to store the CWD per “context”, per LLD-as-a-lib call.
This isn’t terribly complicated, but requires some piping, see https://reviews.llvm.org/D86351, changes in llvm/lib/Support/Windows/Path.inc
There could be additional issues like this, and on Linux as well.

  8.  Splitting the LLD pipeline

As mentioned in this thread, we could later have a C API to provide more granularity over the LLD pipeline. But in the short term I would leave that aside, until we have a working example with just a single LLD-as-a-lib entry point (that does the same as what lldMain() does today).


While we’re here, we had some other adjacent objectives with this work:



  1.  Being able to call Clang driver-as-a-lib in the same way as LLD

Again, the objective here is to have strictly the same behavior when calling Clang-as-a-lib as when calling clang-cl on the command-line.

  2.  Cache file accesses & stat

We’ve seen some contention in the Windows kernel when accessing files. Being able to build in-process opens the door to sharing state between threads building different TUs, in the same way that clang-scan-deps does. There seems to be a FileSystemStatCache, but it isn’t really used; does anybody know why? We could upstream a caching implementation for clang-scan-deps that could benefit any other tools that do multithreaded in-process building.



  3.  LLD-as-a-DLL


One point that was raised recently is being able to compile LLVM components as DLLs on Windows. This is adjacent to LLD-as-a-lib; perhaps it isn’t desirable to always link LLD statically into the user’s application.



Does all this sound sensible? It would be nice to split the work between us, if possible. In the short term (next few weeks) I can work on 1. and 2.

Best,
Alex.

De : Reid Kleckner <rnk at google.com>
Envoyé : June 10, 2021 2:15 PM
À : llvm-dev <llvm-dev at lists.llvm.org>; Fangrui Song <maskray at google.com>; Sam Clegg <sbc at chromium.org>; Shoaib Meenai <smeenai at fb.com>; gkm at fb.com; jezng at fb.com; Alexandre Ganea <alexandre.ganea at ubisoft.com>; Martin Storsjö <martin at martin.st>
Objet : RFC: Revisiting LLD-as-a-library design

Hey all,

Long ago, the LLD project contributors decided that they weren't going to design LLD as a library, which stands in opposition to the way that the rest of LLVM strives to be a reusable library. Part of the reasoning was that, at the time, LLD wasn't done yet, and the top priority was to finish making LLD a fast, useful, usable product. If sacrificing reusability helped LLD achieve its project goals, the contributors at the time felt that was the right tradeoff, and that carried the day.

However, it is now ${YEAR} 2021, and I think we ought to reconsider this design decision. LLD was a great success: it works, it is fast, it is simple, many users have adopted it, it has many ports (COFF/ELF/mingw/wasm/new MachO). Today, we have actual users who want to run the linker as a library, and they aren't satisfied with the option of launching a child process. Some users are interested in process reuse as a performance optimization, some are including the linker in the frontend. Who knows. I try not to pre-judge any of these efforts, I think we should do what we can to enable experimentation.

So, concretely, what could change? The main points of reusability are:
- Fatal errors and warnings exit the process without returning control to the caller
- Conflicts over global variables between threads

Error recovery is the big imposition here. To avoid a giant rewrite of all error handling code in LLD, I think we should *avoid* returning failure via the llvm::Error class or std::error_code. We should instead use an approach more like clang, where diagnostics are delivered to a diagnostic consumer on the side. The success of the link is determined by whether any errors were reported. Functions may return a simple success boolean in cases where higher level functions need to exit early. This has worked reasonably well for clang. The main failure mode here is that we miss an error check, and crash or report useless follow-on errors after an error that would normally have been fatal.

Another motivation for all of this is increasing the use of parallelism in LLD. Emitting errors in parallel from threads and then exiting the process is risky business. A new diagnostic context or consumer could make this more reliable. MLIR has this issue as well, and I believe they use this pattern. They use some kind of thread shard index to order the diagnostics, LLD could do the same.

Finally, we'd work to eliminate globals. I think this is mainly a small matter of programming (SMOP) and doesn't need much discussion, although the `make` template presents interesting challenges.

Thoughts? Tomatoes? Flowers? I apologize for the lack of context links to the original discussions. It takes more time than I have to dig those up.

Reid