[PATCH] D136701: [LinkerWrapper] Perform device linking steps in parallel

Joseph Huber via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Oct 25 14:03:26 PDT 2022


jhuber6 added a comment.

In D136701#3883416 <https://reviews.llvm.org/D136701#3883416>, @tra wrote:

> In D136701#3883300 <https://reviews.llvm.org/D136701#3883300>, @jhuber6 wrote:
>
>> However, as an opt-in feature it would be very helpful in some cases.
>
> I'm OK with the explicit opt-in.

It might be good to start with this as an opt-in feature for this patch and discuss the defaults later; I'll make that change.

>> Consider, for example, someone creating a static library that supports every GPU architecture LLVM supports; it would be nice to be able to optionally turn on parallelism in the driver.
>
> Yes, but the implicit assumption here is that you have sufficient resources. If you create N libraries, each for M architectures, your build machine may not have enough memory for N*M linkers.
> Having N*M processes may or may not be an issue, but if each of those linkers is an `lld` that wants to run its own K parallel threads, it would not help anything.

That's true, `AMDGPU` uses `lld` as its linker, so we would be invoking a potentially parallel link step from multiple threads. I'm not sure how much of an impact that would have in practice.
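
One way to avoid compounding the parallelism would be to pin each `lld` invocation to a single thread whenever the wrapper itself is running several links concurrently; `ld.lld` accepts `--threads=<N>` for that. A rough sketch of what I mean, not part of this patch, with placeholder names:

```cpp
// Sketch only: run one device link with lld's internal threading capped,
// so wrapper-level parallelism doesn't multiply into N*M*K threads.
// `LldPath`, `InputFile`, and `OutputFile` are placeholders.
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Program.h"

int runDeviceLink(llvm::StringRef LldPath, llvm::StringRef InputFile,
                  llvm::StringRef OutputFile, bool WrapperIsParallel) {
  llvm::SmallVector<llvm::StringRef> Args = {LldPath, InputFile, "-o",
                                             OutputFile};
  // If the wrapper owns the parallelism, keep each lld single-threaded.
  if (WrapperIsParallel)
    Args.push_back("--threads=1");
  return llvm::sys::ExecuteAndWait(LldPath, Args);
}
```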

> In other words, I agree that it may be helpful in some cases, but I can also see how it may actually hurt the build, possibly catastrophically.
>
> My point is that grabbing resources will likely break the build system's assumptions about their availability. How that would affect the build is anyone's guess. With infinite resources, parallel-everything would win, but in practice it's a big maybe. It would likely be a win for small builds and probably a wash or a regression for a larger build with multiple such targets.
>
> Ideally there would be a way to cooperate with the build system and let it manage the scheduling, but I don't think we have a good way of doing that.
> E.g. for CUDA compilation I was thinking of exposing per-GPU sub-compilations (well, we already do with --cuda-device-only), providing a way to create a combined object from them, and then letting the build system manage how those per-GPU compilations are launched. The problem there is that the build system would need to know our under-the-hood implementation details, so such an approach would be very fragile. The way the new driver does things may be a bit more suitable for this, but I suspect it would still be hard to do.
>
>> `lld` already uses all available threads for its parallel linking; the linker wrapper runs before the host linker invocation, so it shouldn't interfere either.
>
> You do have a point here. As long as we don't end up with too many threads (e.g. we guarantee that each per-offload linker instance does not run its own parallel threads), offload linking may be similar to parallel lld.

AMDGPU calls `lld` for its device linking stage, and with LTO there is `thin-lto`, which could potentially use more threads. But generally I think the number of threads required for device linking will probably be small, and running the links in parallel will always be beneficial compared to running them sequentially.
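
Concretely, what I have in mind is a bounded thread pool in the wrapper, so the number of simultaneous device links is capped no matter how many targets and architectures are being linked. A rough sketch of that idea (the `LinkJob` type and `linkDevice` stub below are placeholders, not the actual wrapper code):

```cpp
// Sketch only: run each device link job on a bounded llvm::ThreadPool so
// we never have more than `NumThreads` linker invocations in flight.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Placeholder for a single device link job (one triple/arch pair).
struct LinkJob {
  std::string Triple;
  std::string Arch;
  std::vector<std::string> InputFiles;
};

// Stand-in for the real per-target link step; just a stub here.
llvm::Expected<std::string> linkDevice(const LinkJob &Job) {
  return Job.Triple + "-" + Job.Arch + ".img";
}

llvm::Expected<std::vector<std::string>>
linkAllDevices(llvm::ArrayRef<LinkJob> Jobs, unsigned NumThreads) {
  // Cap the wrapper's own parallelism; each job may still be an `lld`
  // invocation with its own threads, so this should stay conservative.
  llvm::ThreadPool Pool(llvm::hardware_concurrency(NumThreads));
  std::mutex Mutex;
  std::vector<std::string> Images;
  llvm::Error Err = llvm::Error::success();

  for (const LinkJob &Job : Jobs)
    Pool.async([&, JobPtr = &Job]() {
      llvm::Expected<std::string> ImageOrErr = linkDevice(*JobPtr);
      // Serialize access to the shared results and error list.
      std::lock_guard<std::mutex> Lock(Mutex);
      if (!ImageOrErr)
        Err = llvm::joinErrors(std::move(Err), ImageOrErr.takeError());
      else
        Images.push_back(std::move(*ImageOrErr));
    });
  Pool.wait();

  if (Err)
    return std::move(Err);
  return std::move(Images);
}
```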

>> This is only non-deterministic for the order of linking jobs between several targets and architectures. If the user only links a single architecture it should behave as before.
>
> I'm not sure what you mean. Are you saying that linking with `--offload-arch=gfx700` is repeatable, but with `--offload-arch=gfx700,gfx701` it's not? That would still be a problem.
>
>> The average case is still probably going to be one or two architectures at once, in which case this change won't make much of a difference.
>
> Any difference is a difference, as far as content-based caching and provenance tracking are concerned.

This does bring up a good point. The linked outputs will now be inserted in an arbitrary order, so we should probably sort them by some stable key. Otherwise the same inputs could produce different images depending on which link job finishes first. I'll make that change as well.
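
Something along these lines, sorting by a stable key before the images get wrapped into the host binary (the `DeviceImage` struct is just a placeholder for illustration):

```cpp
// Sketch only: restore a deterministic image order after parallel linking.
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include <string>
#include <tuple>

// Placeholder for one linked device image and its identifying metadata.
struct DeviceImage {
  std::string Kind;   // e.g. "openmp", "cuda", "hip"
  std::string Triple; // e.g. "amdgcn-amd-amdhsa"
  std::string Arch;   // e.g. "gfx90a", "sm_70"
  llvm::SmallVector<char, 0> Binary;
};

void sortImages(llvm::SmallVectorImpl<DeviceImage> &Images) {
  // Sort by (kind, triple, arch) so the same inputs always produce the
  // same image order regardless of which link job finished first.
  llvm::sort(Images, [](const DeviceImage &LHS, const DeviceImage &RHS) {
    return std::tie(LHS.Kind, LHS.Triple, LHS.Arch) <
           std::tie(RHS.Kind, RHS.Triple, RHS.Arch);
  });
}
```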


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136701/new/

https://reviews.llvm.org/D136701



More information about the cfe-commits mailing list