[lld] [llvm] [LLD][COFF] Add more `--time-trace` tags for ThinLTO linking (PR #156471)

Wed Sep 3 08:51:16 PDT 2025

aganea wrote:

> Thanks for the PR, will take a look shortly. But some comments/questions on the description below.

Thank you in advance!

> > In order to better see what's going on during ThinLTO linking, this PR adds more profile tags when using `--time-trace` on a `lld-link.exe` invocation. I was trying to understand what was the long delay (not multithreaded) before the actual ThinLTO multithreaded opt/codegen -- it actually was the full LTO on the index.
> 
> Can you clarify what you mean by "full LTO" on the index? I assume you mean the thin link? "full LTO" typically refers to IR based LTO. But yes, the thin link which operates on the index is indeed a serial phase, and can be non-trivial for large applications.

Yes, the ThinLink phase.

> We in fact have long used -disable-auto-upgrade-debug-info for our distributed ThinLTO backend compiles to avoid this overhead, but our build system ensures that the IR is all built from a consistent version of clang.

Great to know that you're using it!

> "regular LTO" typically means IR based LTO. Here again I assume you mean the thin link on the index?

Yes.

> Avoiding repeated metadata loading would save time, but presumably at the cost of (much?) higher peak memory. This isn't something we have looked at as we used distributed ThinLTO where the backends are completely separate processes.

We mostly do ThinLTO links locally, since our users are scattered around the globe rather than in a single physical location. Uploading the artifacts (bitcode .OBJs) to the cloud can take too long over varying ISP connections. If we had cloud shells for all our users, distributed ThinLTO would work, but that is not how we work in the game industry.

I'll check how much extra memory loading the metadata upfront would take; perhaps we could even gate it behind a command-line flag. That said, if it saves 1-2 minutes on the whole link, that is quite significant for our iteration times.

https://github.com/llvm/llvm-project/pull/156471