[PATCH] D130229: [ELF] Add --thinlto-index= and --remapping-file=

Sat Jul 30 00:24:02 PDT 2022

MaskRay added a comment.

In D130229#3677889 <https://reviews.llvm.org/D130229#3677889>, @wenlei wrote:

> In D130229#3672462 <https://reviews.llvm.org/D130229#3672462>, @MaskRay wrote:
>
>> I think lots of concern may possibly be resolved when a build system actually implements this, or may be not.
>> In the release notes, I state "Experimental ``--thinlto-index=`` and ``--remapping-file=`` are added for distributed ThinLTO."
>> I am open to a change if this simple approach turns out be insufficient.
>
> Thorough experiment with coordinated build system changes is expensive. The motivating case was about consistency between thinlink and final link (also between local and dist thinlto) in the presence of native libs, and we know that the consistency has two aspects -- obj selection and ordering .
>
> I think this proposal not only doesn't solve the problem for native lazy objects, it also regresses current handling of lto lazy objects because the selection part done by thinlto-index-only is now missing.
>
> Knowing the selection part is unresolved, I think it's a stretch to ask people to do such experiments. And I'm also not sure if trial and error approach is a good one for this kind of problem..

I think it is worth having more understanding of how the current --thinlto-index-only= is insufficient and coming up better examples.
You have doubt. I had it, too. But I wasn't able to come up a convincing symbol resolution example that --thinlto-index= is insufficient.
I have verified --thinlto-index= with some very large internal ThinLTO builds.

If you want to back up the idea of serializing the archive member extraction result, please share concrete examples, not just theory that "it also regresses...".

Actually, I am going to argue that --thinlto-index= may solve the "libcall defined in bitcode archive" problem: a backend compile may introduce new libcall references which do not exist in the IR symbol table, and the bitcode archive may not be extracted.
Implicit ThinLTO currently works around it by forcing extraction of bitcode files containing referenced libcall symbols.
If we use --remapping-file=, we can trigger archive member extraction of such bitcode archives.
If we serialize the archive member extraction result and ignore non-extracted bitcode archive members, we'd still need the current workaround.

The current implicit ThinLTO takes the simplest approach by inserting lto.tmp at the end, breaking ordering.
I don't think the scheme is set in stone and changing distributed ThinLTO to behave like it is probably not the right direction.
On the other hand, it's non-trivial to fix its ordering. Therefore, I think making distributed ThinLTO and implicit ThinLTO have very similar symbol resolution behaviors is a stretch goal. If --thinlto-index= doesn't behave like implicit ThinLTO, I don't think it is a design flaw.

>> Corollary: I expect that a distributed ThinLTO user should fix certain duplicate definition issues (many are ODR violation). Some are fine, if the archive member extraction leverage is reasonable.
>
> Wei's example is one possible manifestation of the underlying issue -- doing symbol resolution on the same set of inputs pre-codegen and post-codegen just isn't going to be equivalent. That can lead different results from undefined symbol to multiple definition to initializer discrepancies.
>
> I hear you that there's no strict right vs wrong symbol resolution in some cases. But the consistency between local and dist thinLTO is important. Extending the current thinlto-index-only approach to cover native lazy objects is going to handle that better.
>
>> The alternative will be very intrusive and have substantial complexity. But for the sake of more stable distributed ThinLTO I am willing to pay the cost.
>
> For anyone with reasonably sized codebase on local ThinLTO, some stability between local vs dist ThinLTO is critical for the adoption of dist ThinLTO. This is essentially blocking us from adopting dist ThinLTO internally.

As mentioned above, please share a convincing example.
If I know one, as mentioned, I am happy to take the serialization approach, even if it will be an involved change.

> ---
>
> There's another aspect that didn't get discussed as much here. If you look through the reviews @tejohnson mentioned, there're back and forth discussions with @mehdi_amini around the design principle -- specifically whether we should let final link redo what's already done by thinlink. I agree with conclusion there that it's unnecessary, and for that reason, the thinlto-index-only approach we use today is also more reasonable. I don't think we should revert back that important and heavily discussed design decision lightly.
>
> Weighing everything, it looks sensible to me to absorb a bit of complexity so we can provide some stability between local and dist thinLTO, and avoid having final link repeat what's done already. At the very least, we should try extending thinlto-index-only and see if the concern around complexity is manageable.

All replied in previous paragraphs.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130229/new/

https://reviews.llvm.org/D130229