[PATCH] D130229: [ELF] Add --thinlto-index= and --remapping-file=

Fangrui Song via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Aug 20 15:26:12 PDT 2022


MaskRay added a comment.

In D130229#3737185 <https://reviews.llvm.org/D130229#3737185>, @tejohnson wrote:

> In D130229#3736690 <https://reviews.llvm.org/D130229#3736690>, @MaskRay wrote:
>
>> In D130229#3736491 <https://reviews.llvm.org/D130229#3736491>, @tejohnson wrote:
>>
>>> Revisiting this as it doesn't look like we have a resolution. I went through the example and discussion here and have some comments and questions below:
>>
>> Thanks for reviving the discussion.
>>
>>> Regarding the example from @weiwang, I think the most ideal situation is to have ThinLTO (in process or distributed) match the behavior of non-LTO. For that I see
>>>
>>>   $ clang++ -c -O2 a.cpp b.cpp c.cpp
>>>   $ clang++ -fuse-ld=lld -Wl,--start-lib a.o -Wl,--end-lib -Wl,--start-lib b.o -Wl,--end-lib -Wl,--start-lib c.o -Wl,--end-lib 
>>>   $ a.out
>>>   $ echo $?
>>>   1
>>>
>>> So that would seem to be consistent with what was happening for in-process ThinLTO. I applied the patch and also see 0 printed in the distributed case with the new map approach, although fyi the command line used below is not correct:
>>>
>>> In D130229#3673200 <https://reviews.llvm.org/D130229#3673200>, @weiwang wrote:
>>>
>>>> dist_thinlto with --thinlto-index
>>>> =================================
>>>>
>>>>> clang++ -flto=thin -fuse-ld=lld -Wl,--thinlto-index=thinlto.map,--thinlto-emit-imports-files,--thinlto-object-suffix-replace='.o;.native.o' -Wl,--start-lib a.o -Wl,--end-lib -Wl,--start-lib b.o -Wl,--end-lib -Wl,--start-lib c.o -Wl,--end-lib
>>>
>>> You don't want to use suffix replace, that affects where the importing looks for bitcode files, you'll need to use prefix-replace instead, see my modified commands further below.
>>>
>>>>> cat thinlto.map
>>>>
>>>> c.o c.native.o
>>>> b.o b.native.o
>>>>
>>>>> clang++ -O2 -c -x ir b.o -fthinlto-index=b.o.thinlto.bc -o b.native.o
>>>>> clang++ -O2 -c -x ir c.o -fthinlto-index=c.o.thinlto.bc -o c.native.o
>>>>
>>>>
>>>>
>>>>> clang++ -fuse-ld=lld -Wl,--remapping-file=thinlto.map -Wl,--start-lib a.o -Wl,--end-lib -Wl,--start-lib b.o -Wl,--end-lib -Wl,--start-lib c.o -Wl,--end-lib -o dist2
>>>>> ./dist2
>>>>> echo $?
>>>>
>>>> 0
>>>>
>>>>   
>>>
>>> Here's what I did to test:
>>>
>>>   $ clang++ -flto=thin -fuse-ld=lld -Wl,--thinlto-index=thinlto.map,--thinlto-emit-imports-files,--thinlto-prefix-replace=';native_' -Wl,--start-lib a.o -Wl,--end-lib -Wl,--start-lib b.o -Wl,--end-lib -Wl,--start-lib c.o -Wl,--end-lib
>>>   $ clang++ -O2 -c -x ir b.o -fthinlto-index=native_b.o.thinlto.bc -o native_b.o
>>>   $ clang++ -O2 -c -x ir c.o -fthinlto-index=native_c.o.thinlto.bc -o native_c.o
>>>   $ clang++ -flto=thin -fuse-ld=lld -Wl,--remapping-file=thinlto.map -Wl,--start-lib a.o -Wl,--end-lib -Wl,--start-lib b.o -Wl,--end-lib -Wl,--start-lib c.o -Wl,--end-lib
>>>   $ a.out
>>>   $ echo $?
>>>   0
>>>
>>> It looks like the reason for the difference is that f() is inlined in native_c.o in the distributed ThinLTO case, and so instead of looking for f() first, finding it in b.o and then using b.o's def of g(), in this case it only looks for g() and it finds it in a.o. This is a variant of other cases I had looked at which motivated the original solution, although in our case because almost everything is bitcode by default in a ThinLTO build the solution listing bitcode files was sufficient.
>>
>> I consider this use case invalid (https://reviews.llvm.org/D130229#3674024) because it does not properly use archive semantics to shadow an archive.
>> Note: If done properly, using archive semantics is a reasonable extension to allow duplicate definitions.
>
> Does this violate a language standard and/or is it explicitly considered undefined behavior?
>
> I construct a similarly behaving example where g() is a weak symbol if that matters.

C++ https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule . In practice this is loosened by allowing some intercepting. I have some arguments in this thread

> If we consider a.o an interceptor, it fails the requirement "The defined symbols of the interceptor should be a superset of the archive member being shadowed." for reliable archive member extraction: https://maskray.me/blog/2021-06-20-symbol-processing#relocatable-object-file-suppressing-archive-member-extraction

and in https://reviews.llvm.org/D130975#3694954

>>> I'm wondering if it would it be possible to do a solution that is intermediate between this and D130975 <https://reviews.llvm.org/D130975>. Specifically, in this patch in order to avoid selecting lazy objects that were originally bitcode in the thin link, the remapping file maps them to /dev/null. Can that same solution be done for any lazy native objects from the thin link (list them in the remapping file as mapped to /dev/null)? We wouldn't be able to do this for any native objects in .a archives, but that's already an limitation for bitcode objects in distributed ThinLTO builds (they can't be in .a archives).
>>
>> This patch can be adjusted to use the intermediate behavior, but I'd like to hear a reasonable example to justify the change.
>> The current approach remaps lazy and non-lazy bitcode files.
>> The adjusted behavior will additionally remap lazy ELF .o files, which add inconsistency as ELF .a members are not remapped.
>
> There's an inherent inconsistency for bitcode lazy .o objects vs bitcode .a archives already. Until and unless the remapping file was extended to handle ELF .a members there would be an inconsistency there, but I don't think this is worse or less desirable than the inconsistency vs non-ThinLTO links or in process ThinLTO links.

There is no inherent inconsistency between bitcode lazy .o objects vs bitcode .a archives. We just need a --remapping-file= syntax to support an archive member and the build system needs to extract the archive member and invoke clang with appropriate`-fthinlto-index=`.

>>> Couple other follow up comments/questions:
>>>
>>> In D130229#3688840 <https://reviews.llvm.org/D130229#3688840>, @MaskRay wrote:
>>>
>>>> Actually, I am going to argue that --thinlto-index= may solve the "libcall defined in bitcode archive" problem: a backend compile may introduce new libcall references which do not exist in the IR symbol table, and the bitcode archive may not be extracted.
>>>> Implicit ThinLTO currently works around it by forcing extraction of bitcode files containing referenced libcall symbols.
>>>> If we use --remapping-file=, we can trigger archive member extraction of such bitcode archives, and rely on implicit ThinLTO to compile the very few libcall bitcode archive members.
>>>> If we serialize the archive member extraction result and ignore non-extracted bitcode archive members, we'd still need the current workaround.
>>>
>>> My understanding looking at this patch is that non-extracted bitcode archive members get mapped to /dev/null in the mapping file, so I don't see how this patch will help the libcall extraction if they are defined in bitcode as described above? But maybe I am misunderstanding the situation described above, or what the patch does.
>>
>> I had said before I noticed that /dev/null remapping is needed. I've now forgotten why /dev/null remapping is needed...
>> Not remapping ELF .a members helps libcall symbols which are not hard coded in `lto::LTO::getRuntimeLibcallSymbols()`.
>>
>>> In D130229#3672451 <https://reviews.llvm.org/D130229#3672451>, @MaskRay wrote:
>>>
>>>> The D22356 <https://reviews.llvm.org/D22356> and D22467 <https://reviews.llvm.org/D22467> examples work with --thinlto-index= without forcing the non-lazy state.
>>>>
>>>>   % fld.lld t.o --start-lib t2.o --end-lib --start-lib t3.o --end-lib --start-lib t4.o --end-lib -t -e main -mllvm -import-instr-limit=4 --thinlto-index=a.map
>>>>   % fld.lld t.o.5.precodegen.bc --start-lib t2.o.5.precodegen.bc --end-lib --start-lib t3.o.5.precodegen.bc --end-lib --start-lib t4.o --end-lib -t --remapping-file=a.map -e main
>>>>   t.o.5.precodegen.bc
>>>>   t3.o.5.precodegen.bc
>>>>   t2.o.5.precodegen.bc
>>>>
>>>> It presumably may not work with gold because GNU ld/gold has a different archive member extraction semantics (see https://lld.llvm.org/ELF/warn_backrefs.html).
>>>> lld adopts macOS ld64/link.exe's **"first lazy object wins" approach and in practice has more stable behaviors.**
>>>> I suspect lld just needs to drop --warn-backrefs for the final native link in certain cases, but the sacrifice is suitable to me.
>>>> (If we re-generate an order depending on `-t` output for the final native link, --warn-backrefs will not be useful anyway.)
>>>
>>> Yes it looks like D22467 <https://reviews.llvm.org/D22467> definitely and probably D22356 <https://reviews.llvm.org/D22356> will both work with the approach in this patch specifically because of the way lld does archive member extraction (back then we were using gold). I believe they would fail with this approach + gold or gnu ld. So the other issue is that with this approach we would have to maintain 2 solutions, since we want distributed ThinLTO to work for other linkers such as gold and gnu ld (at least as well as they are now, in the typical case where we are not linking with both native and ThinLTO lazy objects). To get those linkers to work with this approach, you would need to extend the mapping file to add a flag to the line to say "unconditionally link" - or just change the interpretation so that anything listed in the mapping file that isn't mapped to /dev/null is unconditionally extracted/linked. That may be preferable in any case to guarantee that we always link the same objects with lld too?
>>
>> I certainly like a solution which is more portable and supports more linkers. I care a lot about the GNU ld compatibility as it is the default linker on many Linux distributions. People probably care less about gold now since its default linker state in some groups has been replaced and its upstream development is stagnating (recognized by users and binutils contributors).
>>
>> Nowadays GNU ld and gold are the only linkers with the odd archive extraction behavior.
>> Everything else has moved to the VMS (now OpenVMS), Mach-O ld64, Windows link.exe, and ld.lld behavior.
>> mold uses a variant of such a behavior (I find it difficult to formalize its behavior, but it seems fine in most cases).
>>
>> If the new option we are proposing does not work with GNU ld, I consider it is ok since (a) its traditional archive member extraction looks problematic (b) I don't know any group wants to use it since it is so slow and confumses so much memory (c) any such user can fall back the --thinlto-index-only=.
>
> Nevertheless, it is still being used. So we would need to maintain 2 solutions long term.
>
> Ideally we will replace --thinlto-index-only with a solution that would work better for all linkers. In generally, I think it is dangerous to redo the archive member selection in the final link, since ThinLTO is performing linker resolution based optimizations. Imo we should strive for a solution that allows the information for this selection to be communicated between the 2 links. I think we should have a very compelling reason to go in the other direction from what we already do on this front for bitcode objects.

--thinlto-index-only will still be supported. I just recall that GNU ld does not support --start-lib (https://sourceware.org/bugzilla/show_bug.cgi?id=24600), so GNU ld cannot perform distributed ThinLTO with the current scheme.

I was thinking it would be dangerous. As stated, I cannot find a reasonable archive/lazy .o example.
Therefore I prefer using the simple form until a convincing example is raised. I have stated that the approach as implemented in this patch is considered experimental.

If we temporarily ignore the Bazel ThinLTO implementation, we probably need to recognize that native .a files are very very common among other build systems.
We don't provide remapping syntax for archive members yet, so remapping native lazy .o files is incomplete.
As stated elsewhere, if we don't remap native .a members to /dev/null, the final native link can technically trigger implicit ThinLTO, which help some native libcall implementations.

>>> I think if this patch was modified as suggested above (map non-selected native lazy objects to /dev/null in the mapping file as well, and possibly doing something as described in the paragraph above to enforce that all objects listed in the mapping file not mapped to /dev/null are always selected and linked) this would end up being somewhat similar to what happens with D130975 <https://reviews.llvm.org/D130975>, but I think it may avoid some of the problems there (i.e. you don't need to modify the rest of the link line to locate and remove the object files, and for any linked in native .a archives I don't think it ends up being worse that the situation today - they can't be listed in the mapping file without some extension to identify .a archive members but the link can work albeit you may get different behavior that in process ThinLTO or non-ThinLTO).
>>>
>>> Thoughts?
>>
>> Hopefully all answered in previous paragraphs.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130229/new/

https://reviews.llvm.org/D130229



More information about the llvm-commits mailing list