[PATCH] D120626: [ELF] Move section assignment from initializeSymbols to postParse

Peter Smith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 1 11:17:32 PST 2022


peter.smith added a comment.

Not supporting `comdat-binding: %t/w.o %t/g.o leads to an undesired undefined symbol. This is not ideal but acceptable because valid input cannot form this case.` makes me a bit nervous. In a clang/gcc world this is not a problem, but there may be third party compilers that differ. For example arm's legacy compiler armcc from Arm Compiler 5 uses STB_GLOBAL for definitions in COMDAT groups, whereas GCC, at least used to, uses STB_WEAK. It is possible that this is illegal, but I can't find in the spec where it says that.

Perhaps it could be accounted for in resolve as I think we'd be passing a global with the discarded section. Perhaps we could prefer the existing symbol in that case?



================
Comment at: lld/ELF/Driver.h:64
+  SmallVector<
+      std::tuple<const Symbol *, InputFile *, InputSectionBase *, uint64_t>, 0>
+      duplicates;
----------------
Could be worth a named struct? Something like DuplicateSymbol. This could be accepted by reportUndefined rather than 4 individual tuple field accesses.


================
Comment at: lld/ELF/InputFiles.cpp:1148
+    // Handle non-COMMON defined symbol below. !sym.file allows a symbol
+    // assignment redefines a symbol without an error.
+    if (!sym.file || !sym.isDefined() || secIdx == SHN_UNDEF ||
----------------
nit: !sym.file allows a symbol assignment to redefine a symbol without an error.


================
Comment at: lld/ELF/InputFiles.cpp:1162
+      if (sym.traced) {
+        printTraceSymbol(Undefined{this, sym.getName(), sym.binding,
+                                   sym.stOther, sym.type, secIdx},
----------------
If this is happening in parallel, is there a chance of interleaved output? For example a traced symbol in multiple groups? I guess that would need something like driver->duplicates?



================
Comment at: lld/ELF/Symbols.cpp:543
     return;
-  const Defined *d = cast<Defined>(&sym);
+  // With concurrency, sym might be Defined in postParse but later got changed
+  // to Undefined.
----------------
I'm trying to think how this can happen? Would this be a case where there is a definition of symbol `S` outside of a group and then an existing weak `S` from prevailing group and a global `S` from a discarded group.

May be worth expanding on the case in the comment as it isn't easy (or at least I wasn't able to easily work it out) from the code.


================
Comment at: lld/test/ELF/comdat-binding.s:13
+## in a non-prevailing COMDAT. Ideally this should be defined, but our behavior
+## is fine because valid input cannot form this case.
+# RUN: ld.lld -e 0 %t/w.o %t/g.o -o %t/und
----------------
I'm not so sure about that. It is certainly an area that is ill-specified. I can't find anything ELF prohibiting it though. Certainly it wouldn't within the same tool-chain I'd expect consistency, but I could see toolchain X using weak symbols in comdat groups and toolchain Y using global symbols unless the C++ ABI for the platform is explicit about it. 

I can only find references to weak symbols in groups for http://itanium-cxx-abi.github.io/cxx-abi/abi.html#vague-static . The Arm C++ ABI references this https://github.com/ARM-software/abi-aa/blob/main/cppabi32/cppabi32.rst#426elf-binding-of-static-data-guard-variable-symbols but is silent on other cases.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120626/new/

https://reviews.llvm.org/D120626



More information about the llvm-commits mailing list