[PATCH] D112735: export unique symbol list for xcoff with llvm-nm new option "--export-symbols"

Digger Lin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 7 14:56:46 PST 2022


DiggerLin added inline comments.


================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:80-82
+      0x4000,      ///<  file can be loaded by the system loader, but it is
+                   ///<  ignored by the linker if it is a member of an archive.
+  F_DEP_2 = 0x8000 ///<  Data Execution Protection bit 2
----------------
jhenderson wrote:
> 
thanks


================
Comment at: llvm/include/llvm/Object/SymbolicFile.h:122
                                  // (IR only)
+    SF_Protected = 1U << 12,     // Symbol has protected visibility
+    SF_Internal = 1U << 13       // Symbol has internal visibility
----------------
jhenderson wrote:
> DiggerLin wrote:
> > MaskRay wrote:
> > > DiggerLin wrote:
> > > > DiggerLin wrote:
> > > > > MaskRay wrote:
> > > > > > DiggerLin wrote:
> > > > > > > MaskRay wrote:
> > > > > > > > I don't think we need new bits.
> > > > > > > > 
> > > > > > > > If internal visibility has similar behavior with hidden visibility, just reuse it or not set symbol properties at all.
> > > > > > > > 
> > > > > > > > I am mostly concerned with the fact that BFD style describing all binary format's every symbol property simply does not work.
> > > > > > > we need the "protected" visibility in xcoff object file.
> > > > > > > 
> > > > > > > The AIX linker  accepts 4 of such visibility attribute types:
> > > > > > > 
> > > > > > > export: Symbol is exported with the global export attribute.
> > > > > > > hidden: Symbol is not exported.
> > > > > > > protected: Symbol is exported but cannot be rebound (preempted), even if runtime linking is being used.
> > > > > > > internal: Symbol is not exported. The address of the symbol must not be provided to other programs or shared objects, but the linker does not verify this.
> > > > > > > 
> > > > > > >  please  reference 
> > > > > > > 1. [[ https://www.ibm.com/docs/en/aix/7.1?topic=formats-xcoff-object-file-format | xcoff-object-file-format ]] . (search "Symbol visibility" in file". )
> > > > > > > 
> > > > > > > 2. https://developer.ibm.com/articles/au-aix-symbol-visibility/ , search 'STV_PROTECTED"
> > > > > > > 
> > > > > > Thanks for the pointers. But see my comment below: the request is about removing the unneeded `SF_*` abstraction.
> > > > > sorry that , I can not get "the request is about removing the unneeded SF_* abstraction"
> > > > > 
> > > > > your suggestion is to change  from "SF_Protected"  to "Protected"  ?  if so, my suggestion is that we keep "SF_Protected" as in this patch. and create a NFC to remove "SF_" 
> > > > > 
> > > > @MaskRay 
> > > My suggestion is to avoid the `SF_*` abstraction when implementing the dumper support. 
> > > 
> > > `exportSymbolInfoFromObjectFile` is XCOFF specific and ideally just uses the raw XCOFF interface, instead of using `SF_*` bits.
> > > 
> > > The `SF_*` bits are added prudently, not in one-shot places.
> > thanks for explain. @MaskRay  . as I mentioned, xcoff has 4 visibility. (Protected is  only for xcoff). 
> > 
> > there is common interface
> > 
> > ```
> >  Expected<uint32_t> XCOFFObjectFile::getSymbolFlags(DataRefImpl Symb) const {
> > ```
> > in the  
> > /scratch/zhijian/llvm/src/llvm/lib/Object/XCOFFObjectFile.cpp
> > 
> > if I implement getting visibility in the getSymbolFlags(), without the  protected visibility. it will look like
> > 
> >     
> > ```
> >   // There is no visibility in old 32 bit XCOFF object file interpret.
> >   if (is64Bit() || (auxiliaryHeader32() && (auxiliaryHeader32()->getVersion() ==
> >                                             NEW_XCOFF_INTERPRET))) {
> > 
> >     uint16_t SymType = XCOFFSym.getSymbolType();
> >     if ((SymType & VISIBILITY_MASK) == SYM_V_INTERNAL)
> >       Result |= SymbolRef::SF_Internal;
> >     if ((SymType & VISIBILITY_MASK) == SYM_V_HIDDEN)
> >       Result |= SymbolRef::SF_Hidden;
> >     if ((SymType & VISIBILITY_MASK) == SYM_V_EXPORTED)
> >       Result |= SymbolRef::SF_Exported;
> >   }
> > 
> > ```
> > if the symbo's visibility is "protected" , without defining something like "SF_Protected" in the SymbolicFile.h, it can not get the correct flag for it.    it is not reasonable for xcoff. 
> > 
> > I think the only thing I can do is to change the name "SF_Protected" to "XCOFF_Protected"  and add some comment for the XCOFF_Protected in the source code. 
> > 
> I think what @MaskRay is suggesting is to not use getSymbolFlags to get the symbol visibility for XCOFF, and to instead have another function (e.g. "getSymbolVisibility") which is defined in the XCOFFObjectFile interface, which returns some enum value or similar that is specific to XCOFF.
> 
> At the moment, as far as I can tell, the only use of this visibility is within code that already takes an XCOFFObjectFile, so there is no need to extend the getSymbolFlags function (and underlying SF_* enum).
I remove the define of SF_Internal = 1U << 12


================
Comment at: llvm/lib/BinaryFormat/Magic.cpp:91-94
+  case '<':
+    if (startswith(Magic, "<bigaf>\n"))
+      return file_magic::archive;
+    break;
----------------
jhenderson wrote:
> This is unrelated, I think?
yes, it need ,  in the function 

```
static void dumpSymbolNamesFromFile(std::string &Filename) {
 .....
   Expected<std::unique_ptr<Binary>> BinaryOrErr =
       **createBinary(BufferOrErr.get()->getMemBufferRef(), ContextPtr);**
...
 }
```
createBinary will call identify_magic()









================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:623-628
+    if ((SymType & VISIBILITY_MASK) == SYM_V_INTERNAL)
+      Result |= SymbolRef::SF_Internal;
+    if ((SymType & VISIBILITY_MASK) == SYM_V_HIDDEN)
+      Result |= SymbolRef::SF_Hidden;
+
+    // TODO: We do not deal with SYM_V_PROTECTED here.
----------------
jhenderson wrote:
> ELF has "protected" and "internal" visibility, but doesn't have the problem you are trying to solve here, with the addition of SF_Internal, and not bothering with "protected". Perhaps you should see what happens with ELF objects and what flags are used (if any) for visibility there?
yes, in ELF, it only deal with SF_Exported and SF_Hidden  in function getSymbolFlags();




```
  if (isExportedToOtherDSO(ESym))
     Result |= SymbolRef::SF_Exported;
 
   if (ESym->getVisibility() == ELF::STV_HIDDEN)
     Result |= SymbolRef::SF_Hidden;
```




================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_ar.test:1
+## Test exporting symbol list from aix archive file with "llvn-nm --export-symbols"
+
----------------
jhenderson wrote:
> This test is using llvm-nm, but is in the llvm-objdump directory. Please fix.
> 
> Rather than introduce a canned archive to support testing this functionality, I would strongly prefer that llvm-ar support for Big Archives be finished instead, so that the archive can be created at test time. Alternatively, don't attempt to support archives in this patch, and add that in a later one.
good catch. thanks. I agree with you that we should avoid using  canned archive, The functionality of "export symbol list" was originally planned to be completed by the end of last year. but considering the workload, we postponed it to the end of January this year.  I think it still a long distance to have the "big archive write" patch landed(we not ye begin to review the patch ).  I am sorry that I have to use canned archive in the patch. 






================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:4
+## Generate XCOFF object file.
+# RUN: yaml2obj -DFLAG=0x0002 %s -o %t.o
+
----------------
jhenderson wrote:
> You probably want two different input files that both contain symbols, as regular llvm-nm output prints each object's symbols separately, rather than folding them into a single list.
not clear about the comment, can you explain more detail?


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:14
+##  Export substring of the global symbol name  begin with "__tf1" and "__tf9"
+##  Export format: symbol_name  symbol_visibility(if there is).
+# RUN: llvm-nm --export-symbols %t.o | FileCheck --check-prefixes=SYM,WEAK-SYM %s
----------------
jhenderson wrote:
> Space before opening bracket in English prose.
> 
> I'm not sure what this comment is actually trying to say though. I *think* it's trying to say that's what the format is, but in that case, I don't think it's necessary, as that is what the test is testing!
thanks


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:17
+
+## Test: only export unique symbols.
+# RUN: llvm-nm --export-symbols %t.o %t.o | FileCheck --check-prefixes=SYM,WEAK-SYM %s
----------------
jhenderson wrote:
> It's not clear to me what is meant by "unique" here. I'm assuming it's referring to avoiding duplicates in the output, but in that case, what is meant by a duplicate? The name? Symbol visibility? Literally the same symbol (i.e. same object file and symbol index) etc? Depending on the intent, I think you'll need significantly more testing around this test case. Based on my reading of the code, I expect you want "symbols with same name and visibility" in which case, you don't want to use a second object file. Instead, you want two symbols in the same object file, with the same name and visibility, and also a symbol with the same name but different visibility, and a symbol with a different name, but same visibility (the latter may be indirectly covered by other cases, so may not be needed).
Since removing duplicate symbol(with the same name and visibility) happen at the end of the code(after merge all the output).  I think it is reasonable to export symbols from two same object file and check whether it remove the duplicate symbol.

> and also a symbol with the same name but different visibility, and a symbol with a different name, but same visibility (the latter may be indirectly covered by other cases, so may not be needed).

I add a new symbol "export_protested_var protected" to test it.




================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:162
+
+# NO-RSRC:      export_var export
+# NO-RSRC-NEXT: protected_var protected
----------------
jhenderson wrote:
> I think it would be much simpler to a) have the __rsrc symbol later in the output, and b) putting it under a separate prefix, allowing you to do something like:
> 
> ```
> # SYM:           export_var export
> # RSRC-NEXT: __rsrc
> # SYM-NEXT: protected_var protected
> # SYM-NEXT: tf1value
> # SYM-NEXT: tf9value
> # WEAK-SYM-NEXT: weak_func
> ```
> This would remove the need for a near-duplicate set of symbols being checked. The base case would then use `--check-prefixes=SYM,RSRC,WEAK-SYM`, and the no __rsrc case could just omit the RSRC prefix.
in order to simple the review, I separate the symbol check part for different test.


================
Comment at: llvm/tools/llvm-nm/Opts.td:55-56
+def export_symbols : FF<"export-symbols", "Export symbol list for xcoff object file or archive">, Group<grp_xcoff_o>;
+def no_rsrc : FF<"no-rsrc", "Exclude the rsrc symbol from export symbol list"
+           "for xcoff (requires --export-unique-symbol)">, Group<grp_xcoff_o>;
+
----------------
jhenderson wrote:
> This help text references `--export-unique-symbol` but I don't think that's what you mean? Furthermore, if the __rsrc symbol is just a special class fo symbols, it doesn't need to depend on --export-symbols, and instead could be used to omit it from llvm-nm's regular symbol list (and therefore indirectly the --export-symbols list too), just like `--no-weak`.
> 
> Finally, could this be pushed into a separate and later patch?
we only need exclude __rsrc symbol  when export the symbol list sometimes,  we do not need to exclude the __rsrc symbol for other  purpose. so I do not think we need to implement as --no-weak.


================
Comment at: llvm/tools/llvm-nm/Opts.td:55-56
+def export_symbols : FF<"export-symbols", "Export symbol list for xcoff object file or archive">, Group<grp_xcoff_o>;
+def no_rsrc : FF<"no-rsrc", "Exclude the rsrc symbol from export symbol list"
+           "for xcoff (requires --export-unique-symbol)">, Group<grp_xcoff_o>;
+
----------------
DiggerLin wrote:
> jhenderson wrote:
> > This help text references `--export-unique-symbol` but I don't think that's what you mean? Furthermore, if the __rsrc symbol is just a special class fo symbols, it doesn't need to depend on --export-symbols, and instead could be used to omit it from llvm-nm's regular symbol list (and therefore indirectly the --export-symbols list too), just like `--no-weak`.
> > 
> > Finally, could this be pushed into a separate and later patch?
> we only need exclude __rsrc symbol  when export the symbol list sometimes,  we do not need to exclude the __rsrc symbol for other  purpose. so I do not think we need to implement as --no-weak.
> This help text references `--export-unique-symbol` but I don't think that's what you mean? Furthermore, if the __rsrc symbol is just a special class fo symbols, it doesn't need to depend on --export-symbols, and instead could be used to omit it from llvm-nm's regular symbol list (and therefore indirectly the --export-symbols list too), just like `--no-weak`.
> 
> Finally, could this be pushed into a separate and later patch?




================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:231
   std::string IndirectName;
+  StringRef Visibility;
 };
----------------
jhenderson wrote:
> Do you actually need this field? Can you just query the `Sym` member directly?
> 
> If you do need it, I think it needs to be somewhere above the comment in this class, since it's not related to Mach-O.
yes. Visibility can query the Sym member directly, if we do not have field "StringRef Visibility" ,  as cost, we need to generate the "Visibility" every time when it need 
we need Visibility when sort, compare duplication, and  print output symbols
I am prefer to add a new field here.



================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:667
 
+void printExportSymboList(raw_fd_ostream &OS) {
+  if (SymbolList.size() == 0)
----------------
jhenderson wrote:
> It seems to me like you're doing several separate things in this function which are different to llvm-nm's defaults:
> 
> 1) Sorting based on the visibility
> 2) Filtering out duplicates
> 3) Filtering out symbols with certain visibilities
> 4) Printing name and visibility columns (and not other columns like type code).
> 5) Merging input file symbols into a single list.
> 
> I feel like you could achieve this in a more natural way, without needing to have a completely independent method for printing the symbols. Specifically, you could implement each of these as individual new functionality in the core logic (possibly with a user-facing option per new functionality), which "--export-symbols" then specifies automatically.
> 
> For example:
> 1) The symbol visibility sorting is just another sorting mode, like --size-sort, so could be added at the point where --size-sort does something.
> 2) Duplicate removal would be done based on all to-be-printed pieces of information, and could be done when printing there instead.
> 3) Visibility filtering could be done at the same time as e.g. weak filtering.
> 4) Column printing would be controlled by a series of flags (some on by default, others off by default), which control whether each column would be printed.
> 5) Input file merging could be common behaviour.
> 
> The advantage of dividing things up like this is that you can implement and test each of them independently of each other, which makes reviewing much easier. Additionally, by adding some or all of these as user-level switches, it allows users to get exactly the symbols they want in the format they want.
I think over your suggestion.  My opinion is that we should  adding new functionality or options only when it is required.

> Input file merging could be common behavior.

  I  added new function PrintMergedSymbolList() (which only print symbol name and visibility(if there is)) without adding the new option "--merge-output" .  If anyone or other object file format also need the functionality only, we can add a new patch for new option at that time  to on/off the function PrintMergedSymbolList()  and deciding which others fields need to be printed out at that time.

> Duplicate removal would be done based on all to-be-printed pieces of information, and could be done when printing there instead.

  if we add a new option  "--no-duplicate" to remove the duplication symbol, the option is no sense to a single xcoff object file(I think a single object file never has duplication symbol).  the option "--no-duplicate" only has sense on the merging output together.  I  added a new function removeDuplicateSortedSymbolList() and we can add a new option "--no-duplicate" in later patch to on/off removeDuplicateSortedSymbolList() if it is need.

> Column printing would be controlled by a series of flags 

I think this is a good idea, we can add the functionality in a separate and not urgent patch,  and current patch will not depend on it.

> The symbol visibility sorting is just another sorting mode, like --size-sort, so could be added at the point where --size-sort does something.

  I add a new functionality compareSymbolNameVisibility, it only enable on the --option "--export-symbols" . if elf object file also need the functionality,  we can add a new option to enable compareSymbolNameVisibility() at that time.

the option will  "--export-symbols" will enable the three functions at same times in current patch. 
I think other new options only need to be added when not all of the three functions need to be enabled at the same time. 

> Visibility filtering could be done at the same time as e.g. weak filtering. 
  Visibility only be printed at "export symbol list" and it is always need to be printed out if there is.  I will add new option "Visibility filtering"  in later patch only there is requirement.  




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112735/new/

https://reviews.llvm.org/D112735



More information about the llvm-commits mailing list