[PATCH] D112735: export unique symbol list for xcoff with llvm-nm new option "--export-symbols"

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 5 04:08:45 PST 2022


jhenderson added inline comments.


================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:72
+                        ///< only)
+  F_DEP_1 = 0x0080,     ///< data Execution Protection bit 1
+  F_VARPG = 0x0100,     ///< executable requests using variable size pages
----------------
Should this be "Data Execution Protection" (i.e. upper-case "D")?


================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:74
+  F_VARPG = 0x0100,     ///< executable requests using variable size pages
+  F_LPTEXT = 0x0400,    ///<  executable requires large pages for text
+  F_LPDATA = 0x0800,    ///< executable requires large pages for data
----------------



================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:78
+                        ///< executable (equivalent to F_EXEC on AIX)
+  F_SHROBJ = 0x2000,    ///<  file is a shared object
+  F_LOADONLY =
----------------



================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:80-82
+      0x4000,      ///<  file can be loaded by the system loader, but it is
+                   ///<  ignored by the linker if it is a member of an archive.
+  F_DEP_2 = 0x8000 ///<  Data Execution Protection bit 2
----------------



================
Comment at: llvm/include/llvm/Object/SymbolicFile.h:122
                                  // (IR only)
+    SF_Protected = 1U << 12,     // Symbol has protected visibility
+    SF_Internal = 1U << 13       // Symbol has internal visibility
----------------
DiggerLin wrote:
> MaskRay wrote:
> > DiggerLin wrote:
> > > DiggerLin wrote:
> > > > MaskRay wrote:
> > > > > DiggerLin wrote:
> > > > > > MaskRay wrote:
> > > > > > > I don't think we need new bits.
> > > > > > > 
> > > > > > > If internal visibility has similar behavior with hidden visibility, just reuse it or not set symbol properties at all.
> > > > > > > 
> > > > > > > I am mostly concerned with the fact that BFD style describing all binary format's every symbol property simply does not work.
> > > > > > we need the "protected" visibility in xcoff object file.
> > > > > > 
> > > > > > The AIX linker  accepts 4 of such visibility attribute types:
> > > > > > 
> > > > > > export: Symbol is exported with the global export attribute.
> > > > > > hidden: Symbol is not exported.
> > > > > > protected: Symbol is exported but cannot be rebound (preempted), even if runtime linking is being used.
> > > > > > internal: Symbol is not exported. The address of the symbol must not be provided to other programs or shared objects, but the linker does not verify this.
> > > > > > 
> > > > > >  please  reference 
> > > > > > 1. [[ https://www.ibm.com/docs/en/aix/7.1?topic=formats-xcoff-object-file-format | xcoff-object-file-format ]] . (search "Symbol visibility" in file". )
> > > > > > 
> > > > > > 2. https://developer.ibm.com/articles/au-aix-symbol-visibility/ , search 'STV_PROTECTED"
> > > > > > 
> > > > > Thanks for the pointers. But see my comment below: the request is about removing the unneeded `SF_*` abstraction.
> > > > sorry that , I can not get "the request is about removing the unneeded SF_* abstraction"
> > > > 
> > > > your suggestion is to change  from "SF_Protected"  to "Protected"  ?  if so, my suggestion is that we keep "SF_Protected" as in this patch. and create a NFC to remove "SF_" 
> > > > 
> > > @MaskRay 
> > My suggestion is to avoid the `SF_*` abstraction when implementing the dumper support. 
> > 
> > `exportSymbolInfoFromObjectFile` is XCOFF specific and ideally just uses the raw XCOFF interface, instead of using `SF_*` bits.
> > 
> > The `SF_*` bits are added prudently, not in one-shot places.
> thanks for explain. @MaskRay  . as I mentioned, xcoff has 4 visibility. (Protected is  only for xcoff). 
> 
> there is common interface
> 
> ```
>  Expected<uint32_t> XCOFFObjectFile::getSymbolFlags(DataRefImpl Symb) const {
> ```
> in the  
> /scratch/zhijian/llvm/src/llvm/lib/Object/XCOFFObjectFile.cpp
> 
> if I implement getting visibility in the getSymbolFlags(), without the  protected visibility. it will look like
> 
>     
> ```
>   // There is no visibility in old 32 bit XCOFF object file interpret.
>   if (is64Bit() || (auxiliaryHeader32() && (auxiliaryHeader32()->getVersion() ==
>                                             NEW_XCOFF_INTERPRET))) {
> 
>     uint16_t SymType = XCOFFSym.getSymbolType();
>     if ((SymType & VISIBILITY_MASK) == SYM_V_INTERNAL)
>       Result |= SymbolRef::SF_Internal;
>     if ((SymType & VISIBILITY_MASK) == SYM_V_HIDDEN)
>       Result |= SymbolRef::SF_Hidden;
>     if ((SymType & VISIBILITY_MASK) == SYM_V_EXPORTED)
>       Result |= SymbolRef::SF_Exported;
>   }
> 
> ```
> if the symbo's visibility is "protected" , without defining something like "SF_Protected" in the SymbolicFile.h, it can not get the correct flag for it.    it is not reasonable for xcoff. 
> 
> I think the only thing I can do is to change the name "SF_Protected" to "XCOFF_Protected"  and add some comment for the XCOFF_Protected in the source code. 
> 
I think what @MaskRay is suggesting is to not use getSymbolFlags to get the symbol visibility for XCOFF, and to instead have another function (e.g. "getSymbolVisibility") which is defined in the XCOFFObjectFile interface, which returns some enum value or similar that is specific to XCOFF.

At the moment, as far as I can tell, the only use of this visibility is within code that already takes an XCOFFObjectFile, so there is no need to extend the getSymbolFlags function (and underlying SF_* enum).


================
Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:59-67
   uint8_t getFlag() const {
     return static_cast<const T *>(this)->FlagAndTDataAlignment &
            AuxiHeaderFlagMask;
   }
   uint8_t getTDataAlignment() const {
     return static_cast<const T *>(this)->FlagAndTDataAlignment &
            AuxiHeaderTDataAlignmentMask;
----------------
These functions should have blank lines between them.


================
Comment at: llvm/lib/BinaryFormat/Magic.cpp:91-94
+  case '<':
+    if (startswith(Magic, "<bigaf>\n"))
+      return file_magic::archive;
+    break;
----------------
This is unrelated, I think?


================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:621
+                                            NEW_XCOFF_INTERPRET))) {
+
+    uint16_t SymType = XCOFFSym.getSymbolType();
----------------
Delete blank line.


================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:623-628
+    if ((SymType & VISIBILITY_MASK) == SYM_V_INTERNAL)
+      Result |= SymbolRef::SF_Internal;
+    if ((SymType & VISIBILITY_MASK) == SYM_V_HIDDEN)
+      Result |= SymbolRef::SF_Hidden;
+
+    // TODO: We do not deal with SYM_V_PROTECTED here.
----------------
ELF has "protected" and "internal" visibility, but doesn't have the problem you are trying to solve here, with the addition of SF_Internal, and not bothering with "protected". Perhaps you should see what happens with ELF objects and what flags are used (if any) for visibility there?


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_ar.test:1
+## Test exporting symbol list from aix archive file with "llvn-nm --export-symbols"
+
----------------
This test is using llvm-nm, but is in the llvm-objdump directory. Please fix.

Rather than introduce a canned archive to support testing this functionality, I would strongly prefer that llvm-ar support for Big Archives be finished instead, so that the archive can be created at test time. Alternatively, don't attempt to support archives in this patch, and add that in a later one.


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:1
+## Test the option "--export-symbols" of llvm-nm: exporting symbol list of xcoff object file. 
+
----------------
See comment above - wrong place for this test.

Also no need to spell out what the new option does here (especially as "exporting symbol list" is not particularly clear).


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:4
+## Generate XCOFF object file.
+# RUN: yaml2obj -DFLAG=0x0002 %s -o %t.o
+
----------------
You probably want two different input files that both contain symbols, as regular llvm-nm output prints each object's symbols separately, rather than folding them into a single list.


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:9
+
+##  Test following symbols:
+##  Not export global symbols begin with "__sinit" , "__sterm" , "." , "("
----------------



================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:10-12
+##  Not export global symbols begin with "__sinit" , "__sterm" , "." , "("
+##  Not export global symbols begin with regular expression "^__[0-9]+__"
+##  Not export hidden symbols and internal symbols.
----------------
"Not export" -> "Do not export"

Also remove double space at start of comments.


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:13
+##  Not export hidden symbols and internal symbols.
+##  Export substring of the global symbol name  begin with "__tf1" and "__tf9"
+##  Export format: symbol_name  symbol_visibility(if there is).
----------------



================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:14
+##  Export substring of the global symbol name  begin with "__tf1" and "__tf9"
+##  Export format: symbol_name  symbol_visibility(if there is).
+# RUN: llvm-nm --export-symbols %t.o | FileCheck --check-prefixes=SYM,WEAK-SYM %s
----------------
Space before opening bracket in English prose.

I'm not sure what this comment is actually trying to say though. I *think* it's trying to say that's what the format is, but in that case, I don't think it's necessary, as that is what the test is testing!


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:15
+##  Export format: symbol_name  symbol_visibility(if there is).
+# RUN: llvm-nm --export-symbols %t.o | FileCheck --check-prefixes=SYM,WEAK-SYM %s
+
----------------
All of these test cases probably want an `--implicit-check-not={{.}}` or they may pass spuriously, due to undesired symbols appearing in the output, other than where checked.


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:17
+
+## Test: only export unique symbols.
+# RUN: llvm-nm --export-symbols %t.o %t.o | FileCheck --check-prefixes=SYM,WEAK-SYM %s
----------------
It's not clear to me what is meant by "unique" here. I'm assuming it's referring to avoiding duplicates in the output, but in that case, what is meant by a duplicate? The name? Symbol visibility? Literally the same symbol (i.e. same object file and symbol index) etc? Depending on the intent, I think you'll need significantly more testing around this test case. Based on my reading of the code, I expect you want "symbols with same name and visibility" in which case, you don't want to use a second object file. Instead, you want two symbols in the same object file, with the same name and visibility, and also a symbol with the same name but different visibility, and a symbol with a different name, but same visibility (the latter may be indirectly covered by other cases, so may not be needed).


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:20
+
+## Test: Not export weak symbol with option "exclude-weak".
+# RUN: llvm-nm --export-symbols --no-weak %t.o | FileCheck --check-prefixes=SYM %s
----------------



================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:23
+
+## Test: Not export __rsrc symbol name with option --no-rsrc.
+# RUN: llvm-nm --export-symbols --no-rsrc %t.o  | FileCheck --check-prefixes=NO-RSRC %s
----------------



================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:26
+
+## Test: Not export any symobl for shared object file.
+# RUN: llvm-nm --export-symbols %t_shared.o | FileCheck --check-prefixes=ANY --allow-empty %s
----------------



================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:28
+# RUN: llvm-nm --export-symbols %t_shared.o | FileCheck --check-prefixes=ANY --allow-empty %s
+
+
----------------
Delete additional blank line.


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:155-160
+# SYM:             __rsrc
+# SYM-NEXT:        export_var export
+# SYM-NEXT:        protected_var protected
+# SYM-NEXT:        tf1value
+# SYM-NEXT:        tf9value
+# WEAK-SYM-NEXT:   weak_func
----------------



================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:162
+
+# NO-RSRC:      export_var export
+# NO-RSRC-NEXT: protected_var protected
----------------
I think it would be much simpler to a) have the __rsrc symbol later in the output, and b) putting it under a separate prefix, allowing you to do something like:

```
# SYM:           export_var export
# RSRC-NEXT: __rsrc
# SYM-NEXT: protected_var protected
# SYM-NEXT: tf1value
# SYM-NEXT: tf9value
# WEAK-SYM-NEXT: weak_func
```
This would remove the need for a near-duplicate set of symbols being checked. The base case would then use `--check-prefixes=SYM,RSRC,WEAK-SYM`, and the no __rsrc case could just omit the RSRC prefix.


================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:166
+# NO-RSRC-NEXT: tf9value
+# NO-RSRC-NEXT:   weak_func
+
----------------



================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_obj.test:168
+
+# ANY-NOT: .*
----------------
This won't do what you think it will. This will look for the literal string ".*" and fail if it is found. You probably wanted "{{.*}}".

Using `--implicit-check-not={{.*}}` is a better way of doing this, as you can then avoid the `--check-prefix=ANY` option for this test case too.


================
Comment at: llvm/tools/llvm-nm/Opts.td:54
+
+def export_symbols : FF<"export-symbols", "Export symbol list for xcoff object file or archive">, Group<grp_xcoff_o>;
+def no_rsrc : FF<"no-rsrc", "Exclude the rsrc symbol from export symbol list"
----------------
Please remember to add the new options to the llvm-nm documentation.


================
Comment at: llvm/tools/llvm-nm/Opts.td:55-56
+def export_symbols : FF<"export-symbols", "Export symbol list for xcoff object file or archive">, Group<grp_xcoff_o>;
+def no_rsrc : FF<"no-rsrc", "Exclude the rsrc symbol from export symbol list"
+           "for xcoff (requires --export-unique-symbol)">, Group<grp_xcoff_o>;
+
----------------
This help text references `--export-unique-symbol` but I don't think that's what you mean? Furthermore, if the __rsrc symbol is just a special class fo symbols, it doesn't need to depend on --export-symbols, and instead could be used to omit it from llvm-nm's regular symbol list (and therefore indirectly the --export-symbols list too), just like `--no-weak`.

Finally, could this be pushed into a separate and later patch?


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:109-110
 static bool WithoutAliases;
+static bool ExportSymbols; // The option is xcoff specific option.
+static bool NoRsrc;        // The option is xcoff specific option.
 
----------------



================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:231
   std::string IndirectName;
+  StringRef Visibility;
 };
----------------
Do you actually need this field? Can you just query the `Sym` member directly?

If you do need it, I think it needs to be somewhere above the comment in this class, since it's not related to Mach-O.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:667
 
+void printExportSymboList(raw_fd_ostream &OS) {
+  if (SymbolList.size() == 0)
----------------
It seems to me like you're doing several separate things in this function which are different to llvm-nm's defaults:

1) Sorting based on the visibility
2) Filtering out duplicates
3) Filtering out symbols with certain visibilities
4) Printing name and visibility columns (and not other columns like type code).
5) Merging input file symbols into a single list.

I feel like you could achieve this in a more natural way, without needing to have a completely independent method for printing the symbols. Specifically, you could implement each of these as individual new functionality in the core logic (possibly with a user-facing option per new functionality), which "--export-symbols" then specifies automatically.

For example:
1) The symbol visibility sorting is just another sorting mode, like --size-sort, so could be added at the point where --size-sort does something.
2) Duplicate removal would be done based on all to-be-printed pieces of information, and could be done when printing there instead.
3) Visibility filtering could be done at the same time as e.g. weak filtering.
4) Column printing would be controlled by a series of flags (some on by default, others off by default), which control whether each column would be printed.
5) Input file merging could be common behaviour.

The advantage of dividing things up like this is that you can implement and test each of them independently of each other, which makes reviewing much easier. Additionally, by adding some or all of these as user-level switches, it allows users to get exactly the symbols they want in the format they want.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112735/new/

https://reviews.llvm.org/D112735



More information about the llvm-commits mailing list