[PATCH] D85774: [XCOFF][AIX] Enable tooling support for 64 bit symbol table parsing

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 18 02:38:01 PDT 2020


jhenderson added inline comments.


================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:296
+enum SymbolAuxType : uint8_t {
+  _AUX_EXCEPT = 255, ///< Identifies an exception auxiliary entry.
+  _AUX_FCN = 254,    ///< Identifies a function auxiliary entry.
----------------
jasonliu wrote:
> hubert.reinterpretcast wrote:
> > jasonliu wrote:
> > > hubert.reinterpretcast wrote:
> > > > jasonliu wrote:
> > > > > DiggerLin wrote:
> > > > > > Must the name be consistent with the AIX OS header syms.h?
> > > > > > What about changing it to AUX_EXCEPT? That would be consistent with our current style.
> > > > > The current style in this file seems to be (correct me if I'm wrong):
> > > > > Use a descriptive name if it's not directly taken from the OS header. Otherwise, take the name directly from the header and add a detailed description/comment to it.
> > > > > In this case, it follows the latter. The other enum members do not have '_' because the OS versions do not have it either.
> > > > The issue here is that the name from the OS header is a reserved name. So by reason of not wanting undefined behaviour, we cannot use the name taken from the OS header. Unfortunately, that means we cannot be consistent in terms of using the name taken from the OS header without switching all the enumerator names to be descriptive and in the LLVM style.
> > > hmm... Is the namespace `XCOFF` not enough to prevent the undefined behavior from happening? Or are we afraid of people just using `using namespace XCOFF` to defeat it?
> > The practical cause of undefined behaviour in such a case is usually that the instance of the identifier here is misparsed, or otherwise has surprising behaviour, either because it is defined as a macro or is an extension keyword. Whether the name turns out to be in scope elsewhere is not a factor for such mechanisms.
> Got it. I will switch the style of SymbolAuxType in the next revision. We will need to come up with a plan to switch the rest of the classes in this file (there are a lot of them).
+1 to dropping the underscore. I think it's okay for that to be the only change, but have no strong opinion either way, so happy with whichever you prefer.


================
Comment at: llvm/include/llvm/BinaryFormat/XCOFF.h:301
+  _AUX_CSECT = 251,  ///< Identifies a csect auxiliary entry.
+  _AUX_SECT = 250    ///< Identifies a SECT auxiliary entry.
+};                   // 64-bit XCOFF file only.
----------------
Not that it really matters, but it's more traditional to order enums in ascending numerical order. Any particular reason you've done this in reverse order?


================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:419
+      is64Bit() ? getNumberOfSymbolTableEntries64()
+                : getLogicalNumberOfSymbolTableEntries32();
+  SymDRI.p =
----------------
hubert.reinterpretcast wrote:
> jhenderson wrote:
> > jasonliu wrote:
> > > DiggerLin wrote:
> > > > can we add a new member function getNumberOfSymbolTableEntries()?
> > > > {
> > > >   return is64Bit() ? getNumberOfSymbolTableEntries64()
> > > >                    : getLogicalNumberOfSymbolTableEntries32();
> > > > }
> > > > 
> > > > the function could also be used in
> > > > XCOFFObjectFile::create()
> > > > and
> > > > getSymbolNameByIndex()
> > > Regarding all the comments asking whether we could combine the 32-bit and 64-bit versions into one function:
> > > I don't think it's a good idea, because people would overlook the fact that they return different types underneath.
> > From my experience working with tools that had to support 32-bit and 64-bit ELF, you don't worry about the underlying type in most cases and always use the larger type. The same probably applies here. Of course, it becomes a bit moot if you add a common getter interface as suggested out-of-line, because those getters will have to return the larger of the two return types anyway.
> > 
> > Is there a strong reason to not use the larger type everywhere?
> > Is there a strong reason to not use the larger type everywhere?
> 
> I don't know what strength this reason has, but we had noticed that some of the tools do not reflect the width of the 32-bit format fields very well (even for relatively uninterpreted output). Where the producer of the binary is under development, developers are better served if the tools emit the correct width for fields in the format.
I think in the context of printing appropriately formatted output, you'd want to switch on the source type (i.e. `is64Bit` or whatever) at formatting time. Certainly, this is how we've done it in the internal code bases I work on, and there are examples of this in a number of other LLVM utilities. For example, in https://github.com/llvm/llvm-project/blob/master/llvm/lib/DebugInfo/DWARF/DWARFCompileUnit.cpp#L17, the `dump` function dumps the offset with a width according to the DWARF format (i.e. 32 or 64 bit), but the `getLength` function always returns a 64-bit value. Similarly, https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-objdump/ELFDump.cpp#L257 identifies the ELF format kind and uses that to determine the width of the offset, size, and address fields (which are stored as uint64_t) when printing ELF program header tables.

There are certainly plenty of places where this hasn't been done. Sometimes this is a mistake, other times it's for consistency with GNU output, but I think the preferred approach is the "store large, explicitly specify format on output" approach.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85774/new/

https://reviews.llvm.org/D85774


