[PATCH] D61532: implement of the parsing symbol table for xcoffobjfile and output as yaml format

Thu May 16 20:52:34 PDT 2019

sfertile added inline comments.

================
Comment at: llvm/include/llvm/Object/XCOFFObjectFile.h:155

+public:
   XCOFFObjectFile(MemoryBufferRef Object, std::error_code &EC);
----------------
DiggerLin wrote:
> sfertile wrote:
> > Why is this needed?
> I added addition public to separate the functions which are inherited from objectfile class from the functions api which only in xcoffobjectfile. it more readable.
> separate the functions which are inherited from objectfile class from the functions api which only in xcoffobjectfile.

Thats a good idea,  adding a brief comment to that end would work.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:97
+  const XCOFFSymbolEntry *SymEntPtr = toSymbolEntry(Symb);
+
+  SymEntPtr += SymEntPtr->NumberOfAuxEntries + 1;
----------------
DiggerLin wrote:
> sfertile wrote:
> > I would get rid of this white space.
> I always put blank line between a local variable define(declare) and function code, I think it maybe more readable.
Ok, that's pretty reasonable.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:404
   // Current location within the file.
   uint64_t CurPtr = 0;

----------------
MaskRay wrote:
> sfertile wrote:
> > MaskRay wrote:
> > > The heavylifting parsing work should be done in `createXCOFFObjectFile`. The constructor should be private and be a very thin wrapper.
> > > The constructor should be private 
> > 
> > This makes sense to me.
> > 
> > > The heavylifting parsing work should be done in `createXCOFFObjectFile`.  ... be a very thin wrapper.
> > 
> > This seems to go against what most of the other file formats do. I think at this point it seems reasonable to move the parsing  we have into  `createXCOFFObjectFile` but we have a downstream patch under review locally for adding 64-bit support and  I believe such a change would seriously complicate the parsing/error handling needed to support the different parts of the object format. If the parsing fails we propagate the error to the caller, so I'm not sure if the state of the object is that big if a concern. I'd like to know what benefits your looking to get by moving the parsing into the create function, maybe its worth the extra complication or maybe we can capture parts of the benefit without too drastic a change.
> In `ELFObjectFile.cpp`, the ctor is a thin wrapper.
> In `COFFObjectFile.cpp` and `MachOObjectFile.cpp`, they pass an non-const reference of `Error` variable to carry the parsing error.
> It seems to me the ELF approach is slightly better as we can just use the return type instead of passing another variable around.
> 
> ```
> // COFF
>   std::unique_ptr<COFFObjectFile> Ret(new COFFObjectFile(Object, EC));
>   if (EC)
> //// the object is created then destroyed if there is an error.
>     return errorCodeToError(EC);
> 
> // ELF
>   auto Ret = ELFObjectFile<ELFT>::create(Object);
> /// if there is a parse error, nothing gets created
>   if (Error E = Ret.takeError())
>     return std::move(E);
>   return make_unique<ELFObjectFile<ELFT>>(std::move(*Ret));
> ```
> 
> I hope this wouldn't be too drastic a change to make, but if it is, feel free to keep the current shape and make improvement (if any) in the future :)
I definitely agree the ELF way is better. The way I extended the parsing for 64-bit support though relies on it being done in the constructor  since I rely on the file header accessors and section header accessors to abstract over the structure differences. I'll have a closer look at the ELF handling to see if we can move in that direction.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:119
+  // or .debug section.
+  // A byte offset value less than 4 is a null or zero-length symbol name.
+  if (Offset < 4)
----------------
DiggerLin wrote:
> hubert.reinterpretcast wrote:
> > DiggerLin wrote:
> > > hubert.reinterpretcast wrote:
> > > > > A byte offset value of 0 is a null or zero-length symbol name.
> > > > Is the "less than 4" an invention of this patch? A byte offset of 1, 2, or 3 points into the length field. It is well-formed as claimed by the comment in the patch, or is it erroneous?
> > > in the xcoff document, it describe as  " A byte offset value of 0 is a null or zero-length symbol name."
> > > 
> > > A byte offset of 1, 2, or 3 points into the length field , the document do not talk about anything on it. we look it as "zero-length symbol name" . If we can treated  the "A byte offset of 1, 2, or 3" as error format, and return as
> > > make_error<GenericBinaryError>("Symbol Name offset error",object_error::parse_failed); 
> > > But I think looking  "A byte offset of 1, 2, or 3" as "zero-length symbol name" will make obj2yaml more compatible。
> > The comment should be clear as to what is part of the specification and what is part of the implementation. Suggestion:
> > A byte offset value of 0 is a null or zero-length symbol name. A byte offset in the range 1 to 3 (inclusive) points into the length field; as a soft-error recovery mechanism, we treat such cases as having an offset of 0.
> changed the comment as suggestion
Could we have a valid offset of less than 4 if the name is in the debug section? I'm not suggesting we need to change in this patch, just considering the future.

================
Comment at: llvm/lib/Object/XCOFFObjectFile.cpp:115
+  if (SymEntPtr->StorageClass & 0x80)
+    return make_error<StringError>(
+        "XCOFF object file with debug information not yet supported",
----------------
hubert.reinterpretcast wrote:
> I'm not sure what asserting on a asserts-enabled build but generating a hard-error on a release build buys us. If we are okay with a hard error, then we can just go with it for all builds. If we are not okay with a hard error, then we don't want the debug build to trip either (and a TODO comment with some recovery path is the right answer).
> 
> Note that `llvm/test/tools/llvm-readobj/Inputs/xcoff-basic.o` has such symbols.
> 
>Note that llvm/test/tools/llvm-readobj/Inputs/xcoff-basic.o has such symbols.

We need to  commit to implementing the debug name handling either in the same patch that enables dumping symbols from readobj or before that. It would be useless if we have to tip-toe around object files with debug info. Until then I think the assert alone is fine.

================
Comment at: llvm/tools/obj2yaml/xcoff2yaml.cpp:34
   dumpHeader();
+  if ((EC = dumpSymbols()))
+    return EC;
----------------
DiggerLin wrote:
> hubert.reinterpretcast wrote:
> > DiggerLin wrote:
> > > sfertile wrote:
> > > > What gets returned when this is false?
> > > if symbol name parse failed . 
> > > it return std::error_code which value is 3 . defined in 
> > > Error.h 
> > > 
> > > enum class object_error {
> > >   // Error code 0 is absent. Use std::error_code() instead.
> > >   arch_not_found = 1,
> > >   invalid_file_type,
> > >   parse_failed,
> > >   unexpected_eof,
> > >   string_table_non_null_end,
> > >   invalid_section_index,
> > >   bitcode_section_not_found,
> > >   invalid_symbol_index,
> > > };
> > Sean is pointing out that there is no return statement in the case where there is no error on the call to `dumpSymbols`.
> thanks for clarify the question, fixed it.
Sorry, I should have been clearer.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61532/new/

https://reviews.llvm.org/D61532