[PATCH] D54674: [llvm-objcopy] First bits for MachO

Thu Dec 6 20:46:35 PST 2018

lhames added a comment.

Apart from these specific comments, my general thought is that this is fine as far as it goes, but as it grows more fully featured it seems likely to start overlapping more libObject and MC functionality.

Have you considered ways to refactor libObject/MC to allow objcopy to be built on top of them?

================
Comment at: test/tools/llvm-objcopy/MachO/basic-big-endian-32-copy.test:116
+# CHECK: --- !mach-o
+# CHECK: FileHeader:
+# CHECK:  magic:           0xFEEDFACE
----------------
rupprecht wrote:
> alexshap wrote:
> > rupprecht wrote:
> > > Can you also verify things coming from sections/symbol tables/etc? It seems this only tests that the FileHeader is preserved and everything else could be dropped...
> > there is a problem with obj2yaml / yaml2obj - the support for MachO is incomplete, some things (like the content of the sections) are not dumped, maybe smth else is missing too. But in these tests (in this patch) where we copy the object files without modifications the real check is done on the line 3: cmp %t %t2, 
> > I've added FileCheck just to be sure that yaml2obj has done the right thing, but probably I should remove it here to avoid confusions.
> > Regarding your other comment (about making use of llvm-readobj for tests) -
> > yeah, I think llvm-readobj should work for some scenarios, although with llvm-readobj there is an issue - it doesn't print all the mach-o specific bits (i.e. all the load commands). 
> > So basically my proposal for this: in these tests cmp should be sufficient, later I will have to use yaml2obj & llvm-readobj to test the changes of the load commands, when I run into an issue that smth is not printed by the existing tools I will try to extend obj2yaml/llvm-readobj to support it. 
> Sounds good to me.
> 
> Replying to this comment, but also including something from @jakehehrlich (which was on the email review thread but I can't find it in phab...)
> > I always regretted and disliked using yaml2obj. It works fine but honestly I prefer assembly. For ELF this means that program headers can't be tested inside of llvm since only the linker produces those. Early on I felt it critical to focus on fully linked executables and stripping them because that was the use case my team had at the time (and output to binary actually). It almost accidentally occurred that relocatable files were even supported. Today you could probably test the bulk of objcopy using the asembler and leave just a limited number of use cases to hand crafted ELF files and yaml2obj for when you're testing a valid ELF file that happens to have program headers. So my gut would be to say we should prefer using assembly to generate .o files rather than yaml2obj or uploading binaries we've built. Uploading binaries should be reserved for cases when both yaml2obj and assembly cannot produce a file that tickles the code you're trying to test which is almost always something that not even the linker can produce in my experience.
> 
> My comment about using yaml2obj + llvm-readobj isn't really about preference for those tools, just that we should have consistent tests:
> * Ideally there would only be one testing method used (e.g. every test uses yaml2obj to make the obj, run llvm-objcopy/strip, then llvm-readobj to verify it)
> * If not, there should be a priority list, e.g. generated files use assembly if possible, yaml2obj if not possible w/ assembly, and checked in object files if neither are possible; similarly verification should use llvm-readobj if possible, if not then obj2yaml, if not then cmp
> I'm fine if we want to use assembly instead of yaml2obj. We should update all the tests (where possible) to do that if that's what we decide.
> 
> More explicitly, we should not have:
> * Test1 over here uses assembly because $person1 likes assembly
> * Test2 over there uses llvm IR because $person2 likes llvm IR
> * $person3 needs to fix a bug and update Test1 and Test2, and has to do it in two different ways because $person1 and $person2 have different preferences. $person3 gives up in frustration and abandons fixing the bug.
> 
> If there's any specific problems (e.g. llvm-readobj doesn't print all the fields for mach-o files), it's certainly fine to use some other tool, although it'd be nice if there's a comment/TODO saying why we aren't using the "normal" testing process, so that it could be changed later (e.g. in this case if llvm-readobj is improved w.r.t. mach-o support needed here)
> 
> I guess I don't actually have any blocking comment here, I just wanted to hijack this patch to have this discussion and we can do something independent of this patch :) Carry on!
> I always regretted and disliked using yaml2obj. It works fine but honestly I prefer assembly. For ELF this means that program headers can't be tested inside of llvm since only the linker produces those.

That seems like something that we should consider adding to yaml2obj?

In my mind, yaml2obj ought to be a tool that lets you to describe any valid object file in YAML, whereas assemblers will usually make some assumptions (e.g. load-command ordering) about how you want the object laid out.

================
Comment at: test/tools/llvm-objcopy/MachO/real-world-input-32-copy.test:24
+
+clang++ -c -arch i386 macho.cpp -o macho-little-endian-32.o
----------------
alexshap wrote:
> rupprecht wrote:
> > This doesn't seem to have anything macho specific in it. Is there a way to run this command from something besides osx and generate a macho file? (-target XXX?)
> yes, i'll update the comments
Actually it would be check in the assembly for this (clang -s), then use llvm-mc to assemble it in the test case, with something like:

  llvm-mc -assemble -triple x86_64-apple-macosx10.14.0 -filetype=obj -o %t %s

That should allow you to do this without checking in any binaries.

================
Comment at: tools/llvm-objcopy/MachO/MachOReader.cpp:49-59
+template <typename SectionType> Section constructSection(SectionType Sec);
+
+template <> Section constructSection(MachO::section Sec) {
+  return constructSectionCommon(Sec);
+}
+
+template <> Section constructSection(MachO::section_64 Sec) {
----------------
Is there any reason to use templates here, rather than ordinary overloads?

================
Comment at: tools/llvm-objcopy/MachO/MachOWriter.cpp:224-232
+  for (size_t Index = 0; Index < O.StrTable.Strings.size(); ++Index) {
+    memcpy(Out, O.StrTable.Strings[Index].data(),
+           O.StrTable.Strings[Index].size());
+    Out += O.StrTable.Strings[Index].size();
+    if (Index + 1 != O.StrTable.Strings.size()) {
+      memcpy(Out, "\0", 1);
+      Out += 1;
----------------
Strings is a vector of std::strings and those are guaranteed to be null-terminated. You could simplify this to:

  for (size_t Index = 0; Index < O.StrTable.Strings.size(); ++Index) {
    bool NullTerminate = Index + 1 != O.StrTable.Strings.size();
    memcpy(Out, O.StrTable.Strings[Index].data(),
             O.StrTable.Strings[Index].size() + NullTerminate);
    Out += O.StrTable.Strings[Index].size() + 1;
  }

================
Comment at: tools/llvm-objcopy/MachO/MachOWriter.cpp:304-307
+    if (SymTabCommand.symoff)
+      Queue.push_back({SymTabCommand.symoff, &MachOWriter::writeSymbolTable});
+    if (SymTabCommand.stroff)
+      Queue.push_back({SymTabCommand.stroff, &MachOWriter::writeStringTable});
----------------
This seems to be doubled up.

================
Comment at: tools/llvm-objcopy/MachO/Object.h:90
+struct StringTable {
+  std::vector<std::string> Strings;
+};
----------------
jakehehrlich wrote:
> Early on I took this approach for ELF as well but it wound up being a mistake for ELF. 1) Something already existed that would construct string tables for me and 2) It didn't handle sharing correctly. Do these issues not occur in MachO?
> It didn't handle sharing correctly. Do these issues not occur in MachO?

I'm not sure. What is "sharing" in this context?

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54674/new/

https://reviews.llvm.org/D54674