[PATCH] D54674: [llvm-objcopy] First bits for MachO

Wed Dec 19 13:19:13 PST 2018

jakehehrlich accepted this revision.
jakehehrlich added a comment.
This revision is now accepted and ready to land.

I think this looks pretty good. I'd like an explicit public approval from someone who knows MachO better but pending that and a few relatively minor requests I think I'm happy. I've gone over the high level structure and approve of it. I see major obvious issues in the code, it looks preety good to me. I'll defer to someone else on the MachO details. Testing looks pretty good but I don't know the full space very well so I'd like a MachO person to look at that as well.

LGTM. Please wait for a public approval from someone like @lhames before landing. I'd like such a person to comment that both the code and the testing looks good to them if possible.

================
Comment at: test/tools/llvm-objcopy/MachO/basic-big-endian-32-copy.test:116
+# CHECK: --- !mach-o
+# CHECK: FileHeader:
+# CHECK:  magic:           0xFEEDFACE
----------------
lhames wrote:
> rupprecht wrote:
> > alexshap wrote:
> > > rupprecht wrote:
> > > > Can you also verify things coming from sections/symbol tables/etc? It seems this only tests that the FileHeader is preserved and everything else could be dropped...
> > > there is a problem with obj2yaml / yaml2obj - the support for MachO is incomplete, some things (like the content of the sections) are not dumped, maybe smth else is missing too. But in these tests (in this patch) where we copy the object files without modifications the real check is done on the line 3: cmp %t %t2, 
> > > I've added FileCheck just to be sure that yaml2obj has done the right thing, but probably I should remove it here to avoid confusions.
> > > Regarding your other comment (about making use of llvm-readobj for tests) -
> > > yeah, I think llvm-readobj should work for some scenarios, although with llvm-readobj there is an issue - it doesn't print all the mach-o specific bits (i.e. all the load commands). 
> > > So basically my proposal for this: in these tests cmp should be sufficient, later I will have to use yaml2obj & llvm-readobj to test the changes of the load commands, when I run into an issue that smth is not printed by the existing tools I will try to extend obj2yaml/llvm-readobj to support it. 
> > Sounds good to me.
> > 
> > Replying to this comment, but also including something from @jakehehrlich (which was on the email review thread but I can't find it in phab...)
> > > I always regretted and disliked using yaml2obj. It works fine but honestly I prefer assembly. For ELF this means that program headers can't be tested inside of llvm since only the linker produces those. Early on I felt it critical to focus on fully linked executables and stripping them because that was the use case my team had at the time (and output to binary actually). It almost accidentally occurred that relocatable files were even supported. Today you could probably test the bulk of objcopy using the asembler and leave just a limited number of use cases to hand crafted ELF files and yaml2obj for when you're testing a valid ELF file that happens to have program headers. So my gut would be to say we should prefer using assembly to generate .o files rather than yaml2obj or uploading binaries we've built. Uploading binaries should be reserved for cases when both yaml2obj and assembly cannot produce a file that tickles the code you're trying to test which is almost always something that not even the linker can produce in my experience.
> > 
> > My comment about using yaml2obj + llvm-readobj isn't really about preference for those tools, just that we should have consistent tests:
> > * Ideally there would only be one testing method used (e.g. every test uses yaml2obj to make the obj, run llvm-objcopy/strip, then llvm-readobj to verify it)
> > * If not, there should be a priority list, e.g. generated files use assembly if possible, yaml2obj if not possible w/ assembly, and checked in object files if neither are possible; similarly verification should use llvm-readobj if possible, if not then obj2yaml, if not then cmp
> > I'm fine if we want to use assembly instead of yaml2obj. We should update all the tests (where possible) to do that if that's what we decide.
> > 
> > More explicitly, we should not have:
> > * Test1 over here uses assembly because $person1 likes assembly
> > * Test2 over there uses llvm IR because $person2 likes llvm IR
> > * $person3 needs to fix a bug and update Test1 and Test2, and has to do it in two different ways because $person1 and $person2 have different preferences. $person3 gives up in frustration and abandons fixing the bug.
> > 
> > If there's any specific problems (e.g. llvm-readobj doesn't print all the fields for mach-o files), it's certainly fine to use some other tool, although it'd be nice if there's a comment/TODO saying why we aren't using the "normal" testing process, so that it could be changed later (e.g. in this case if llvm-readobj is improved w.r.t. mach-o support needed here)
> > 
> > I guess I don't actually have any blocking comment here, I just wanted to hijack this patch to have this discussion and we can do something independent of this patch :) Carry on!
> > I always regretted and disliked using yaml2obj. It works fine but honestly I prefer assembly. For ELF this means that program headers can't be tested inside of llvm since only the linker produces those.
> 
> That seems like something that we should consider adding to yaml2obj?
> 
> In my mind, yaml2obj ought to be a tool that lets you to describe any valid object file in YAML, whereas assemblers will usually make some assumptions (e.g. load-command ordering) about how you want the object laid out.
Follow up on this. It's far from it in its current state and it would take quite a lot to get it there. Also consider that when testing you often want to create invalid ELF files as well. With the exception of Content, this is quite hard to accomplish with yaml2obj. 

================
Comment at: tools/llvm-objcopy/MachO/Reader.cpp:68
+  std::vector<Section> Sections;
+  for (; reinterpret_cast<const void *>(Curr) < End; Curr++) {
+    if (MachOObj.isLittleEndian() != sys::IsLittleEndianHost) {
----------------
How hard would it be to put this in an ArrayRef? This loop handles the case where the load command is larger than the array of sections. 1) Can we error out if that is the case 2) How tricky is it to alignFrom based on the size of SectionType? I really like to put things in ranges and first and foremost try to use an existing std:: algorithm and then a range based for loop, and only then do I consider something else. This loop doesn't neatly fit into an algorithm but it seems like a range based for loop is doable without too much issue.

================
Comment at: tools/llvm-objcopy/MachO/Reader.cpp:84
+    if (!SecRef)
+      reportError(MachOObj.getFileName(), SecRef.takeError());
+    
----------------
I should have checked the COFF code for this sort of issue a bit more than I did but I'd like to avoid this where possible. Failing on the first sign of error will slow the pace at which we can convert this into a library which is a long term plan I'd like to see become a reality.

================
Comment at: tools/llvm-objcopy/MachO/Writer.cpp:343
+  if (auto E = B.commit())
+    reportError(B.getName(), errorToErrorCode(std::move(E)));
+}
----------------
ditto. This one might require changing the interface of MachOWriter and diverge from the interface that I used in the ELF code. I just consider the way I handled fatal errors a mistake.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54674/new/

https://reviews.llvm.org/D54674