[PATCH] llvm-objdump macho section extractor.

Tue Aug 9 14:04:10 PDT 2016

Hi Puyan,

Adding Jim Grosbach to get his thoughts.

Stepping back and thinking a bit more about the functionality you want, it doesn’t seem clear to me that llvm-objdump is really the right place to do this.  I can see that in the llvm tree this would be an easy place to add it.  However, llvm-objdump seems to be more about printing the contents of objects than about operating on them, taking them apart, and putting parts of them in separate files.

On Mac OS X this functionality really seems to belong to the tool segedit(1) with its -extract option.  What you want is to extract “all sections” into conventionally named, suffixed output files.  Maybe it would be better to add the functionality to segedit(1) or create an llvm-segedit(1) tool.

Part of my thinking is how to handle the error cases: multiple files on the command line, archives, and multiple slices in a fat file.  Reading your code, I think multiple files might work, as part of your “conventionally named” files includes the input file name.  But for archives and fat files I think you would get the odd error message “The file was not recognized as a valid object file”, which is what one gets for object_error::invalid_file_type.  In this case your error really should be more like “The -extract-macho-sections option only works on thin Mach-O files” or something similar.
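A minimal sketch of the kind of up-front check described above, assuming the option rejects anything that is not a thin Mach-O file before walking its load commands. It classifies the input by its leading magic bytes; the enum, the function names, and the exact wording of the messages are hypothetical. Inside llvm-objdump the check would more likely be done on the already-parsed binary (for example with `dyn_cast<MachOObjectFile>`, as opposed to a `MachOUniversalBinary` or an `Archive`) rather than on raw bytes.

```cpp
#include <cstdint>
#include <initializer_list>
#include <optional>
#include <string>

// Hypothetical input classification for the -extract-macho-sections option.
enum class InputKind { ThinMachO, FatMachO, Archive, Unknown };

InputKind classifyInput(const std::string &Bytes) {
  // ar(1) archives (including static libraries) start with this 8-byte string.
  if (Bytes.compare(0, 8, "!<arch>\n") == 0)
    return InputKind::Archive;
  if (Bytes.size() < 4)
    return InputKind::Unknown;
  const auto *P = reinterpret_cast<const unsigned char *>(Bytes.data());
  uint32_t BE = (uint32_t(P[0]) << 24) | (uint32_t(P[1]) << 16) |
                (uint32_t(P[2]) << 8) | uint32_t(P[3]);
  uint32_t LE = (uint32_t(P[3]) << 24) | (uint32_t(P[2]) << 16) |
                (uint32_t(P[1]) << 8) | uint32_t(P[0]);
  // Fat (universal) headers are always stored big-endian.
  if (BE == 0xcafebabeu || BE == 0xcafebabfu) // FAT_MAGIC, FAT_MAGIC_64
    return InputKind::FatMachO;
  // Thin Mach-O headers use the target's byte order, so accept either order.
  for (uint32_t Magic : {BE, LE})
    if (Magic == 0xfeedfaceu || Magic == 0xfeedfacfu) // MH_MAGIC, MH_MAGIC_64
      return InputKind::ThinMachO;
  return InputKind::Unknown;
}

// Returns an error message if the option cannot handle this input,
// or std::nullopt if extraction may proceed.
std::optional<std::string> checkExtractInput(const std::string &Bytes) {
  switch (classifyInput(Bytes)) {
  case InputKind::ThinMachO:
    return std::nullopt;
  case InputKind::FatMachO:
  case InputKind::Archive:
    // A specific message instead of the generic invalid_file_type one.
    return std::string(
        "The -extract-macho-sections option only works on thin Mach-O files");
  case InputKind::Unknown:
    break;
  }
  return std::string("The file was not recognized as a valid object file");
}
```

The point of the separate `FatMachO`/`Archive` case is that those inputs are valid object containers, so they deserve a message that names the option's limitation rather than claiming the file is invalid.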

Jim, what do you think about adding this functionality to llvm-objdump?

Other comments:

Another thing to check is your handling of zero-fill sections, which have no contents.  I think your code will write junk or get errors, as the offsets for these are zero but they have non-zero section sizes.  I think it is best not to write any file for an S_ZEROFILL section type.
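A minimal sketch of that check, assuming the extractor has each section's flags, file offset, and size at hand. The constants are the values from <mach-o/loader.h> (LLVM's MachO namespace defines the same names), redeclared so the example compiles on its own; the helper names are hypothetical. Besides S_ZEROFILL, the sketch also skips S_GB_ZEROFILL and S_THREAD_LOCAL_ZEROFILL, the other zero-fill section types, and it flags sections whose contents would run past the end of the file.

```cpp
#include <cstdint>

namespace macho {
constexpr uint32_t SECTION_TYPE = 0x000000ffu; // mask for the section type bits
constexpr uint32_t S_ZEROFILL = 0x1u;
constexpr uint32_t S_GB_ZEROFILL = 0xcu;
constexpr uint32_t S_THREAD_LOCAL_ZEROFILL = 0x12u;
} // namespace macho

// True if the section's type says it occupies no bytes in the file; its
// offset field is then 0 even though its size is non-zero.
bool isZeroFillSection(uint32_t Flags) {
  switch (Flags & macho::SECTION_TYPE) {
  case macho::S_ZEROFILL:
  case macho::S_GB_ZEROFILL:
  case macho::S_THREAD_LOCAL_ZEROFILL:
    return true;
  default:
    return false;
  }
}

// Hypothetical decision for one section: write a file, skip it, or report it.
enum class SectionAction { Write, Skip, Malformed };

SectionAction classifySection(uint32_t Flags, uint64_t Offset, uint64_t Size,
                              uint64_t FileSize) {
  if (isZeroFillSection(Flags))
    return SectionAction::Skip; // no contents on disk: write no file at all
  // Overflow-safe bounds check: Offset + Size must not exceed FileSize.
  if (Offset > FileSize || Size > FileSize - Offset)
    return SectionAction::Malformed;
  return SectionAction::Write;
}
```

Masking with SECTION_TYPE first matters because the high bits of the flags word hold section attributes (for example the "contains instructions" bits), so a plain equality test against S_ZEROFILL would miss some sections.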

The name of the routine ExtractLoadCommand() seems very odd.  The logic would be better just folded into your loop over the load commands:

+    for (const auto &Load : InputObject->load_commands()) {
+      if (!ExtractLoadCommand(Filename, *InputObject, Load))
+        break;
+    }

And since you are really extracting section contents maybe the names of the routines ExtractSegmentCommand() and ExtractSegment64Command() would be better as ExtractSections() and ExtractSections64().
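A minimal sketch of that restructuring, assuming the renamed ExtractSections() and ExtractSections64() each return a bool for success or failure, as suggested further down the thread. `LoadCommandRef` and `Extractor` are hypothetical stand-ins so the example compiles on its own; in the patch the loop would instead iterate `InputObject->load_commands()`, whose entries carry the command number in their `C.cmd` field. The constants are the values from <mach-o/loader.h>.

```cpp
#include <cstdint>
#include <vector>

namespace lc {
constexpr uint32_t LC_SEGMENT = 0x1u;
constexpr uint32_t LC_SYMTAB = 0x2u;
constexpr uint32_t LC_SEGMENT_64 = 0x19u;
} // namespace lc

// Hypothetical stand-in for one load command.
struct LoadCommandRef {
  uint32_t Cmd;
};

struct Extractor {
  int Sections32 = 0; // calls made for LC_SEGMENT
  int Sections64 = 0; // calls made for LC_SEGMENT_64

  // Stubs for the renamed routines; they only count calls here.
  bool ExtractSections(const LoadCommandRef &) { ++Sections32; return true; }
  bool ExtractSections64(const LoadCommandRef &) { ++Sections64; return true; }

  // The former ExtractLoadCommand() folded into the loop: the two segment
  // commands are dispatched, every other load command (LC_SYMTAB, LC_UUID,
  // ...) is skipped silently, and the first failure stops the walk.
  bool ExtractAll(const std::vector<LoadCommandRef> &Loads) {
    for (const auto &Load : Loads) {
      if (Load.Cmd == lc::LC_SEGMENT) {
        if (!ExtractSections(Load))
          return false;
      } else if (Load.Cmd == lc::LC_SEGMENT_64) {
        if (!ExtractSections64(Load))
          return false;
      }
    }
    return true;
  }
};
```

With this shape there is no separate return code to switch on, and non-segment load commands need no message at all, since they simply carry no sections to extract.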

Also, the test is writing into the directory with the inputs (I think) because of the use of “-extract-macho-sections-dir ./”.  I think it would be better to somehow direct this to the build directory so things can be cleaned.  I’m not sure exactly how to do this, but I know lld writes its tests to some “Output” directory in the build directory.
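A minimal sketch of output-path composition, assuming a hypothetical `<dir>/<input basename>.<segname>.<sectname>` naming scheme, so that the -extract-macho-sections-dir argument, not the input file's location, decides where files land. The function names are hypothetical. One detail worth handling: Mach-O segname and sectname fields are fixed 16-byte arrays that are not NUL-terminated when a name is exactly 16 characters long. (For the test side, LLVM's lit runner provides a `%t` substitution for a per-test path under the build tree's `Output` directory, which is likely what the lld tests rely on.)

```cpp
#include <algorithm>
#include <string>

// Copy at most 16 characters: the fixed-size name may lack a terminating NUL.
std::string fixedName(const char (&Name)[16]) {
  return std::string(Name, std::find(Name, Name + 16, '\0'));
}

// Hypothetical naming scheme: <OutDir>/<input basename>.<segname>.<sectname>
std::string outputPathFor(const std::string &OutDir,
                          const std::string &InputPath,
                          const char (&SegName)[16],
                          const char (&SectName)[16]) {
  // Drop the input's directory so nothing is written next to the inputs.
  std::string::size_type Slash = InputPath.rfind('/');
  std::string Base =
      Slash == std::string::npos ? InputPath : InputPath.substr(Slash + 1);
  std::string Dir = OutDir.empty() ? "." : OutDir;
  if (Dir.back() != '/')
    Dir += '/';
  return Dir + Base + "." + fixedName(SegName) + "." + fixedName(SectName);
}
```

Keeping the input basename in the name is what lets several input files share one output directory without their section files colliding.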

Also, I still think a much smaller test than a 4000+ character line of hex would be nicer, and one that checks that more than just the one file for one section gets created.  Why not create a little Mach-O file with multiple sections and check that all the files for the sections are created and contain the correct contents?  I would create a Mach-O file that contains ASCII section contents and little files that can be diff(1)’ed or cmp(1)’ed.  I did this with archives and malformed Mach-O files.  A small file under 100 bytes could be a simple test case.

My thoughts,
Kev

> On Aug 9, 2016, at 11:51 AM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com> wrote:
> 
> Revised patch again.
> 
> I removed the hex printing code and let the test handle that. I'm still using hexdump and FileCheck to verify the contents of the binary. 
> Added better error handling for file access, and now using all of LLVM's file IO libraries.
> No more error messages on those segment types. 
> 
> PL
> 
> On Mon, Aug 8, 2016 at 2:36 PM, Kevin Enderby <enderby at apple.com> wrote:
> 
>> On Aug 8, 2016, at 2:29 PM, Puyan Lotfi <plotfi at apple.com> wrote:
>> 
>> 
>>> On Aug 8, 2016, at 1:53 PM, Kevin Enderby <enderby at apple.com> wrote:
>>> 
>>> 
>>>> On Aug 8, 2016, at 11:27 AM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com> wrote:
>>>> 
>>>> Kevin
>>>> 
>>>> I have a new patch attached.
>>>> 
>>>> I am fine with only supporting LC_SEGMENT or LC_SEGMENT_64 for the time being, but I added a message in case the extract code encounters a segment other than those types.
>>> 
>>> I’m not understanding why you consider it an error to have load commands other than LC_SEGMENT or LC_SEGMENT_64 in Mach-O files used with your option.  Is this not expected to work with something like an x86_64 Mach-O compiled for Mac OS X?  It would have other load commands like LC_SYMTAB, LC_UUID, LC_FUNCTION_STARTS, etc.  Or is this option only meant to work on some GPU Mach-O files with only LC_SEGMENT or LC_SEGMENT_64 load commands?
>>> 
>> 
>> The latter is correct. I can add support for those other segments in another commit, but right now I haven't written the code to do that. I don't consider it an error, but I figured the extract should print if it encounters a segment it doesn't support.
> 
> I guess maybe I don’t understand what you are saying.  There are only two segment load commands, LC_SEGMENT and LC_SEGMENT_64.  Are you using the word “segment” to mean “load command”?  If not, what do you mean by “encounters a segment it doesn't support”?
> 
>> Would you like me to add support for all of the possible segment types??
> 
> What other “segment types” are there?  Sorry I might be lost I’m just not following you.
> 
>> 
>>>> 
>>>> I have added some additional flags to narrow down which segment to dump.
>>> 
>>>> Also, I added a flag and some code to print the section to stderr in a hex format for testing
>>> 
>>> For a test, wouldn’t you just extract the sections using your option and check that the result is the same as a file you add to the test suite?
>>> 
>> 
>> I could, but how should I do that? I mean, should I just run cmp on both files?
> 
> That’s what I would do, as that is the functionality you are testing.
> 
>> Can I even do that if the test is run on a system without cmp? Sorry, I was just trying to do the thing where I could rely on FileCheck the most here.
> 
> I’m not a test expert.  Maybe there is a way to test that two files are the same with FileCheck.  You might want to see if that is possible.
> 
>> 
>>> I don’t really understand the need for the code to dump the section to stderr.  There is already the option -section to dump Mach-O sections.  It also seems odd that you use the ‘C’ printf style of coding here.  Maybe something more C++-like would fit better with the llvm code if you really want to keep this bit.
>>> 
>> 
>> Ah, yeah you're right what I am doing here is weird.
>> 
>>>> and I've added a test case as well.
>>> 
>>> Seems like this could be a bit smaller, test the extracted file, and use your -extract-macho-sections-dir option.  Without that, it seems like the test will write into the location of the input file.
>>> 
>> 
>> Ah right I think I forgot to add that flag to the test. 
>> 
>> PL
>> 
>>>> 
>>>> Let me know how this looks to you.
>>>> 
>>>> Thanks
>>>> 
>>>> PL
>>>> 
>>>> On Fri, Aug 5, 2016 at 4:22 PM, Kevin Enderby <enderby at apple.com> wrote:
>>>> Hi Puyan,
>>>> 
>>>> Seems like the functionality would be easier to test if it didn’t write the output files to the same directory as the input file.  And of course you’ll want to add a test case, as is the llvm tradition with any change.
>>>> 
>>>> Also, I’m not an llvm style expert, but you can likely shorten the logic and not use a switch statement for your return value from ExtractLoadCommand().  Also, it looks like it would get an error with any Mach-O file that has load commands other than LC_SEGMENT or LC_SEGMENT_64.  Is that correct, and is it what you want?
>>>> 
>>>> You also might just want to return a bool for success or failure from ExtractSectionData() and your other routines.
>>>> 
>>>> Also, llvm seems to like the shortest bit of code, so I don’t think you need the “int Res” and could just use an if statement.
>>>> 
>>>> Kev
>>>> 
>>>> > On Aug 5, 2016, at 4:06 PM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com> wrote:
>>>> >
>>>> > Hi Kevin, All:
>>>> >
>>>> > I have a macho-extract tool kicking around locally that I threw together a while back that simply dumps each section in a Mach-O file to disk, and I've decided it's probably high time to upstream the functionality into llvm-objdump.
>>>> >
>>>> > Let me know if this looks good enough to check in, or otherwise feedback would be nice too.
>>>> >
>>>> > Thanks
>>>> >
>>>> > PL
>>>> >
>>>> >
>>>> > <llvm-objdump-macho-extract.patch>
>>>> 
>>>> 
>>>> <llvm-objdump-macho-extract2.patch>
>>> 
>> 
> 
> 
> <llvm-objdump-macho-extract3.patch>
