[PATCH] llvm-objdump macho section extractor.

Tue Aug 9 19:36:56 PDT 2016

On Tue, Aug 9, 2016 at 2:04 PM, Kevin Enderby via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Hi Puyan,
>
> Adding Jim Grosbach to get his thoughts.
>
> Stepping back and thinking a bit more about the functionality you want it
> doesn’t seen clear to me that llvm-objdump is really the right place to do
> this.  I can see that in the llvm tree this would be an easy place to add
> this.  However it seems that llvm-objdump is more about printing the
> contents of objects and operating on them and taking them apart and putting
> parts of them in separate files.
>
> On Mac OS X this functionality really seems to belong to the tool
> segedit(1) with its -extract option.  And you want an extract “all
> sections” into conventional named output suffixed files.  Maybe it would be
> better to add the functionality to segedit(1) or create an llvm-segedit(1)
> tool.
>
> Part of my thinking is how to handle the error cases of multiple files on
> the command line, archives and multiple slices in a fat file.  Reading your
> code I think multiple files might work as part off your "conventional
> named” files includes the input file name.  But for archives and fat files
> I think you would get the odd error message "The file was not recognized as
> a valid object file” which is what ones gets for object_error::invalid_file_type.
> In this case your error really should be more like
> “The -extract-macho-sections option only works on thin Mach-O files” or
> something.
>
> Jim, what do you think about adding this functionality to llvm-objdump?
>

Just a by-stander, but, I think that this should go into a new tool
`llvm-objcopy`.  This would be another one of the missing pieces that we
have in the LLVM toolset.  `objcopy` allows you to copy out various
sections in the ELF world.  We could handle this similar to a multi call
binary and provide a similar option set to `segedit` for MachO and the
normal binutils-like command line for `objcopy` which can also do MachO,
but does ELF and COFF as well.

>
> Other comments:
>
> Other things to check is your handling of zero fill section that have no
> contents.  I think your code will write junk or get errors as the offsets
> for these are zero but they have non-zero section sizes.  I think it is
> best not to write any file for an S_ZEROFILL section type.
>
> The name of the routine ExtractLoadCommand() seems very odd.  The logic
> seems it would be better if just folded in to your loop over the load
> commands:
>
> +    for (const auto &Load : InputObject->load_commands()) {
> +      if (!ExtractLoadCommand(Filename, *InputObject, Load))
> +        break;
> +    }
>
> And since you are really extracting section contents maybe the names of
> the routines ExtractSegmentCommand() and ExtractSegment64Command() would
> be better as ExtractSections() and ExtractSections64().
>
> Also the test is writing into the directory with the inputs (I think)
> because of the use of "-extract-macho-sections-dir ./“ I think it would be
> better to some how direct this to the build directory so things can be
> cleaned.  Not sure exactly how to do this but I know lld writes its tests
> to some “Output” directory in the build directory.
>
> Also I still think a much smaller test that a 4000+ character line of hex
> would be nicer.  And also checking that more then just the one file for one
> section gets created.  Why not create a little Mach-O file with multiple
> sections and check all the files for the sections are created and contain
> the correct contents.  I would create a Mach-O file that contains ascii
> section contents and little files that can be diff(1)’ed or cmp(1)’ed. I
> did this with archives and malformed Mach-O files.  But a small file under
> 100 bytes could be a simple test case.
>
> My thoughts,
> Kev
>
> On Aug 9, 2016, at 11:51 AM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
> wrote:
>
> Revised patch again.
>
> I removed the hex printing code, let the test handle that. Still using
> hexdump and filecheck to verify the contents of the binary.
> Added better error handling for file access, and now using all of LLVM's
> file IO libraries.
> No more error messages on those segment types.
>
> PL
>
> On Mon, Aug 8, 2016 at 2:36 PM, Kevin Enderby <enderby at apple.com> wrote:
>
>>
>> On Aug 8, 2016, at 2:29 PM, Puyan Lotfi <plotfi at apple.com> wrote:
>>
>>
>> On Aug 8, 2016, at 1:53 PM, Kevin Enderby <enderby at apple.com> wrote:
>>
>>
>> On Aug 8, 2016, at 11:27 AM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
>> wrote:
>>
>> Kevin
>>
>> I have a new patch attached.
>>
>> I am fine with only supporting LC_SEGMENT_64 or LC_SEGMENT_64  for the
>> time being but I adding a message in case the extract code encounters a
>> segment other than those types.
>>
>>
>> I’m not understanding why you consider it an error to have load commands
>> other than an LC_SEGMENT_64 or LC_SEGMENT_64 used with Mach-O files used
>> with your option.  Is this not expected work with something an x86_64
>> Mach-O compiled for a Mac OS X?  It would have other load commands for
>> like LC_SYMTAB, LC_UUID, LC_FUNCTION_STARTS, etc.  Or is this option
>> only to work in some GPU Mach-O files with only LC_SEGMENT_64 or
>> LC_SEGMENT_64 load commands?
>>
>>
>> The latter is correct. I can add support for those other segments in
>> another commit, but right now I haven't written the code to do that. I
>> don't consider it an error, but I figured the extract should print if it
>> encounters a segment it doesn't support.
>>
>>
>> I guess maybe I don’t understand what you are saying.  There are only two
>> segment load commands, the LC_SEGMENT_64 and LC_SEGMENT_64.  Are you using
>> the word “segment” to mean “load command”?  If not what do you mean by a
>> "encounters a segment it doesn't support”?
>>
>> Would you like me to add support for all of the possible segment types??
>>
>>
>> What other “segment types” are there?  Sorry I might be lost I’m just not
>> following you.
>>
>>
>>
>> I have added some additional flags to narrow down which segment to dump.
>>
>>
>> Also, I added a flag and some code to print the section to stderr in a
>> hex format for testing
>>
>>
>> For test wouldn’t you just extract the sections using your option and
>> test that it is the same as a file one you add to the test suite?
>>
>>
>> I could, but how should I do that? I mean, should I just run cmp on both
>> files?
>>
>>
>> That’s what I would do.  As that is the functionality you are testing/
>>
>> Can I even do that if the test is run on a system without cmp? Sorry, I
>> was just trying to do the thing where I could rely on FileCheck the most
>> here.
>>
>>
>> I’m not a test expert.  Maybe there is a way to test that to files are
>> the same with FileCheck.  I guess you might want to see if that is possible.
>>
>>
>> I don’t really understand the need for the code to dump the section to
>> stderr.   There is already the option -section to dump Mach-O sections.
>> Also seems odd that you use the ‘C’ printf style of coding here.  Maybe
>> something more C++ like would fit better with the llvm code if you really
>> want to keep this bit.
>>
>>
>> Ah, yeah you're right what I am doing here is weird.
>>
>> and I've added a test case as well.
>>
>>
>> Seems like this could be a bit smaller and test the extracted file and
>> use your -extract-macho-sections-dir option.  Without that it seems like
>> the test will write into the location of the input file.
>>
>>
>> Ah right I think I forgot to add that flag to the test.
>>
>> PL
>>
>>
>> Let me known how this looks to you.
>>
>> Thanks
>>
>> PL
>>
>> On Fri, Aug 5, 2016 at 4:22 PM, Kevin Enderby <enderby at apple.com> wrote:
>>
>>> Hi Puyan,
>>>
>>> Seems like the functionality would be easier to test if it didn’t write
>>> the output files to the same directory as the input file.  And of course
>>> you’ll want to add a test case as the llvm tradition with any changes.
>>>
>>> Also I’m not an llvm style expert but you can likely shorten the logic
>>> and not use the a switch statement for your return value from
>>> ExtractLoadCommand().  Also it looks like it should get an error with any
>>> Mach-O file that has non LC_SEGMENT_64 or LC_SEGMENT_64 commands?  Is that
>>> correct and what you want.
>>>
>>> You also might just want to return a bool for success of failure from
>>> ExtractSectionData() and your other routines.
>>>
>>> Also it llvm seems to like the shortest bit if code so I don’t think you
>>> need the “int Res” and could just use an if statement.
>>>
>>> Kev
>>>
>>> > On Aug 5, 2016, at 4:06 PM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
>>> wrote:
>>> >
>>> > Hi Kevin, All:
>>> >
>>> > I have a macho-extract tool kicking around locally that I threw
>>> together a while back that simply dumps each section in a MachO to disk,
>>> and I've decided it's probably high-time to upstream the functionality into
>>> llvm-objdump.
>>> >
>>> > Let me know if this looks good enough to check in, or otherwise
>>> feedback would be nice too.
>>> >
>>> > Thanks
>>> >
>>> > PL
>>> >
>>> >
>>> > <llvm-objdump-macho-extract.patch>
>>>
>>>
>> <llvm-objdump-macho-extract2.patch>
>>
>>
>>
>>
>>
> <llvm-objdump-macho-extract3.patch>
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
>

-- 
Saleem Abdulrasool
compnerd (at) compnerd (dot) org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160809/337ddb91/attachment.html>