[PATCH] llvm-objdump macho section extractor.

Wed Aug 10 15:11:55 PDT 2016

On Wed, Aug 10, 2016 at 9:30 AM, Kevin Enderby <enderby at apple.com> wrote:

>
> On Aug 9, 2016, at 7:36 PM, Saleem Abdulrasool <compnerd at compnerd.org>
> wrote:
>
> On Tue, Aug 9, 2016 at 2:04 PM, Kevin Enderby via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Hi Puyan,
>>
>> Adding Jim Grosbach to get his thoughts.
>>
>> Stepping back and thinking a bit more about the functionality you want it
>> doesn’t seen clear to me that llvm-objdump is really the right place to do
>> this.  I can see that in the llvm tree this would be an easy place to add
>> this.  However it seems that llvm-objdump is more about printing the
>> contents of objects and operating on them and taking them apart and putting
>> parts of them in separate files.
>>
>> On Mac OS X this functionality really seems to belong to the tool
>> segedit(1) with its -extract option.  And you want an extract “all
>> sections” into conventional named output suffixed files.  Maybe it would be
>> better to add the functionality to segedit(1) or create an llvm-segedit(1)
>> tool.
>>
>> Part of my thinking is how to handle the error cases of multiple files on
>> the command line, archives and multiple slices in a fat file.  Reading your
>> code I think multiple files might work as part off your "conventional
>> named” files includes the input file name.  But for archives and fat files
>> I think you would get the odd error message "The file was not recognized as
>> a valid object file” which is what ones gets for object_error::invalid_file_type.
>> In this case your error really should be more like
>> “The -extract-macho-sections option only works on thin Mach-O files” or
>> something.
>>
>> Jim, what do you think about adding this functionality to llvm-objdump?
>>
>
> Just a by-stander, but, I think that this should go into a new tool
> `llvm-objcopy`.
>
>
> I’ve never use a system that had an objcopy(1) tool, but looking at the
> online objcopy(1) - Linux man page it doesn’t seem to have an option to
> copy out  a section’s contents into a plain file.  It seems that the output
> is always another object file.
>

Yes, usually, the output is written to another object file.  However, you
could specify the binary object to dump it to a raw file.

>   This would be another one of the missing pieces that we have in the LLVM
> toolset.  `objcopy` allows you to copy out various sections in the ELF
> world.  We could handle this similar to a multi call binary
>
>
> Could you explain a bit more what a "multi call binary” is?  Sorry it that
> is obvious for a Linux person, my back ground is traditional old school
> Unix and Mac OS X.
>

Sure.  Basically switching the behavior of the binary based on the name
with which it was invoked.  So, we would have a symlink for llvm-segedit
which would give you the behavior of segedit, otherwise it would behave
like objcopy.

> and provide a similar option set to `segedit` for MachO and the normal
> binutils-like command line for `objcopy` which can also do MachO, but does
> ELF and COFF as well.
>
>
>>
>> Other comments:
>>
>> Other things to check is your handling of zero fill section that have no
>> contents.  I think your code will write junk or get errors as the offsets
>> for these are zero but they have non-zero section sizes.  I think it is
>> best not to write any file for an S_ZEROFILL section type.
>>
>> The name of the routine ExtractLoadCommand() seems very odd.  The logic
>> seems it would be better if just folded in to your loop over the load
>> commands:
>>
>> +    for (const auto &Load : InputObject->load_commands()) {
>> +      if (!ExtractLoadCommand(Filename, *InputObject, Load))
>> +        break;
>> +    }
>>
>> And since you are really extracting section contents maybe the names of
>> the routines ExtractSegmentCommand() and ExtractSegment64Command() would
>> be better as ExtractSections() and ExtractSections64().
>>
>> Also the test is writing into the directory with the inputs (I think)
>> because of the use of "-extract-macho-sections-dir ./“ I think it would be
>> better to some how direct this to the build directory so things can be
>> cleaned.  Not sure exactly how to do this but I know lld writes its tests
>> to some “Output” directory in the build directory.
>>
>> Also I still think a much smaller test that a 4000+ character line of hex
>> would be nicer.  And also checking that more then just the one file for one
>> section gets created.  Why not create a little Mach-O file with multiple
>> sections and check all the files for the sections are created and contain
>> the correct contents.  I would create a Mach-O file that contains ascii
>> section contents and little files that can be diff(1)’ed or cmp(1)’ed. I
>> did this with archives and malformed Mach-O files.  But a small file under
>> 100 bytes could be a simple test case.
>>
>> My thoughts,
>> Kev
>>
>> On Aug 9, 2016, at 11:51 AM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
>> wrote:
>>
>> Revised patch again.
>>
>> I removed the hex printing code, let the test handle that. Still using
>> hexdump and filecheck to verify the contents of the binary.
>> Added better error handling for file access, and now using all of LLVM's
>> file IO libraries.
>> No more error messages on those segment types.
>>
>> PL
>>
>> On Mon, Aug 8, 2016 at 2:36 PM, Kevin Enderby <enderby at apple.com> wrote:
>>
>>>
>>> On Aug 8, 2016, at 2:29 PM, Puyan Lotfi <plotfi at apple.com> wrote:
>>>
>>>
>>> On Aug 8, 2016, at 1:53 PM, Kevin Enderby <enderby at apple.com> wrote:
>>>
>>>
>>> On Aug 8, 2016, at 11:27 AM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
>>> wrote:
>>>
>>> Kevin
>>>
>>> I have a new patch attached.
>>>
>>> I am fine with only supporting LC_SEGMENT_64 or LC_SEGMENT_64  for the
>>> time being but I adding a message in case the extract code encounters a
>>> segment other than those types.
>>>
>>>
>>> I’m not understanding why you consider it an error to have load commands
>>> other than an LC_SEGMENT_64 or LC_SEGMENT_64 used with Mach-O files used
>>> with your option.  Is this not expected work with something an x86_64
>>> Mach-O compiled for a Mac OS X?  It would have other load commands for
>>> like LC_SYMTAB, LC_UUID, LC_FUNCTION_STARTS, etc.  Or is this option
>>> only to work in some GPU Mach-O files with only LC_SEGMENT_64 or
>>> LC_SEGMENT_64 load commands?
>>>
>>>
>>> The latter is correct. I can add support for those other segments in
>>> another commit, but right now I haven't written the code to do that. I
>>> don't consider it an error, but I figured the extract should print if it
>>> encounters a segment it doesn't support.
>>>
>>>
>>> I guess maybe I don’t understand what you are saying.  There are only
>>> two segment load commands, the LC_SEGMENT_64 and LC_SEGMENT_64.  Are you
>>> using the word “segment” to mean “load command”?  If not what do you mean
>>> by a "encounters a segment it doesn't support”?
>>>
>>> Would you like me to add support for all of the possible segment types??
>>>
>>>
>>> What other “segment types” are there?  Sorry I might be lost I’m just
>>> not following you.
>>>
>>>
>>>
>>> I have added some additional flags to narrow down which segment to dump.
>>>
>>>
>>> Also, I added a flag and some code to print the section to stderr in a
>>> hex format for testing
>>>
>>>
>>> For test wouldn’t you just extract the sections using your option and
>>> test that it is the same as a file one you add to the test suite?
>>>
>>>
>>> I could, but how should I do that? I mean, should I just run cmp on both
>>> files?
>>>
>>>
>>> That’s what I would do.  As that is the functionality you are testing/
>>>
>>> Can I even do that if the test is run on a system without cmp? Sorry, I
>>> was just trying to do the thing where I could rely on FileCheck the most
>>> here.
>>>
>>>
>>> I’m not a test expert.  Maybe there is a way to test that to files are
>>> the same with FileCheck.  I guess you might want to see if that is possible.
>>>
>>>
>>> I don’t really understand the need for the code to dump the section to
>>> stderr.   There is already the option -section to dump Mach-O sections.
>>> Also seems odd that you use the ‘C’ printf style of coding here.  Maybe
>>> something more C++ like would fit better with the llvm code if you really
>>> want to keep this bit.
>>>
>>>
>>> Ah, yeah you're right what I am doing here is weird.
>>>
>>> and I've added a test case as well.
>>>
>>>
>>> Seems like this could be a bit smaller and test the extracted file and
>>> use your -extract-macho-sections-dir option.  Without that it seems like
>>> the test will write into the location of the input file.
>>>
>>>
>>> Ah right I think I forgot to add that flag to the test.
>>>
>>> PL
>>>
>>>
>>> Let me known how this looks to you.
>>>
>>> Thanks
>>>
>>> PL
>>>
>>> On Fri, Aug 5, 2016 at 4:22 PM, Kevin Enderby <enderby at apple.com> wrote:
>>>
>>>> Hi Puyan,
>>>>
>>>> Seems like the functionality would be easier to test if it didn’t write
>>>> the output files to the same directory as the input file.  And of course
>>>> you’ll want to add a test case as the llvm tradition with any changes.
>>>>
>>>> Also I’m not an llvm style expert but you can likely shorten the logic
>>>> and not use the a switch statement for your return value from
>>>> ExtractLoadCommand().  Also it looks like it should get an error with any
>>>> Mach-O file that has non LC_SEGMENT_64 or LC_SEGMENT_64 commands?  Is that
>>>> correct and what you want.
>>>>
>>>> You also might just want to return a bool for success of failure from
>>>> ExtractSectionData() and your other routines.
>>>>
>>>> Also it llvm seems to like the shortest bit if code so I don’t think
>>>> you need the “int Res” and could just use an if statement.
>>>>
>>>> Kev
>>>>
>>>> > On Aug 5, 2016, at 4:06 PM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi Kevin, All:
>>>> >
>>>> > I have a macho-extract tool kicking around locally that I threw
>>>> together a while back that simply dumps each section in a MachO to disk,
>>>> and I've decided it's probably high-time to upstream the functionality into
>>>> llvm-objdump.
>>>> >
>>>> > Let me know if this looks good enough to check in, or otherwise
>>>> feedback would be nice too.
>>>> >
>>>> > Thanks
>>>> >
>>>> > PL
>>>> >
>>>> >
>>>> > <llvm-objdump-macho-extract.patch>
>>>>
>>>>
>>> <llvm-objdump-macho-extract2.patch>
>>>
>>>
>>>
>>>
>>>
>> <llvm-objdump-macho-extract3.patch>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>>
>
>
> --
> Saleem Abdulrasool
> compnerd (at) compnerd (dot) org
>
>
>

-- 
Saleem Abdulrasool
compnerd (at) compnerd (dot) org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160810/5d97d161/attachment.html>