[PATCH] llvm-objdump macho section extractor.

Thu Aug 11 10:03:54 PDT 2016

Alright,

So should I move this functionality in a new tool? Call it llvm-objcopy or
llvm-objextract? I can add the initial MachO functionality and work with
Saleem and whoever else is interested to add elf support?

PL

On Wed, Aug 10, 2016 at 3:11 PM, Saleem Abdulrasool <compnerd at compnerd.org>
wrote:

> On Wed, Aug 10, 2016 at 9:30 AM, Kevin Enderby <enderby at apple.com> wrote:
>
>>
>> On Aug 9, 2016, at 7:36 PM, Saleem Abdulrasool <compnerd at compnerd.org>
>> wrote:
>>
>> On Tue, Aug 9, 2016 at 2:04 PM, Kevin Enderby via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Hi Puyan,
>>>
>>> Adding Jim Grosbach to get his thoughts.
>>>
>>> Stepping back and thinking a bit more about the functionality you want
>>> it doesn’t seen clear to me that llvm-objdump is really the right place to
>>> do this.  I can see that in the llvm tree this would be an easy place to
>>> add this.  However it seems that llvm-objdump is more about printing the
>>> contents of objects and operating on them and taking them apart and putting
>>> parts of them in separate files.
>>>
>>> On Mac OS X this functionality really seems to belong to the tool
>>> segedit(1) with its -extract option.  And you want an extract “all
>>> sections” into conventional named output suffixed files.  Maybe it would be
>>> better to add the functionality to segedit(1) or create an llvm-segedit(1)
>>> tool.
>>>
>>> Part of my thinking is how to handle the error cases of multiple files
>>> on the command line, archives and multiple slices in a fat file.  Reading
>>> your code I think multiple files might work as part off your "conventional
>>> named” files includes the input file name.  But for archives and fat files
>>> I think you would get the odd error message "The file was not recognized as
>>> a valid object file” which is what ones gets for object_error::invalid_file_type.
>>> In this case your error really should be more like
>>> “The -extract-macho-sections option only works on thin Mach-O files” or
>>> something.
>>>
>>> Jim, what do you think about adding this functionality to llvm-objdump?
>>>
>>
>> Just a by-stander, but, I think that this should go into a new tool
>> `llvm-objcopy`.
>>
>>
>> I’ve never use a system that had an objcopy(1) tool, but looking at the
>> online objcopy(1) - Linux man page it doesn’t seem to have an option to
>> copy out  a section’s contents into a plain file.  It seems that the output
>> is always another object file.
>>
>
> Yes, usually, the output is written to another object file.  However, you
> could specify the binary object to dump it to a raw file.
>
>>   This would be another one of the missing pieces that we have in the
>> LLVM toolset.  `objcopy` allows you to copy out various sections in the ELF
>> world.  We could handle this similar to a multi call binary
>>
>>
>> Could you explain a bit more what a "multi call binary” is?  Sorry it
>> that is obvious for a Linux person, my back ground is traditional old
>> school Unix and Mac OS X.
>>
>
> Sure.  Basically switching the behavior of the binary based on the name
> with which it was invoked.  So, we would have a symlink for llvm-segedit
> which would give you the behavior of segedit, otherwise it would behave
> like objcopy.
>
>
>> and provide a similar option set to `segedit` for MachO and the normal
>> binutils-like command line for `objcopy` which can also do MachO, but does
>> ELF and COFF as well.
>>
>>
>>>
>>> Other comments:
>>>
>>> Other things to check is your handling of zero fill section that have no
>>> contents.  I think your code will write junk or get errors as the offsets
>>> for these are zero but they have non-zero section sizes.  I think it is
>>> best not to write any file for an S_ZEROFILL section type.
>>>
>>> The name of the routine ExtractLoadCommand() seems very odd.  The logic
>>> seems it would be better if just folded in to your loop over the load
>>> commands:
>>>
>>> +    for (const auto &Load : InputObject->load_commands()) {
>>> +      if (!ExtractLoadCommand(Filename, *InputObject, Load))
>>> +        break;
>>> +    }
>>>
>>> And since you are really extracting section contents maybe the names of
>>> the routines ExtractSegmentCommand() and ExtractSegment64Command()
>>> would be better as ExtractSections() and ExtractSections64().
>>>
>>> Also the test is writing into the directory with the inputs (I think)
>>> because of the use of "-extract-macho-sections-dir ./“ I think it would be
>>> better to some how direct this to the build directory so things can be
>>> cleaned.  Not sure exactly how to do this but I know lld writes its tests
>>> to some “Output” directory in the build directory.
>>>
>>> Also I still think a much smaller test that a 4000+ character line of
>>> hex would be nicer.  And also checking that more then just the one file for
>>> one section gets created.  Why not create a little Mach-O file with
>>> multiple sections and check all the files for the sections are created and
>>> contain the correct contents.  I would create a Mach-O file that contains
>>> ascii section contents and little files that can be diff(1)’ed or
>>> cmp(1)’ed. I did this with archives and malformed Mach-O files.  But a
>>> small file under 100 bytes could be a simple test case.
>>>
>>> My thoughts,
>>> Kev
>>>
>>> On Aug 9, 2016, at 11:51 AM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
>>> wrote:
>>>
>>> Revised patch again.
>>>
>>> I removed the hex printing code, let the test handle that. Still using
>>> hexdump and filecheck to verify the contents of the binary.
>>> Added better error handling for file access, and now using all of LLVM's
>>> file IO libraries.
>>> No more error messages on those segment types.
>>>
>>> PL
>>>
>>> On Mon, Aug 8, 2016 at 2:36 PM, Kevin Enderby <enderby at apple.com> wrote:
>>>
>>>>
>>>> On Aug 8, 2016, at 2:29 PM, Puyan Lotfi <plotfi at apple.com> wrote:
>>>>
>>>>
>>>> On Aug 8, 2016, at 1:53 PM, Kevin Enderby <enderby at apple.com> wrote:
>>>>
>>>>
>>>> On Aug 8, 2016, at 11:27 AM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
>>>> wrote:
>>>>
>>>> Kevin
>>>>
>>>> I have a new patch attached.
>>>>
>>>> I am fine with only supporting LC_SEGMENT_64 or LC_SEGMENT_64  for the
>>>> time being but I adding a message in case the extract code encounters a
>>>> segment other than those types.
>>>>
>>>>
>>>> I’m not understanding why you consider it an error to have load
>>>> commands other than an LC_SEGMENT_64 or LC_SEGMENT_64 used with Mach-O
>>>> files used with your option.  Is this not expected work with something an
>>>> x86_64 Mach-O compiled for a Mac OS X?  It would have other load commands
>>>> for like LC_SYMTAB, LC_UUID, LC_FUNCTION_STARTS, etc.  Or is this
>>>> option only to work in some GPU Mach-O files with only LC_SEGMENT_64 or
>>>> LC_SEGMENT_64 load commands?
>>>>
>>>>
>>>> The latter is correct. I can add support for those other segments in
>>>> another commit, but right now I haven't written the code to do that. I
>>>> don't consider it an error, but I figured the extract should print if it
>>>> encounters a segment it doesn't support.
>>>>
>>>>
>>>> I guess maybe I don’t understand what you are saying.  There are only
>>>> two segment load commands, the LC_SEGMENT_64 and LC_SEGMENT_64.  Are you
>>>> using the word “segment” to mean “load command”?  If not what do you mean
>>>> by a "encounters a segment it doesn't support”?
>>>>
>>>> Would you like me to add support for all of the possible segment types??
>>>>
>>>>
>>>> What other “segment types” are there?  Sorry I might be lost I’m just
>>>> not following you.
>>>>
>>>>
>>>>
>>>> I have added some additional flags to narrow down which segment to dump.
>>>>
>>>>
>>>> Also, I added a flag and some code to print the section to stderr in a
>>>> hex format for testing
>>>>
>>>>
>>>> For test wouldn’t you just extract the sections using your option and
>>>> test that it is the same as a file one you add to the test suite?
>>>>
>>>>
>>>> I could, but how should I do that? I mean, should I just run cmp on
>>>> both files?
>>>>
>>>>
>>>> That’s what I would do.  As that is the functionality you are testing/
>>>>
>>>> Can I even do that if the test is run on a system without cmp? Sorry, I
>>>> was just trying to do the thing where I could rely on FileCheck the most
>>>> here.
>>>>
>>>>
>>>> I’m not a test expert.  Maybe there is a way to test that to files are
>>>> the same with FileCheck.  I guess you might want to see if that is possible.
>>>>
>>>>
>>>> I don’t really understand the need for the code to dump the section to
>>>> stderr.   There is already the option -section to dump Mach-O sections.
>>>> Also seems odd that you use the ‘C’ printf style of coding here.  Maybe
>>>> something more C++ like would fit better with the llvm code if you really
>>>> want to keep this bit.
>>>>
>>>>
>>>> Ah, yeah you're right what I am doing here is weird.
>>>>
>>>> and I've added a test case as well.
>>>>
>>>>
>>>> Seems like this could be a bit smaller and test the extracted file and
>>>> use your -extract-macho-sections-dir option.  Without that it seems like
>>>> the test will write into the location of the input file.
>>>>
>>>>
>>>> Ah right I think I forgot to add that flag to the test.
>>>>
>>>> PL
>>>>
>>>>
>>>> Let me known how this looks to you.
>>>>
>>>> Thanks
>>>>
>>>> PL
>>>>
>>>> On Fri, Aug 5, 2016 at 4:22 PM, Kevin Enderby <enderby at apple.com>
>>>> wrote:
>>>>
>>>>> Hi Puyan,
>>>>>
>>>>> Seems like the functionality would be easier to test if it didn’t
>>>>> write the output files to the same directory as the input file.  And of
>>>>> course you’ll want to add a test case as the llvm tradition with any
>>>>> changes.
>>>>>
>>>>> Also I’m not an llvm style expert but you can likely shorten the logic
>>>>> and not use the a switch statement for your return value from
>>>>> ExtractLoadCommand().  Also it looks like it should get an error with any
>>>>> Mach-O file that has non LC_SEGMENT_64 or LC_SEGMENT_64 commands?  Is that
>>>>> correct and what you want.
>>>>>
>>>>> You also might just want to return a bool for success of failure from
>>>>> ExtractSectionData() and your other routines.
>>>>>
>>>>> Also it llvm seems to like the shortest bit if code so I don’t think
>>>>> you need the “int Res” and could just use an if statement.
>>>>>
>>>>> Kev
>>>>>
>>>>> > On Aug 5, 2016, at 4:06 PM, Puyan Lotfi <puyan.lotfi.llvm at gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hi Kevin, All:
>>>>> >
>>>>> > I have a macho-extract tool kicking around locally that I threw
>>>>> together a while back that simply dumps each section in a MachO to disk,
>>>>> and I've decided it's probably high-time to upstream the functionality into
>>>>> llvm-objdump.
>>>>> >
>>>>> > Let me know if this looks good enough to check in, or otherwise
>>>>> feedback would be nice too.
>>>>> >
>>>>> > Thanks
>>>>> >
>>>>> > PL
>>>>> >
>>>>> >
>>>>> > <llvm-objdump-macho-extract.patch>
>>>>>
>>>>>
>>>> <llvm-objdump-macho-extract2.patch>
>>>>
>>>>
>>>>
>>>>
>>>>
>>> <llvm-objdump-macho-extract3.patch>
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>>
>>
>>
>> --
>> Saleem Abdulrasool
>> compnerd (at) compnerd (dot) org
>>
>>
>>
>
>
> --
> Saleem Abdulrasool
> compnerd (at) compnerd (dot) org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160811/1806f343/attachment-0001.html>