[llvm-dev] llvm-objcopy proposal

Wed Jun 7 11:00:03 PDT 2017

Yes, templating with ELFT works pretty well in LLD. It might be worth
summarizing the best practices for using ELFT that we've found during LLD
development. So here they are.

 - If a function or a data structure handles or corresponds to on-disk ELF
files, template it with ELFT.
 - Integral types such as Elf{32,64}_{Xword,Word,Addr,Off} are not
useful and are better avoided. We use uint{8,16,32,64}_t instead, which
seems to improve readability. (I honestly don't remember the real types
behind these ELF typedefs.)
 - ELFT::uint, whose size is 32 or 64 bits depending on ELF32/64, isn't
useful. Always use uint64_t to represent a value that can be 32 or 64 bits.
The waste from doing this is negligible, and it can drastically simplify
types: if your function uses ELFT only for ELFT::uint, you can de-template
that function by using uint64_t.
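
To make the split above concrete, here is a rough, self-contained sketch. It
does not use the real LLVM headers: ELF32LE and ELF64BE below are hypothetical
stand-ins for the ELFT parameter, and readAddr and alignTo are made-up helper
names, so treat it as an illustration of the idea rather than LLD's actual
code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the ELFT template parameter. These are NOT the
// real LLVM types; they only carry the two properties the sketch needs.
struct ELF32LE {
  static constexpr bool Is64Bits = false;
  static constexpr bool IsLittle = true;
};
struct ELF64BE {
  static constexpr bool Is64Bits = true;
  static constexpr bool IsLittle = false;
};

// Touches on-disk bytes, so it stays templated on ELFT. The result is
// widened to uint64_t no matter how wide the on-disk field is.
template <class ELFT> uint64_t readAddr(const uint8_t *Buf) {
  constexpr size_t Size = ELFT::Is64Bits ? 8 : 4;
  uint64_t V = 0;
  for (size_t I = 0; I < Size; ++I) {
    // Visit bytes from most significant to least significant.
    size_t Idx = ELFT::IsLittle ? Size - 1 - I : I;
    V = (V << 8) | Buf[Idx];
  }
  return V;
}

// Only works on already-decoded values, so it needs no template at all:
// uint64_t covers both the 32-bit and the 64-bit case.
uint64_t alignTo(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}
```

A caller would decode once with something like readAddr<ELF32LE>(Buf) and
then pass the plain uint64_t to alignTo, so only the code that reads the file
has to know about ELFT.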

On Tue, Jun 6, 2017 at 10:32 PM, Sean Silva via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
>
> On Tue, Jun 6, 2017 at 4:03 PM, Jake Ehrlich <jakehehrlich at google.com>
> wrote:
>
>> Fantastic! Thanks for all of the input! I'll be considering all of it
>> going forward. The plan right now is just to worry about ELF executables
>> and nothing else. I'm very sympathetic to the "llvm-objtool" change. If
>> everyone is cool with it I'll change the name in the next CL to
>> "llvm-objtool".
>>
>> To start out I implemented a very basic ELF64LE specific bit of code. I'm
>> currently looking for reviewers on it. The phabricator link is here:
>> https://reviews.llvm.org/D33964. I'd like to find people willing to
>> review this as I work on this going forward as well. I haven't bothered
>> worrying about it but I imagine that this will template fairly easily to
>> support ELF32LE, ELF32BE, and ELF64BE.
>>
>
> Yep. If you haven't found it, take a look at our "ELFT" infrastructure
> which should allow easily templating this. A really simple example is the
> ELF part of yaml2obj (tools/yaml2obj/yaml2elf.cpp). LLD is another example
> that uses ELFT to work across all 4 combinations.
> ELFT is so easy to use that going forward you probably won't find yourself
> needing to write an initial version for a specific {endian,is64bit}
> combination.
>
> Also, one thing to keep in mind is that types like llvm::ELF::Elf64_Word
> will have the host endianness, which may cause output differences across
> different host platforms if they sneak into the output buffer (we do have
> some big endian bots, and making sure that tool output is deterministic
> across host endianness is a goal of LLVM tools and such differences are
> considered bugs). So you may find yourself wanting to use ELFT even in the
> initial patch. By using ELFT everywhere, you make sure that things are
> guaranteed correct. It's then fairly easy to remove it as needed.
>
> That is exactly what happened in LLD/ELF. We started with everything ELFT
> so there was no chance of bugs, then later on once the project was
> stabilizing we detemplated it in many places to make the code simpler when there
> wasn't any risk of getting it wrong (for example, in many places you can
> just use uint64_t instead of a type that is 32 or 64 bits depending on
> ELFT). Also, at the point where we were removing the ELFT templating, we
> already had tons of test coverage. AFAIK, thanks to ELFT and that
> methodology, LLD/ELF has had zero (really, *zero*; I can't think of a
> single one) bugs due to endianness/64bit-ness mixups, despite being a tool
> that natively supports all 4 combinations simultaneously and operates on
> endian/64bit dependent values read from object files on almost every single
> line of code. It's very impressive, and big thanks to Michael for all the
> packed_endian_specific_integral / ELFT infrastructure (now if only he
> would get packed_endian_specific_integral into the C++ standard :P).
>
> -- Sean Silva
>
>
>>
>>
>> Would anyone be willing to let me set them as a reviewer going forward
>> for future CLs?
>>
>> On Sun, Jun 4, 2017 at 6:07 PM Sean Silva via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Fri, Jun 2, 2017 at 3:52 PM, James Y Knight via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jun 2, 2017 at 2:34 PM, Ed Maste via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> One additional use case for you: converting from a binary to an ELF
>>>>> object file
>>>>> ```
>>>>> objcopy -I binary -O elf64-x86-64 foo.bin foo.o
>>>>> ```
>>>>> This is sometimes used for embedding binary files for use by drivers
>>>>> and such.
>>>>>
>>>>
>>>> Yea, unfortunately the command-line you actually end up needing is more
>>>> like:
>>>>   objcopy -I binary -Bi386:x86-64 -Oelf64-x86-64 --rename-section
>>>> .data=.rodata,alloc,load,readonly,data,contents --add-section
>>>> .note.GNU-stack=/dev/null
>>>>
>>>> Having to manually invoke objcopy and know what to specify for the -B
>>>> and -O options, and to know you need the .note.GNU-stack section, and how
>>>> to move it into rodata...it's really all quite terrible. Nobody should have
>>>> to do that. :(
>>>>
>>>> There's also the "-b binary" flag to GNU ld (both bfd and gold). But,
>>>> you typically need to do a dedicated "link" for that. You do:
>>>>   ld -r -b binary picture.jpg -o foo.o
>>>> How does ld know what output format to use here? It's gotta just choose
>>>> the default, which is kinda poor...or the user needs to know how to spell
>>>> an "emulation" and output format...
>>>>
>>>
>>> One way to hack around this might be to pass in one of the other object
>>> files in your project, and have the output .o file replace it. Still pretty
>>> hacky and brittle (and hard to integrate into a build system I would think).
>>>
>>>
>>>>
>>>> You could imagine trying to use -Wl to put it with the compile command,
>>>> but what do you use to switch back to the normal object format?
>>>>   gcc main.c -Wl,-b -Wl,binary -Wl,picture.jpg -Wl,-b -Wl,<<something
>>>> to undo binary mode?>>
>>>>
>>>> So, anyways, while this is _possible_ with objcopy, it'd sure be nice
>>>> if you never needed to use it for that...
>>>>
>>>
>>> The other approaches I've seen or can imagine are:
>>>
>>> - Assembler `.incbin` directive (could use it from an inline asm).
>>> - Use a "bin2h" type program which takes a binary and spits out a C file
>>> with a giant uint8_t[] literal in it, then include that in one of your
>>> normal .c files. In theory a C++11 raw string literal could bypass most of
>>> the parsing overhead of a big array literal, but the people that care about
>>> including a binary in their program probably don't care about that.
>>>
>>> -- Sean Silva
>>>
>>>
>>>>
>>>> (BTW, Apple ld actually has an option "-sectcreate SEGNAME SECTNAME
>>>> INPUT_FILE", and the clang driver will pass it through to the linker.)
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>
>>
>