[PATCH] D41687: [llvm-objcopy] Add support for input types and the -I and -B flags

Mon Jan 8 15:27:27 PST 2018

jakehehrlich added a comment.

On the topic of removing the implicit "giant if", I'm not sure there's a great way to solve this. I've considered solutions ranging from macros (which are simple but should work) to some overly complicated dynamic solutions that check a condition for each ELF type and then dispatch to the corresponding template, but a) I wasn't able to get them to work properly and b) they were super complicated. In general you're going to have at least 8 branches because a) there are 4 templates and you need to dispatch to the correct one and b) there are two fundamentally different sources of information that need to be considered to decide which of those 4 should be used. Removing the templates we can from Object doesn't help either, because how we read in the object and how we decide to write out the object still require the same number of branches. Using macros to ensure uniformity of branches is one idea that a) works and b) is simple, but I think I'd rather know explicitly what's going on.

The macro I have in mind looks like this:

FOREACH_ELFT(auto *o = dyn_cast<ELFObjectFile<ELFT>>(&Binary), {
  HandleELF(*o, OutFmt);
  return;
})

The block is copied 4 times, once per ELF type, and ELFT is typedefed to that type in the scope of each copy, so ELFT is available in both the condition and the block. The block is only executed if the condition is true. For the case where MInfo is used we can do something that looks like this:

FOREACH_ELFT(elftMatchesMachine<ELFT>(MInfo), {
  HandleBinary<ELFT>(Input, OutFmt, MInfo);
})
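
For reference, here is a minimal sketch of how such a macro could be defined. It is not taken from the patch. The four ELF type structs, MachineInfo, elfTypeName and dispatch are hypothetical stand-ins so that the example compiles without LLVM headers, and the block is taken as a variadic argument so that top-level commas inside it do not confuse the preprocessor.

```cpp
#include <string>

// Hypothetical stand-ins for the four llvm::object ELF types.
struct ELF32LE { static constexpr bool Is64 = false; static constexpr bool IsLE = true; };
struct ELF32BE { static constexpr bool Is64 = false; static constexpr bool IsLE = false; };
struct ELF64LE { static constexpr bool Is64 = true;  static constexpr bool IsLE = true; };
struct ELF64BE { static constexpr bool Is64 = true;  static constexpr bool IsLE = false; };

// Hypothetical shape for MInfo; the real structure may carry more fields.
struct MachineInfo { bool Is64; bool IsLE; };

template <class ELFT> bool elftMatchesMachine(const MachineInfo &M) {
  return ELFT::Is64 == M.Is64 && ELFT::IsLE == M.IsLE;
}

template <class ELFT> std::string elfTypeName() {
  return std::string(ELFT::Is64 ? "ELF64" : "ELF32") + (ELFT::IsLE ? "LE" : "BE");
}

// One copy of the condition and block, with ELFT bound to a single type.
#define FOREACH_ELFT_CASE(Type, Cond, ...)                                     \
  {                                                                            \
    using ELFT = Type;                                                         \
    if (Cond) __VA_ARGS__                                                      \
  }

// Stamp out the same condition and block once for each of the four types.
#define FOREACH_ELFT(Cond, ...)                                                \
  do {                                                                         \
    FOREACH_ELFT_CASE(ELF32LE, Cond, __VA_ARGS__)                              \
    FOREACH_ELFT_CASE(ELF32BE, Cond, __VA_ARGS__)                              \
    FOREACH_ELFT_CASE(ELF64LE, Cond, __VA_ARGS__)                              \
    FOREACH_ELFT_CASE(ELF64BE, Cond, __VA_ARGS__)                              \
  } while (false)

// Hypothetical caller: returns the name of the ELF type matching MInfo.
std::string dispatch(const MachineInfo &MInfo) {
  FOREACH_ELFT(elftMatchesMachine<ELFT>(MInfo), {
    return elfTypeName<ELFT>();
  });
  return "unknown";
}
```

Each expansion gets its own scope, so the ELFT alias in one copy never leaks into the next, and a return inside the block leaves the enclosing function just as it would in the hand-written if chain.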

Using those two changes we can ensure uniformity of these dispatches. Coupling that with a function that returns a unique pointer to an Object and uses the output format to decide which of ELFObject or BinaryObject will be used brings our apparent branches down to a more manageable number. I'm slightly against both of these solutions but I could be convinced otherwise with minimal effort. The output format decider in particular seems fine to me.
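
The output format decider could be sketched roughly as follows. This is a simplified illustration rather than the patch's code: the classes here are non-template stand-ins (the real ones are templated on ELFT), and createObject and kind are hypothetical names.

```cpp
#include <memory>
#include <string>

// Simplified, non-template stand-ins for the Object hierarchy.
struct Object {
  virtual ~Object() = default;
  virtual std::string kind() const = 0;
};

struct ELFObject : Object {
  std::string kind() const override { return "elf"; }
};

struct BinaryObject : Object {
  std::string kind() const override { return "binary"; }
};

// Hypothetical decider: the output format alone picks the Object subclass,
// so callers no longer need to branch on it themselves.
std::unique_ptr<Object> createObject(const std::string &OutFmt) {
  if (OutFmt == "binary")
    return std::unique_ptr<Object>(new BinaryObject());
  return std::unique_ptr<Object>(new ELFObject());
}
```

With this in place, each of the 4 ELFT branches calls one function instead of repeating the output format check.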

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:343-345
+  // We need to construct new symbol names but Symbols don't own their names.
+  // We allocate the new symbol names using a string saver that stays around
+  // for the duration of the program.
----------------
jhenderson wrote:
> I wonder if this is an indication that Symbols should own their own names. It would mean a bit of copying in the ELF input case, but could prevent easy-to-make errors if we want to create or rename symbols.
> 
> If you prefer keeping it as is, I'd make a separate function called "MakeBinarySymbolName(StringRef BaseName, StringRef Suffix)", so that the warts of the name ownership can be kept separate from the adding of symbols, and it can be reused in other places too.
> 
> Did you consider making InputBinaryFormat a subclass of Object? That would allow you to have a slightly nicer name ownership resolution, apart from anything else.
I think switching to Symbols owning their names is a good idea. Symbols and relocations are likely to be the sticking point for optimization at some point in the future but I'd rather use the conceptually simplest option now and optimize later when we have an issue. I think the biggest optimization for symbol tables will come from lazy loading and not from optimizing copying of small strings like that.
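
A rough sketch of what owning names could look like, combined with the helper you suggested. The Symbol layout here is a hypothetical simplification, and the "_binary_<name>_<suffix>" scheme with non-alphanumeric characters replaced by '_' is assumed from GNU objcopy's convention for binary input, not taken from this patch.

```cpp
#include <cctype>
#include <string>

// Hypothetical simplified symbol that owns its name, so no external
// string saver has to outlive it.
struct Symbol {
  std::string Name;
  unsigned long long Value;
};

// Builds "_binary_<sanitized base>_<suffix>". The scheme is assumed from
// GNU objcopy's convention for binary input.
std::string makeBinarySymbolName(const std::string &BaseName,
                                 const std::string &Suffix) {
  std::string Name = "_binary_";
  for (char C : BaseName)
    Name += std::isalnum(static_cast<unsigned char>(C)) ? C : '_';
  Name += '_';
  Name += Suffix;
  return Name;
}

// The temporary returned by the helper is moved into the symbol, so the
// caller never has to think about who keeps the string alive.
Symbol makeSymbol(const std::string &FileName, const std::string &Suffix,
                  unsigned long long Value) {
  return Symbol{makeBinarySymbolName(FileName, Suffix), Value};
}
```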

As for making InputBinaryFormat a subclass of Object, I'm not sure. I didn't consider it, but thinking about it now, I was intending for those subclasses to be the output formats. This change does raise the question of how input formats should be handled, however. For instance, why is the binary input format handled here while the ELF input format is handled inside of Object? That seems kind of off to me. I remember you mentioned an idea a while back about having reader and writer types that map in and out of a common representation. Maybe we should refactor the Object code to expose enough of an interface that code outside of Object can reconstruct the ELF Object the way this code does for the binary input case.
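
To make the reader/writer idea a bit more concrete, here is a very rough sketch. Every name in it (Reader, Writer, convert, and the toy Object that only records section names) is hypothetical and only illustrates the shape of the split, not a proposed interface.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy common representation: just a list of section names.
struct Object {
  std::vector<std::string> SectionNames;
};

// Input formats map into the common representation...
struct Reader {
  virtual ~Reader() = default;
  virtual std::unique_ptr<Object> create() const = 0;
};

// ...and output formats map out of it.
struct Writer {
  virtual ~Writer() = default;
  virtual std::string write(const Object &Obj) const = 0;
};

// Hypothetical binary reader: wraps the raw input in a single data section.
struct BinaryReader : Reader {
  std::unique_ptr<Object> create() const override {
    std::unique_ptr<Object> Obj(new Object());
    Obj->SectionNames.push_back(".data");
    return Obj;
  }
};

// Hypothetical ELF writer: here it only renders the section list as text.
struct ELFWriter : Writer {
  std::string write(const Object &Obj) const override {
    std::string Out = "elf[";
    for (const std::string &Name : Obj.SectionNames) {
      if (Out.size() > 4)
        Out += ' ';
      Out += Name;
    }
    return Out + "]";
  }
};

// Any reader can be paired with any writer through the shared Object.
std::string convert(const Reader &In, const Writer &Out) {
  return Out.write(*In.create());
}
```

With a split like this, the binary input case and the ELF input case would both live outside of Object, and Object would only need to expose enough to be built up and written out.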

Repository:
  rL LLVM

https://reviews.llvm.org/D41687