[PATCH] D88827: [llvm-objcopy][NFC] Move core implementation of llvm-objcopy into separate library.

Fri Jan 14 03:54:50 PST 2022

jhenderson added a comment.

In D88827#3242326 <https://reviews.llvm.org/D88827#3242326>, @echristo wrote:

>> I changed my mind a bit since my early comment you linked, having gone cold on the review and now having come back to it. Here's my thinking: the object manipulation performed by the objcopy code is based on an internal Object class that has nothing to do with the object classes within the libObject library. By putting the two inside the same library, we risk confusion ("which type of Object do I need for this functionality?"). There are fundamental differences which would make reusing the libObject classes for the objcopy code less than ideal - it may not even be possible without a lot of work.
>
> The original idea is that these classes -could- be used for this. If that's not true we should go back and fix that rather than have a separate set. That said, we do have a separate set which complicates matters. I think the question I'd ask is "how can we get from here to there?". No straightforward answer here for sure.

I guess I might have misrepresented the internals of the llvm-objcopy classes: to populate them, they do use the libObject code. The llvm-objcopy Object class acts as another layer that provides manipulation functionality on top of the libObject object files. In some ways, it could actually be used purely for reading though, which raises some interesting thoughts which I can't exactly express. If we wanted to change the existing libObject code to support direct manipulation, without this extra layer, we'd need to change how those classes store data - at the moment, they just consist of references into memory buffers, rather than copies stored within the program. As such, it is impossible to change these references without changing how data is backed. The llvm-objcopy library provides this additional layer in essence. I think this shows that, at least at the moment, the llvm-objcopy Object code is a separate layer. Whethere that then means the separate layer should be in a separate library is probably a different point (it means it //can// be, but doesn't mean it //must// be.

>> Regarding the Archive writing code being in libObject: it's my tentative opinion that archive functionality shouldn't be in libObject at all: an archive isn't really an object itself: it's a group of objects. The functionality for archives is completely unrelated to the functionality for other file types supported by the object library. As such, I think it should actually be moved into a separate libArchive library or similar.
>
> From my perspective: ideally an archive should just be a specific type of object that contains other objects (similar to perhaps a shared library if we did that). Reading it just means cracking it, sharing the archive format information, and then iterating through each object accordingly. Writing would be the reverse. Any thoughts on where we're on a different page? :)

I don't think we really disagree here (although I wouldn't call an archive an object in those words). The point I was largely addressing was the earlier point made by @alexander-shaposhnikov about how the object library already allows manipulation of archives, unlike the other object formats.

>> Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.
>
> Are you thinking about things just shipping .a or .o files or something else? Most platforms that I can think of can remove unneeded object code in a binary link so I'm curious about the concern here.

Yeah, we have some cases where we distribute .a files (or more specifically Windows .lib files, but the principle is the same). A couple of concrete examples to illustrate the points: firstly, if I want to build LLD, I don't need the llvm-objcopy code. Let's say llvm-objcopy included 5 files. If those 5 files are included in libObject, they need to be compiled and added to that library, as part fo the build process for building LLD, even though those files aren't used in LLD's link, thus unnecessarily slowing down the build as a whole. Secondly, if I have a distribution that wants to ship LLVM libraries sufficient for developers to read objects, we'd end up distributing those same 5 files as part of libObject, unnecessarily. The end executables won't be any bigger in either case of course.

>> @echristo, @alexander-shaposhnikov, do either of you have any further thoughts? What do you think of my points above? I'm not strongly opposed to it going in libObject, but just think it makes a little more sense to be separate. Also, would either of you be able to give this a review more generally?
>
> I'll do my best :)
>
> -eric

FYI, I'm on leave for the next 2-3 weeks, so won't be able to respond to further points after this for a while.

In D88827#3242580 <https://reviews.llvm.org/D88827#3242580>, @MaskRay wrote:

> llvm-objcopy has its own data models for some binary format specific things, so placing it under the `llvm::object` namespace may lead to some conflicts. Placing it into a sub-namespace does not avoid the conflict.

Not sure I follow how placing things into, say, `llvm::object::objcopy` would have potential for conflicts beyond the current state of the code?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88827/new/

https://reviews.llvm.org/D88827