[PATCH] D65633: [Object] Create MutableELFObject Class for Doing Mutations on ELFObjectFiles [Part 3]

Mon Aug 19 10:00:26 PDT 2019

rupprecht added inline comments.

================
Comment at: llvm/include/llvm/Object/MutableELFObject.h:193-194
     /// Return the number of elements in the table.
-    size_t size() const { return Mappings.size(); }
+    size_t size() const {
+      return llvm::count_if(Mappings, [](MappingType &Mapping) {
+        return Mapping.Type != MappingType::Removed;
----------------
jhenderson wrote:
> abrachet wrote:
> > rupprecht wrote:
> > > Many of these methods are now linear, meaning this library isn't feasible to operate on files that have a large number of symbols (especially if you end up calling these methods in a loop and go quadratic). I think this is fine to experiment with but should be noted, so it can be fixed before adding uses of it.
> > FWIW, I think `size` is the only method which gets its complexity changed, this can be solved by just keeping a running count, and having `add` and `remove` methods change `Size`. Does that sound ok?
> > 
> > Methods like `getMutableSection` index into the table without any checks to deleted entries so those are all O(1), I'm trying to think of others which became O(N^2) but I cannot think of them.
> Keeping a running count would make sense to me, if it improves the complexity of other functions.
`getNextIndex()` is also linear (with the linear factor being # of removed symbols), making `getFirstIndex()` linear, which makes all the `*_begin()` iterator methods linear.

See rL354597 for a real world motivation with llvm-objcopy. An easy way to need to "remove" a lot of symbols is to extract a single section with `llvm-objcopy --only-section .foo` which will implicitly delete symbols not in that section. If the number of symbols not in that section is large, it will be slow if you keep having to iterate over all removed symbols. (See the test case here if you're interested: https://reviews.llvm.org/D58296?vs=on&id=187690)

That said, there isn't necessarily a need to prematurely optimize this. I think it's fine to just document that the methods are linear, and if we run into regressions when using it on real scenarios, we can rework the data structures.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65633/new/

https://reviews.llvm.org/D65633