[PATCH] D65633: [Object] Create MutableELFObject Class for Doing Mutations on ELFObjectFiles [Part 3]

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 20 03:00:33 PDT 2019


jhenderson added inline comments.


================
Comment at: llvm/include/llvm/Object/MutableELFObject.h:193-194
     /// Return the number of elements in the table.
-    size_t size() const { return Mappings.size(); }
+    size_t size() const {
+      return llvm::count_if(Mappings, [](MappingType &Mapping) {
+        return Mapping.Type != MappingType::Removed;
----------------
abrachet wrote:
> rupprecht wrote:
> > jhenderson wrote:
> > > abrachet wrote:
> > > > rupprecht wrote:
> > > > > Many of these methods are now linear, meaning this library isn't feasible to operate on files that have a large number of symbols (especially if you end up calling these methods in a loop and go quadratic). I think this is fine to experiment with but should be noted, so it can be fixed before adding uses of it.
> > > > FWIW, I think `size` is the only method which gets its complexity changed, this can be solved by just keeping a running count, and having `add` and `remove` methods change `Size`. Does that sound ok?
> > > > 
> > > > Methods like `getMutableSection` index into the table without any checks to deleted entries so those are all O(1), I'm trying to think of others which became O(N^2) but I cannot think of them.
> > > Keeping a running count would make sense to me, if it improves the complexity of other functions.
> > `getNextIndex()` is also linear (with the linear factor being # of removed symbols), making `getFirstIndex()` linear, which makes all the `*_begin()` iterator methods linear.
> > 
> > See rL354597 for a real world motivation with llvm-objcopy. An easy way to need to "remove" a lot of symbols is to extract a single section with `llvm-objcopy --only-section .foo` which will implicitly delete symbols not in that section. If the number of symbols not in that section is large, it will be slow if you keep having to iterate over all removed symbols. (See the test case here if you're interested: https://reviews.llvm.org/D58296?vs=on&id=187690)
> > 
> > That said, there isn't necessarily a need to prematurely optimize this. I think it's fine to just document that the methods are linear, and if we run into regressions when using it on real scenarios, we can rework the data structures.
> > Keeping a running count would make sense to me, if it improves the complexity of other functions.
> I think that `MutableTable::size()` isn't actually even used anywhere anymore I just updated it because it was used previously but its use was replaced by `getEndIndex()`. Should I just remove it then? Keeping a count of the number of elements would only improve the complexity of `size()` I think.
> 
> >getNextIndex() is also linear (with the linear factor being # of removed symbols), making getFirstIndex() linear, which makes all the *_begin() iterator methods linear.
> Oh, right.
> 
> >See rL354597 for a real world motivation with llvm-objcopy. An easy way to need to "remove" a lot of symbols is to extract a single section with llvm-objcopy --only-section .foo which will implicitly delete symbols not in that section. If the number of symbols not in that section is large, it will be slow if you keep having to iterate over all removed symbols. (See the test case here if you're interested: https://reviews.llvm.org/D58296?vs=on&id=187690)
> FWIW currently the deletion of a symbol I think is linear (the use of remove_if) and has the cost of copying the elements where this doesn't have that same cost. Certainly it will be slower if iteration happens many times after.
> 
> > That said, there isn't necessarily a need to prematurely optimize this. I think it's fine to just document that the methods are linear, and if we run into regressions when using it on real scenarios, we can rework the data structures.
> Absolutely :)
> 
> 
> 
If `size()` is not used, get rid of it and update the `getEndIndex()` comment.


================
Comment at: llvm/include/llvm/Object/MutableELFObject.h:129-131
+    /// Gets the next index that hasn't been removed after \param CurrentIndex.
+    /// If there is not an index after \param CurrentIndex which has not been
+    /// removed, this returns the index past the last entry.
----------------
jhenderson wrote:
> Hmm... thinking about it, how about "Gets the index of the next entry that hasn't ..." and "If there is no entry after ..."
> 
> Also "this returns the index one past the last entry" to be more precise.
You missed this bit:

> Also "this returns the index **one** past the last entry" to be more precise.




================
Comment at: llvm/include/llvm/Object/MutableELFObject.h:134
+      assert(CurrentIndex < Mappings.size());
+      return *llvm::find_if(seq(CurrentIndex + 1, (uint64_t)Mappings.size()),
+                            [this](uint64_t Index) {
----------------
MaskRay wrote:
> abrachet wrote:
> > jhenderson wrote:
> > > Maybe the cast indicates that these indexes should be size_t everywhere, since they're indexes into a container? What do you think?
> > Sounds good. I have changed to `size_t` it does make more sense.
> `uint64_t -> size_t` change should be made to Part 1 D64281.
> `uint64_t` -> `size_t` change should be made to Part 1 D64281.
+1 to this




================
Comment at: llvm/unittests/Object/MutableELFObjectTest.cpp:28
+  if (MatchLen)
+    if ((size_t)std::distance(Begin, End) != List.size())
+      return false;
----------------
labath wrote:
> abrachet wrote:
> > jhenderson wrote:
> > > `std::distance` already returns a size_t. No need to cast it.
> > I thought so too but it returns the iterators `difference_type` which on most containers is `ptrdiff_t`. That's a really strange decision by the committee I think, I don't see the rational of not using `size_t` given `distance` will never be negative.
> For random-access iterators, it is valid to plug the begin and end iterators into std::distance in reverse order and get a negative value in return (but only since c++11).
You're both right. My bad.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65633/new/

https://reviews.llvm.org/D65633





More information about the llvm-commits mailing list