[PATCH] D82812: [llvm-install-name-tool] Merge rpath with id/change

Shoaib Meenai via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 2 20:29:37 PDT 2020


smeenai added inline comments.


================
Comment at: llvm/tools/llvm-objcopy/CopyConfig.cpp:886
           "cannot specify both -rpath %s %s and -rpath %s %s",
-          It1->first.str().c_str(), It1->second.str().c_str(),
-          Old.str().c_str(), New.str().c_str());
+          It1->getFirst().data(), It1->getSecond().data(), Old.data(),
+          New.data());
----------------
The pointer returned by calling `.data()` on a StringRef [might not be null-terminated](https://llvm.org/doxygen/classllvm_1_1StringRef.html#a8ca8dc10ba312fe796d01b0f25b315f8), so I don't think this is safe.

There's a form of createStringError that takes a [twine](https://llvm.org/docs/ProgrammersManual.html#llvm-adt-twine-h) instead of a format string, which is more efficient than the intermediate allocations associated with `.str().c_str()`. Given that this is an error path, efficiency really doesn't matter though :) The twine form should get implicitly used if you do something like

```
"cannot specify both -rpath " + It1->getFirst() + " " + It1->getSecond() + " and -rpath " + Old + " " + New
```

You could also get fancy and use the `*` precision specifier to specify the string length manually, instead of constructing the intermediate `std::string`. See https://en.cppreference.com/w/cpp/io/c/fprintf for details about `*`. Again though, this is purely academic; efficiency doesn't really matter that much for error paths. In other words, just doing `.str().c_str()` as before should be fine.


================
Comment at: llvm/tools/llvm-objcopy/CopyConfig.h:181-182
   std::vector<StringRef> RPathToAdd;
-  std::vector<std::pair<StringRef, StringRef>> RPathsToUpdate;
-  std::vector<std::pair<StringRef, StringRef>> InstallNamesToUpdate;
+  DenseMap<StringRef, StringRef> RPathsToUpdate;
+  DenseMap<StringRef, StringRef> InstallNamesToUpdate;
   DenseSet<StringRef> RPathsToRemove;
----------------
jhenderson wrote:
> sameerarora101 wrote:
> > jhenderson wrote:
> > > Would `StringMap` not work instead of `DenseMap`?
> > I think it would work too. However, if I am not mistaken, while comparing `StringSet` and `DenseSet` in a previous diff, @smeenai  suggested that `StringSet` takes ownership of its strings (and copies them), which would be unnecessary copying (and so `DenseSet` would be better). I thought it would be the same case for `StringMap` vs `DenseMap` and so I implemented with `DenseMap`. What do you think?
> Makes sense. I haven't looked at the data structures themselves.
> 
> FWIW, I don't think taking ownership is necessarily an issue, since there aren't going to be significant numbers of options so the string copying isn't going to be significant.
Yeah, it's hard to say in general. From my understanding of the structures (and it's been a bit since I looked at them, so I may very well be mistaken):
* StringMap copies the string contents, which is unnecessary copying and memory usage.
* DenseMap allocates space for 64 key/value pairs by default (as soon as you insert a single entry), which is also unnecessary memory usage.
* StringMap does a heap allocation for every key you insert.
* DenseMap stores all key/value pairs in a single memory allocation, which is likely more cache-efficient when probing the hash table and iterating over all elements.
* StringMap caches the hash values of its keys, which means it can do cheap comparisons when probing, and doesn't need to recompute hash values when resizing the hash table.
* You can achieve the same with DenseMap by using CachedHashStringRef, however.

Over here, given the expected size of these things, I don't think it will end up mattering at all, but it is interesting to think through :)


================
Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:275
 
+  std::set<std::string> OriginalRPaths;
+  auto FindPair =
----------------
Ktwu wrote:
> sameerarora101 wrote:
> > sameerarora101 wrote:
> > > Ktwu wrote:
> > > > alexshap wrote:
> > > > > jhenderson wrote:
> > > > > > sameerarora101 wrote:
> > > > > > > NOTE: Cannot store `StringRef` in OriginlRPaths (i.e cannot have `set<StringRef> OriginalRPaths`). This is because the StringRef `CurrentRPath` is a reference to the string stored inside LC and thus when we change the payload (via `-rpath` option) `CurrentRPath` is also updated to hold a reference to the new string. Hence I need to store the actual strings inside the set.
> > > > > > You probably don't want a `std::set` here. Have you reviewed [[ https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc | the Programmer's Manual ]] on what set type to use?
> > > > > 1. I wanted to mention it on one of the previous diffs but looks like I forgot - it would be better to place all the details related to updating load commands into a separate helper function rather than directly into handleArgs.
> > > > >    
> > > > > ```
> > > > > static Error updateLoadCommands(const CopyConfig &Config, Object &Obj)
> > > > > ```
> > > > > handleArgs would invoke this function (updateLoadCommands) similarly to removeSections, updateAndRemoveSymbols etc.
> > > > > 
> > > > > Probably it would be even better first to refactor the code this way (on a separate diff) before moving forward if you don't mind
> > > > > (now handleArgs is accumulating the details it should not really be aware of)
> > > > > 
> > > > > 2. Please, take a look at StringSet, StringMap and their interfaces 
> > > > > (in particular, if I am not mistaken they would help you avoid some str() calls below)
> > > > >  
> > > > > Also this probably suggests that Config.RPathsToUpdate, Config.RPathsToUpdate should really be StringMap-s.
> > > > > Also this probably suggests that Config.RPathsToUpdate, Config.RPathsToUpdate should really be StringMap-s.
> > > > 
> > > > +1, although does StringMap have deterministic iteration? Maybe SetVector would work as well?
> > > yup, I have replace it with a `DenseSet<StringRef>`
> > 1. ok, so i refactored the code for updating LCs into a separate function `updateLoadCommands` and call that from `handleArgs`.
> > 
> > > Also this probably suggests that Config.RPathsToUpdate, Config.RPathsToUpdate should really be StringMap-s.
> > 
> > I agree, but I think I should do that in a separate diff considering that it would require changing the way I parse options in `CopyConfig.cpp`. What do you think?
> > 
> > What do you think?
> 
> Definitely do that in a separate diff -- it's much better to keep refactoring changes like that confined to their own diff instead of mixed up with new feature work.
Good point about the deterministic iteration. Given that duplicate `-rpath` commands are an error instead of the last one winning out, I think non-deterministic iteration should be okay, though I haven't thought about it too hard.


================
Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:65-66
   // specified in -delete_rpath.
   for (StringRef RPath : Config.RPathsToRemove) {
     if (RPathsToRemove.count(RPath))
       return createStringError(errc::invalid_argument,
----------------
Not related to this diff, but something I noticed: given that `RPathsToRemove` starts out as a copy of `Config.RPathsToRemove` and you're just removing elements from it, can't you just error out for any remaining entries in `RPathsToRemove` instead?


================
Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:170
+  // Throw errors for invalid RPaths.
+  StringRef Old, New;
+  for (const auto &OldNew : Config.RPathsToUpdate) {
----------------
Can Old and New be declared inside the loop?


================
Comment at: llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp:217
+
+  // Add new RPaths.
+  for (StringRef RPath : Config.RPathToAdd) {
----------------
Nit: do we want to be adding new load commands in a function called `updateLoadCommands`? At least to me that function name seems like it should only be updating existing load commands, since we have a separate `removeLoadCommands` to handle removal. I'll leave it to the more experienced llvm-objcopy reviewers (@alexshap, @jhenderson) to decide if this is okay as-is or if we want a separate `addLoadCommands` function.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82812/new/

https://reviews.llvm.org/D82812





More information about the llvm-commits mailing list