[PATCH] D46369: [llvm-objcopy] Add --strip-symbol (-N) option

Paul Semel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 8 16:10:19 PDT 2018


paulsemel added inline comments.


================
Comment at: tools/llvm-objcopy/Object.cpp:925
+    if (ToRemove(Sym)) {
+      const auto &It = std::find_if(std::begin(Groups), std::end(Groups),
+                                    [&Sym, this](const SectionBase *Sec) {
----------------
jakehehrlich wrote:
> paulsemel wrote:
> > jakehehrlich wrote:
> > > This is the wrong way to do this.  Link and info are not the only way to reference symbols and you can't know how those sections are using Link/Info here so you might get false positives. This is also unacceptably inefficient. For instance we support chromium and chromium uses --ffunction-sections. That would make this an n^2 loop where n = about 60k (also n will grow to over 100k after I implement large section indexes properly. They're currently using tricks to only run llvm-objcopy on fewer sections at a time.)  It's actually worse than n^2 because data symbols are also present. So this is unacceptably inefficient as is. We need a linear time algorithm for removing symbols. In general I want to be linear with respect to number of symbols and n*lg(n) with respect to sections (and I'd like to push for linear in the case where no program headers exist but I don't think that's the case right now).
> > > 
> > > I didn't think about this at the time but its slightly tricky to get this efficient. I think instead of using 'removeSymbolReferences' giving every section a 'removeSymbols' method that takes a predicate might be called for. The idea is that each section can a) ignore b) throw an error or c) actually remove symbols.
> > > 
> > > Then you're code is a loop over sections, one of which contains a loop over symbols. You also don't need the extra loop over sections to get the group sections.
> > Arf.. I was totally focused on groups, and totally forgot about this `-ffunction-sections` option. So yeah, in this case, this might definitely grow very much the size of the `Groups` (which name doesn't fit anymore btw). Sorry for not thinking about it before..
> > 
> > Anyway, there is something I am missing with the implementation you are suggesting. Basically, what you're saying is that our principal `removeSymbols` function will call another `removeSymbols` function on every section. The problem is that only `SymbolTable` seciton might be able to access symbols, despite we might need to issue a check in for example group sections, to know whether the predicate includes the symbol we are actually referencing. I was thinking about passing the `SymbolTable` as an argument so that we might be able to search for the referencing symbol and check whether it matches the predicate, but it would, again, lead to inefficient code...
> > 
> > The second thing I was thinking about was to get nested predicates from each sections (except `SymbolTable`), and then call the `SymbolTable` `removeSymbols` function with this last predicate so that other sections might be able to check for conflicts in symbols they are referencing without impacting the complexity of the algorithm.
> > 
> > Am I getting wrong ? What do you think about it ?
> I don't think I understand the issue you're having with SymbolTables. The only objects that can store references to symbols are sections. If we loop over the sections they'll either fatally error out or all succeed together (meaning removing the section was valid). I don't think other sections need access to SymbolTable in order to remove their own internal references (or to throw an error).
There is actually no issues, I just confused myself, but I then understood the whole thing and changed the patch (as you can see). Sorry about it..  :/


Repository:
  rL LLVM

https://reviews.llvm.org/D46369





More information about the llvm-commits mailing list