[PATCH] D67039: [ELF] Add a spell corrector for "undefined symbol" diagnostics

Fangrui Song via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 4 02:01:45 PDT 2019


MaskRay added inline comments.


================
Comment at: ELF/Relocations.cpp:721
+  StringRef name = sym.getName();
+  for (size_t i = 0, e = name.size(); i != e + 1; ++i) {
+    // Insert a character before name[i].
----------------
ruiu wrote:
> MaskRay wrote:
> > ruiu wrote:
> > > MaskRay wrote:
> > > > grimar wrote:
> > > > > ruiu wrote:
> > > > > > IIUC, this function costs (N+1)*74 where N is the length of a symbol name. Symbol names can be very long. Doesn't this too slow?
> > > > > It's enabled for the first two undefined symbols. Even with large N, like 100 it shouldn't probably be too slow for the error reporting case?
> > > > I think the complexity is fine. I avoid making the size of symtab a factor in the time complexity. Doing mutation in the input is probably fine.
> > > > 
> > > > In a large executable, I get a long mangled name of N=4398 bytes.
> > > > O(74*N) is not too bad.
> > > Maybe you are right. A symbol that is 4398 bytes long is perhaps an outlier, but even for a 1000 byte long symbol, it is 74,000 hash table lookup, which may not too bad. Did you run this code against a long symbol name? If it doesn't feel sluggish, I'm fine.
> > I tested linking a large executable. It took 19.8s in total.
> > 
> > For the long symbol (sym.getName().size() = 4379), if I patched `getAlternativeSpelling` to attempt all suggestions before returning, `suggest` was called 665682 times, and it took about 0.76s, a small portion of the whole link time.
> > 
> > (As a comparison, symVector.size() = 4195114.)
> > 
> I still have a little concern about the speed because 0.76s is not a negligible delay. When you get an undefined error, you are likely to be in a edit-compile-link cycle, and in the cycle you are likely to get an error message as soon as possible. That being said, 0.76s is probably a worst-case number, and if it becomes a problem, we may be able to improve this code, so I'm fine for now.
The pessimistic case has a symbol of 4379 bytes long. I think the length of a typical mangled name is much smaller than 1000.

This is a failed attempt and it tries all suggestions. A successful attempt may take just half of the time.

This is an extremely large executable with 4195114 symbols. A typical application has much less.

The pessimistic 0.76s case assumes the worse case in every factor. For a typical application, I'd be surprised if the typo correction takes even 5% of the pessimistic 0.76s case, i.e 0.038s.


Repository:
  rLLD LLVM Linker

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67039/new/

https://reviews.llvm.org/D67039





More information about the llvm-commits mailing list