[PATCH] D149759: [symbolizer] Support symbol lookup

Thu Sep 7 03:06:59 PDT 2023

sepavloff marked 5 inline comments as done.
sepavloff added inline comments.

================
Comment at: llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp:384
+  json::Object Json = toJSON(Request);
+  Json["Loc"] = std::move(Definitions);
+  if (ObjectList)
----------------
jhenderson wrote:
> It's possible I've missed it, but I don't see any test that shows this array has appropriate contents (I see a few where it is empty, but there needs to be one or more test cases that cover it having contents, at least one of which should cover it having more than one entry).
At the end of `output-style-json-code.test` a test is added that checks the case of more than one entry.

================
Comment at: llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp:357
+  std::vector<object::SectionedAddress> Result;
+  for (const SymbolDesc &Sym : Symbols) {
+    if (Sym.Name.equals(Symbol)) {
----------------
jhenderson wrote:
> I'm a little concerned about the performance of this code. It might be a premature concern, but looping through a list of symbols to find one, for every input symbol, sounds like it could get expensive quite rapidly - if you were naively to use llvm-symbolizer to symbolize an entire object by using its listed symbol names, you'd end up with an n^2 operation, as every search has to go through the entire list.
> 
> The performance of the symbolizer code is considered important, hence why I'm bringing this up. I wonder if it's worth considering changing the `std::vector` used for `Symbols` into another container with more efficient searching performance?
This is a very important aspect of this solution. It however requires substantial code changes, so it is not practical to implement the performance enhancements in this patch. There are few users of this feature right now and GNU `addr2line` also uses linear search, so it is unlikely that performance problems would be noticed immediately.

Some possible applications need `llvm-symbolizer` for large files and many symbols, performance is critical for such cases, so a solution will be elaborated. It requires more efforts than just changing the implementation of `Symbol`, as it is already sorted by address. There are other points where the speed or memory consumption can be improved, they could be treated together.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Symbolize.cpp:238-239
+  auto InfoOrErr = getOrCreateModuleInfo(ModuleSpecifier);
+  if (!InfoOrErr)
+    return InfoOrErr.takeError();
+
----------------
jhenderson wrote:
> Is there testing covering this failure for this specific case?
It is unlikely that this code fails, because `getOrCreateModuleInfo` is called early to check existence and validity of binary file.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Symbolize.cpp:246-247
+  // result.
+  if (!Info)
+    return Result;
+
----------------
jhenderson wrote:
> Ditto.
Actually this code was copied from `symbolize*Common`, it will fail exactly in the same cases when fail those functions.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Symbolize.cpp:254-255
+    if (LineInfo.FileName != DILineInfo::BadString) {
+      if (Opts.Demangle)
+        LineInfo.FunctionName = DemangleName(LineInfo.FunctionName, Info);
+      Result.push_back(LineInfo);
----------------
jhenderson wrote:
> This needs a test case to show that if `Opts.Demangle` is false, the name isn't demangled.
Such tests are added at the end of `symbol-search.test`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149759/new/

https://reviews.llvm.org/D149759