[PATCH] D149759: [symbolizer] Support symbol lookup

Tue Sep 5 01:03:45 PDT 2023

jhenderson added inline comments.

================
Comment at: llvm/docs/ReleaseNotes.rst:326
   to improve correctness and clarity.
+  
+* llvm-symbolizer and llvm-addr2line now support addresses specified as symbol names.
----------------
ikudrin wrote:
> Please remove the spaces in the blank line.
Not addresssed?

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h:110
+                                               StringRef Symbol);
+  Expected<std::vector<DILineInfo>> findSymbol(const std::string &ModuleName,
+                                               StringRef Symbol);
----------------
This is a bit of a nit-pick, but it is throwing me off a bit having both `const std::string &` and `StringRef` arguments for this overload of `findSymbol`. I see why you've done it (similarity with other overloaded functions above and below), but I feel like it should probably still use the correct and consistent style (`StringRef`) for both.

================
Comment at: llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp:384
+  json::Object Json = toJSON(Request);
+  Json["Loc"] = std::move(Definitions);
+  if (ObjectList)
----------------
It's possible I've missed it, but I don't see any test that shows this array has appropriate contents (I see a few where it is empty, but there needs to be one or more test cases that cover it having contents, at least one of which should cover it having more than one entry).

================
Comment at: llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp:357
+  std::vector<object::SectionedAddress> Result;
+  for (const SymbolDesc &Sym : Symbols) {
+    if (Sym.Name.equals(Symbol)) {
----------------
I'm a little concerned about the performance of this code. It might be a premature concern, but looping through a list of symbols to find one, for every input symbol, sounds like it could get expensive quite rapidly - if you were naively to use llvm-symbolizer to symbolize an entire object by using its listed symbol names, you'd end up with an n^2 operation, as every search has to go through the entire list.

The performance of the symbolizer code is considered important, hence why I'm bringing this up. I wonder if it's worth considering changing the `std::vector` used for `Symbols` into another container with more efficient searching performance?

================
Comment at: llvm/lib/DebugInfo/Symbolize/Symbolize.cpp:238-239
+  auto InfoOrErr = getOrCreateModuleInfo(ModuleSpecifier);
+  if (!InfoOrErr)
+    return InfoOrErr.takeError();
+
----------------
Is there testing covering this failure for this specific case?

================
Comment at: llvm/lib/DebugInfo/Symbolize/Symbolize.cpp:246-247
+  // result.
+  if (!Info)
+    return Result;
+
----------------
Ditto.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Symbolize.cpp:254-255
+    if (LineInfo.FileName != DILineInfo::BadString) {
+      if (Opts.Demangle)
+        LineInfo.FunctionName = DemangleName(LineInfo.FunctionName, Info);
+      Result.push_back(LineInfo);
----------------
This needs a test case to show that if `Opts.Demangle` is false, the name isn't demangled.

================
Comment at: llvm/test/tools/llvm-symbolizer/symbol-search.test:24
+
+# A symbol name may be resolved into more than one location.
+
----------------

================
Comment at: llvm/test/tools/llvm-symbolizer/symbol-search.test:36
+
+# More than one symbols may be specified.
+
----------------

================
Comment at: llvm/test/tools/llvm-symbolizer/symbol-search.test:8
+## Command "CODE" supports search by symbol name.
+#
+# RUN: llvm-addr2line --obj=%p/Inputs/symbols.so "CODE func_01" | FileCheck --check-prefix=CODE-CMD %s
----------------
jhenderson wrote:
> See my above comment: as this comment is tightly tied to the test case that immediately follows, it doesn't need a blank line (and regardless, blank lines generally don't have a comment marker, unless they are separating paragraphs within a longer comment).
Not addressed. For a concrete example, I personally recommend the following format:

```
# Comment
RUN: ...
RUN: ...

CHECK: ...
CHEC-NEXT:

# Next test case comment
RUN: ...
```

================
Comment at: llvm/test/tools/llvm-symbolizer/symbol-search.test:13-14
+
+## GNU addr2line accepts symbol names in command line, llvm-symbolizer also allows
+## this in both GNU and LLVM modes.
+#
----------------
jhenderson wrote:
> I don't think you should reference GNU here - that's a motivation for the behaviour, but not for the test case for that behaviour. Rather, what this comment could say is "Show that llvm-addr2line and llvm-symbolizer accept symbol names on the command-line."
Not addressed.

================
Comment at: llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp:239
   // "0x" or "0X" prefix; do the same for compatibility.
+  bool StartsWithHexPrefix = false;
   if (IsAddr2Line)
----------------
`StartsWithHexPrefix` appears to be unused?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149759/new/

https://reviews.llvm.org/D149759