[Lldb-commits] [PATCH] D145180: [lldb] Introduce new SymbolFileJSON and ObjectFileJSON

Thu Mar 2 21:11:46 PST 2023

clayborg added a comment.

I like this idea of this, but I would like to see this be a bit more complete. One idea is to remove the ObjectFileJSON::Symbol definition and just use lldb_private::Symbol objects and allow these objects to construct encode and decode from JSON. Then we have the ability to re-create any symbols we need in full fidelity. But that not be the goal in your case, but I don't think it would hurt to use the lldb_private::Symbol as the object we serialize and deserialize from JSON as ObjectFileJSON::Symbol just can't reproduce the full depth of the symbols we want.

>From reading this it looks like your main use case is to supply additional symbols to an existing stripped binary. Is that correct? Are you aware that each dSYM file has a full copy of the unstripped symbol table already? It removes the debug map entries, but each dSYM copies all non debug map symbols inside of it already. So the same thing from this patch can already be accomplished by stripping a dSYM file of all debug info sections and leaving the symbol table alone, and then using this this minimal dSYM file just to get the symbols.

Any idea on where the JSON file will live if it is just a companion file for another mach-o or ELF executable? Will it always be next to the mach-o executable? Will we enable a Spotlight importer to find it like we do for dSYM files?

================
Comment at: lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp:133
+void ObjectFileJSON::ParseSymtab(Symtab &symtab) {
+  // Nothing to do for JSON files, all information is parsed as debug info.
+}
----------------
Don't we want to create a Symtab from the ObjectFileJSON::Symbol vector here? Symbols are not considered debug info. Debug info for functions is supposed to be more rich than just address and name.

================
Comment at: lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h:92-95
+  struct Header {
+    std::string triple;
+    std::string uuid;
+  };
----------------
It would be great to be able to also define sections somewhere. I can think of this being really handy for core file serialization where we generate a core file and then store extra metadata that describes the object file in this JSON format. It would be great to have sections for this, and we really need sections to make ObjectFileJSON be able to represent something that actually looks like a real object file. See Section.h for  what we expect  section to have, but we can just store uint64_t values instead of lldb_private::Address values for the start address of the range for the section.

================
Comment at: lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h:97-100
+  struct Symbol {
+    uint64_t addr;
+    std::string name;
+  };
----------------
A few things to think about:
- some symbols have a byte size (like in ELF or mach-o symbols in the debug map that have pairs)
- some symbol have absolute values that are not actually addresses, there is no way to represent this currently here
- some symbols have symbol types. We could allow symbols to specify a lldb::SymbolType by name?
- If we have symbol types other than just symbols with addresses, some might refer to a sibling symbol by ID or index. It might be good to allow symbols to have integer ids

One suggestion would be to define Symbol as:

```
 struct Symbol {
    uint64_t value;
    std::optional<uint64_t> size;
    std::optional<uint64_t> id; 
    std::optional<SymbolType> type; 
    std::string name;
    bool value_is_address; // Set to true if "value" is an address, false if "value" is just an integer with a different meaning
  };
```

================
Comment at: lldb/source/Plugins/SymbolFile/JSON/SymbolFileJSON.cpp:87
+
+void SymbolFileJSON::AddSymbols(Symtab &symtab) {
+  auto json_object_file = dyn_cast_or_null<ObjectFileJSON>(m_objfile_sp.get());
----------------
If we have no debug info i SymbolFileJSON, then there is really no need to add the symbols here, we can just avoid this whole SymbolFileJSON class and have ObjectFileJSON parse the JSON symbols and add them to the Symtab in:
```
void ObjectFileJSON::ParseSymtab(Symtab &symtab);
```
So if we are not going to add JSON debug info to the ObjectFileJSON schema, then I would vote to remove this class and just do everything from ObjectFileJSON.

================
Comment at: lldb/source/Plugins/SymbolFile/JSON/SymbolFileJSON.cpp:112
+    symtab.AddSymbol(Symbol(
+        /*symID*/ 0, Mangled(symbol.name), eSymbolTypeCode,
+        /*is_global*/ true, /*is_debug*/ false,
----------------
JDevlieghere wrote:
> mib wrote:
> > Should we increment the `symID` here ?
> The IDs are arbitrary, and if we start at zero, we'll have conflicts with the ones already in the symbol table (e.g. the lldb_unnamed_symbols for stripped binaries). We could check the size of the symtab and continue counting from there. Or we can use 0 like we did here to indicate that these values are "special". I went with the latter approach mostly because that's what SymbolFileBreakpad does too. 
regardless of where the code goes that converts ObjectFileJSON::Symbol to lldb_private::Symbol, I would vote that we include enough information in ObjectFileJSON::Symbol so that we don't have to make up symbols IDs, or set all symbol IDs to zero. LLDB relies on having unique symbol IDs in each lldb_private::Symbol instance. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145180/new/

https://reviews.llvm.org/D145180