[Lldb-commits] [PATCH] D67390: [LLDB][ELF] Load both, .symtab and .dynsym sections

Greg Clayton via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Tue Sep 17 08:43:05 PDT 2019


clayborg added a comment.

In D67390#1672210 <https://reviews.llvm.org/D67390#1672210>, @kwk wrote:

> @clayborg thank you for this explanation. My patch for minidebuginfo is already done in D66791 <https://reviews.llvm.org/D66791> . But this one here I was asked to pull out as a separate patch. For good reasons as we see. I wonder how this extra parameter `SymbolMapType` of yours does help. In the end I have to identify duplicates. But if no symbol with the same **name** should be added then why do I care about where the symbol is coming from?


The uniqueness is for symbols with the same name and file address and size. You can have multiple symbols with the same name, and each on could have a different address. We want there to only be one symbol per ObjectFile that has the same name + addr + size. That way when we ask for symbols by name, we don't have to end up getting more than one symbol for something that is the same thing.

> Please help me understand of follow my thoughts here:
> 
> When I'm in the `(lldb)` console and type `b foo` I expect LLDB to set a breakpoint on the **function** foo, right? The type of the symbol `foo` is deduced as **function**.

Breakpoints ask for symbols whose type is lldb::eSymbolTypeCode and that match the name. The name matching is much more complex than you would think though. "b foo" by default turns into 'set a breakpoint on functions whose "basename" matches foo'. This means a C function named 'foo', any C++ method (stand alone function or class method) whose basename is 'foo' (bar::foo(int)", "foo(int)", "foo(float)", "std::a::b::foo()", many more) and any objective C function whose selector is 'foo' ("-[MyClass foo]", "+[AnotherClass foo]", and any other basename from any other language.

If you type "b foo::bar" this will end up looking up all functions whose basename is "bar" and then making sure any found matches contain "foo::bar".

> I ask this question because `Symtab` has no function to just search for a symbol by name; instead you always have to pass an address, a type or an ID:
> 
>   Symbol *FindSymbolByID(lldb::user_id_t uid) const;
>   Symbol *FindSymbolWithType(lldb::SymbolType symbol_type,
>   size_t FindAllSymbolsWithNameAndType(ConstString name,
>   size_t FindAllSymbolsWithNameAndType(ConstString name,
>   size_t FindAllSymbolsMatchingRexExAndType(
>   Symbol *FindFirstSymbolWithNameAndType(ConstString name,
>   Symbol *FindSymbolAtFileAddress(lldb::addr_t file_addr);
>   Symbol *FindSymbolContainingFileAddress(lldb::addr_t file_addr);
>   size_t FindFunctionSymbols(ConstString name, uint32_t name_type_mask,
>    

FindAllSymbolsWithNameAndType() takes a name. The symbol type can be lldb::eSymbolTypeAny, so yes there is way to search. So there is a way to search for a symbol by name. The main issue is we are still parsing the symbol table and we don't want to initialized the name lookup table in the symbol table just yet since we are still adding symbols to the complete list. This is why an extra map that is used only during parsing of the symbol table makes sense.

> So my point of this whole question is: What makes a symbol unique in the sense that it shouldn't be added to the symtab if it is already there?

Symbol  is unique within one representation of an ObjectFile where the symbol has the same name, address and size and type.

> Shouldn't the type of the symbol together with it's name define uniqueness? We shouldn't care about where the symbol is coming from nor if it is located at a different address.

We don't care where the symbol comes from as long as it is representing information for one ObjectFile. I would contend that if you have an "a.out" binary that has a .gnu_debugdata that points to "a.out.gnu_debugdata" that we would have one single ObjectFile that represents "a.out" and give the best description of what "a.out" contains.

We do care if a symbol has a different address. You can have as many static functions called "foo" as you want in a single binary. They are each unique since they have different addresses. So if you have 10 source files where 3 of those sources have a symbol "foo", we want there to be 3 different symbols with the same name, different addresses and possibly different sizes.

> Well, if there's an overloaded function `foo(int)` and `foo(char*)` then both symbols are of type **function** and they both share the same **name**. When you type `b foo` you DO want 2 breakpoints to be set.

Yes we do! One breakpoint in LLDB has N breakpoint locations. A breakpoint of any kind (source file and line, function name breakpoint, and more) all constantly adding and removing locations as shared libraries get loaded. So if you have "foo(int)" and "foo(char *)" and say "b foo" we would end up with one breakpoint whose name matches "foo" with two locations.

> Hence, niqueness cannot be defined over the name and the type . But wait, the **name** is mangled, so it IS unique enough unless I use `Symbol::GetNameNoArguments()`; there only the name is returned.

Again, there can be N breakpoints with the same name and different addresses. We are just trying to avoid a symbol table that has 3 symbols all with the name "foo", address of 0x1000 and size of 0x10. Why? Because the information is redundant and is just noise. The map I suggested would track these symbols so we don't end up adding multiple of the same symbols. Again, name address and size must all match. We _can_ have multiple symbols at the same address with different names. How? Linkers often have aliased symbols that point to the same thing. if we have a symbol "foo" at 0x1000 with size 0x10, we can also have a symbol "foo_alias" at 0x1000 and size 0x10. We need both so that we can set breakpoints correctly when setting breakpoints by name.

> Here's my naive approach to test the admittedly very weird thought process from above: https://github.com/kwk/llvm-project/commit/5da4559a00c73ebefd8f8199890bd1991c94fa3f

I will take a look.

Let me know if you have any questions about what I said above.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67390/new/

https://reviews.llvm.org/D67390





More information about the lldb-commits mailing list