[Lldb-commits] [lldb] Remove redundant symbol lookups in IRExecutionUnit::FindInSymbols (PR #102835)

Greg Clayton via lldb-commits lldb-commits at lists.llvm.org
Wed Aug 14 22:31:35 PDT 2024


clayborg wrote:

> Could you please explain to me why we use different rules in symbol lookups? Namely:
> 
> * [ClangExpressionDeclMap::GetSymbolAddress](https://github.com/llvm/llvm-project/pull/102835/files#diff-5d2da8306a4f4991885836925979f188658789adc8041c37811c243f2cdca24c) doesn't search in the ModuleList if a module is given even if the search provides no results.
> * [SymbolContext::FindBestGlobalDataSymbol](https://github.com/llvm/llvm-project/pull/102835/files#diff-da1a374f5098c39acfebae4b87a261f143a842e613082c2296de7ee28df12a33) searches in the module first, then checks the results; if exactly one external or internal symbol is found returns it. Otherwise, if more than 1 is found it produces an error. But if nothing is found in the module it searches in the module list and repeats the checks.
>   So theoretically there could be multiple results in the module list that would trigger an error, but we don't check for them.
>   Also, it seems an external symbol takes precedence over an internal symbol. If we found an internal symbol in the module, we still could find an external symbol in the module list, but we don't search for it. Is this correct? Does an internal symbol of the module actually take precedence over an external symbol from the module list?
> * [IRExecutionUnit::FindInSymbols](https://github.com/llvm/llvm-project/pull/102835/files#diff-717a850c4315286c025e2739ebe9dacbf27e626b7679c72479b05c996d721112) looks similar to the previous one, but actually very different. It returns early if and only if the found symbol is external(see LoadAddressResolver::Resolve), otherwise it does a full search in the module list. And only then if an external symbol isn't found - returns the first internal symbol(if any).
> 
> It's hard to generalize the optimization since all the callers post-process the results differently.

Yes, each use case is specialized. Since this is the case, maybe the callback mechanism should be used and we give back the results starting with the `hint` module first, and then proceed to the entire module list. If return value of the callback will say "keep going" or "stop". This allows clients to pick and choose what they do.  Unless we can break down the changes into something that fits all of these. 


Most of these lookups are trying to figure out the right thing to do given a symbol context (module, compile unit, function, inlined function etc), and it falls back onto the module list from the target (one for each of the shared libraries loaded into your process) to find results. 

If you think about it, starting with the current module makes sense because that is where the expression is being evaluated (in some stack frame whose PC is in a module). So if we find a symbol in the current module, we can go with it. If we can't find a symbol in the current module, then we want to go with the right one, but there shouldn't be duplicate symbols if those symbols are exported. If a user types `printf(...)`, and lets say we don't find it in our current module, then we should look for an external symbol that matches, and if there is more than one external symbol for `printf`, that would be an error because we wouldn't know which one to call. But if we have a local `printf` function in the current module, we might want that one to take precedence. Of course if `printf` is a method in a class, and our expression is being run from another method in the class, then we should call the `<class>::printf`. But if things fall back to finding symbols, then we need to have some way to resolve ambiguous symbols. So this is the system we came up with.

If you have an expression like `my_global`, you really want to look at the compile unit your expression is being run from, and if that compile unit has a global or static variable named `my_global`, we should pick that one. If there isn't a match in the current compile unit, then we should search in the current module. If it has one, that is most likely the global the user wants. If there isn't one in the current module, then we must search around in all modules and if we find more than one, lets say we find a global variable from `liba.so` where the variable is exported, and another from `libb.so` that is a static variable, then we should pick the global variable to try and match the visibility that a program would have to a global variable from another shared library. The public one wins. But it we have two libraries that both have a exported `my_global` variables, we don't know which to pick, so we return an error.

`IRExecutionUnit::FindInSymbols` might be the function that is called to search for symbols in the first place and `ClangExpressionDeclMap::GetSymbolAddress` might be used to get the address of a symbol by name. Why? Because after we evaluated an expression, we need to hook things up and tell the code generator/linker where this value is. If I have things stated correctly above, I am not sure I do, then we want to make sure we find the original symbol returned by `IRExecutionUnit::FindInSymbols`. But I will defer to the expression parser experts.



https://github.com/llvm/llvm-project/pull/102835


More information about the lldb-commits mailing list