[Lldb-commits] [Patch][Please Review] Add support for GNU indirect functions

Fri Feb 8 11:05:24 PST 2013

On Feb 7, 2013, at 3:05 PM, "Kopec, Matt" <matt.kopec at intel.com> wrote:

> Hi Greg,
> 
> Do you have an opinion on whether Module::FindSymbolsWithNameAndType should return Symbols that contain no address (i.e. dynamic symbols) for code symbols? Or do dynamic symbols even need to be added to the symbol table?

I haven't ever heard the term "dynamic symbols". Is this referring to the symbols in ELF symbol tables that come from "SHT_DYNSYM" versus from "SHT_SYMTAB"?

No other object file format we support has this notion (we otherwise support only Mach-O and COFF), although Mach-O files segregate the symbols that the dynamic loader needs into chunks so they can easily be found.

A symbol is anything that is in the symbol table, though we could make new "FindSymbolsXXX" functions that only look for symbols that have addresses, or we could add more parameters (no default params if we go this route please...) to specify whether the matching symbols must have addresses.
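
As a rough illustration of the second option, here is a minimal sketch in Python rather than the real C++ API. The Symbol class, the function name, and the must_have_address parameter are all hypothetical, and "no address" is modelled as None purely for the example:

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-in for a symbol table entry; not LLDB's real Symbol class.
@dataclass
class Symbol:
    name: str
    sym_type: str            # illustrative labels such as "code" or "data"
    address: Optional[int]   # None models a symbol that carries no address


def find_symbols_with_name_and_type(
    symbols: List[Symbol],
    name: str,
    sym_type: str,
    *,
    must_have_address: bool,  # keyword-only and deliberately without a default
) -> List[Symbol]:
    """Return matching symbols, optionally only those that carry an address."""
    matches = [s for s in symbols if s.name == name and s.sym_type == sym_type]
    if must_have_address:
        matches = [s for s in matches if s.address is not None]
    return matches


# Made-up symbol table: two "strlen" entries, only one of which has an address.
symtab = [
    Symbol("strlen", "code", None),
    Symbol("strlen", "code", 0x7F0000001000),
    Symbol("errno", "data", 0x601040),
]

all_matches = find_symbols_with_name_and_type(
    symtab, "strlen", "code", must_have_address=False)
callable_matches = find_symbols_with_name_and_type(
    symtab, "strlen", "code", must_have_address=True)
```

Because the parameter has no default, every caller has to state whether it wants address-less symbols, which is the point of avoiding default params here.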

> Sean mentioned he assumes the returned Symbols contain addresses that are callable but in my experience I've found that is not true, at least for ELF on Linux.

I would be interested to see what symbols those are. Are you saying you have some symbols that purport to be Code symbols, but they don't have addresses? This sounds like a bug in the ObjectFileELF object file parser, where it isn't correctly classifying its symbols when it parses its own symbol table. We should attempt to classify all symbols correctly, and there shouldn't be any eSymbolTypeCode, eSymbolTypeData, or eSymbolTypeTrampoline symbols that don't have addresses. We can easily expand the eSymbolTypeXXX enums to allow ELF to correctly classify any symbols it parses if there are symbols that require it.
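
To make the classification idea concrete, here is a small sketch in Python (not ObjectFileELF code). The numeric constants are the standard ELF values from <elf.h>; the returned labels ("undefined", "indirect", and so on) are hypothetical and are not existing eSymbolTypeXXX values. It assumes that an entry whose section index is SHN_UNDEF is an import with no usable address in this object file:

```python
# ELF constants as defined in the ELF specification / <elf.h>.
SHN_UNDEF = 0        # section index of an undefined (imported) symbol
STT_OBJECT = 1       # data object
STT_FUNC = 2         # function or other executable code
STT_GNU_IFUNC = 10   # GNU extension: the symbol value is a resolver function


def classify_elf_symbol(st_info: int, st_shndx: int) -> str:
    """Map raw ELF symbol fields to an illustrative classification label."""
    sym_type = st_info & 0xF      # the low four bits of st_info hold the type
    if st_shndx == SHN_UNDEF:
        return "undefined"        # not defined in this file, so not "code"
    if sym_type == STT_GNU_IFUNC:
        return "indirect"         # the address is that of a resolver
    if sym_type == STT_FUNC:
        return "code"
    if sym_type == STT_OBJECT:
        return "data"
    return "other"


# st_info packs the binding in the high nibble; 0x12 is a global function.
imported_func = classify_elf_symbol(0x12, SHN_UNDEF)
defined_func = classify_elf_symbol(0x12, 13)
ifunc = classify_elf_symbol(0x1A, 13)
```

With a split like this, a lookup for code symbols would never see the undefined entries, so no caller would need to filter on the address afterwards.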

So think of the symbols that are parsed by the ObjectFileELF and see if we can improve how symbols are classified to avoid any confusion when doing lookups.

> I need to search the returned list of symbols from this call and find the one which has an actual address.

If we don't currently have a symbol classification issue, we can add more params to the symbol lookup functions that can specify if we want symbols with addresses only.

Does this address all of your concerns?

Greg

> Thanks,
> Matt
> 
> On 2013-01-25, at 7:47 PM, "Kopec, Matt" <matt.kopec at intel.com> wrote:
> 
>> I agree that the expression parser is maybe doing more than it should. The problem I found with Module::FindSymbolsWithNameAndType is that it returns both actual symbols and dynamic symbols. I wasn't sure if this is intentional or not, but to get the correct symbol I had to search through the returned list to find the symbol with an address. Should the symbol table contain symbols of type code with 0x0 addresses, or is FindSymbolsWithNameAndType in the wrong here?
>> 
>> Resolving the function can be done at a lower level as you say. If, say, I pass the symbol as a parameter to GetCallableLoadAddress, I can check there whether the symbol is indirect (or maybe check lower) and resolve the function otherwise proceed as normal. Since I have a Target pointer I should be able to access the Process pointer and make the inferior call.
>> 
>> I'm not sure I understand your explanation of GetCallableLoadAddress taking a thread parameter; I don't see why it would need one. If we currently call GetCallableLoadAddress for "strlen" on Linux, it will resolve to a legitimate address in libc.so. However, this address is for the "strlen" resolver function, which will return to us the address of the "strlen" implementation. We need to decide, based on whether the symbol is indirect, whether to resolve the indirect function or do what GetCallableLoadAddress does now.
>> 
>> Let me know the best way to proceed.
>> 
>> Thanks,
>> Matt
>> ________________________________________
>> From: Sean Callanan [scallanan at apple.com]
>> Sent: Friday, January 25, 2013 5:35 PM
>> To: Kopec, Matt
>> Cc: lldb-commits at cs.uiuc.edu
>> Subject: Re: [Lldb-commits] [Patch][Please Review] Add support for GNU indirect functions
>> 
>> Matt,
>> 
>> So if I understand you correctly there is no case for indirect symbols where the indirection isn't yet resolved in the inferior.  It's just a matter of walking one step further through pointers.  That should address my first concern.
>> 
>> My second concern is that the expression parser is being changed to
>> 
>> - search for non-NULL addresses, and
>> - resolve indirect functions
>> 
>> when it seems like both of these ought to be done lower down.
>> 
>> Right now, the expression parser (ClangExpressionDeclMap::GetFunctionAddress) uses Module::FindSymbolsWithNameAndType to find code symbols, and assumes that this list contains valid entries with addresses that actually are the callable addresses.  I would expect FindSymbolsWithNameAndType to return Symbols that contain Addresses that can transparently be evaluated by the expression parser, no special-casing required.  This would obviate any change on the part of GetFunctionAddress (and also AddOneFunction).
>> 
>> I understand that this is tricky because you need to be able to do an inferior call on a thread to get the value of the pointer.  I'm fine if GetCallableLoadAddress takes a thread as an additional (optional!) argument and fails if you don't pass one in and we need to call a function to determine the resulting address.  Greg may have his own opinion here :)
>> 
>> Sean
>> 
>> On Jan 25, 2013, at 12:24 PM, "Kopec, Matt" <matt.kopec at intel.com> wrote:
>> 
>>> Hi Sean,
>>> 
>>> No problem, I'm fine with Mr. Kopec as well :)
>>> 
>>> Resolution only occurs after we have found a symbol and find that it is indirect. There are two places where the resolution occurs.
>>> - In AddOneFunction, if the function can't be resolved then it won't be added. Actually, it will assert and abort lldb. I don't see a reason why a symbol can't be resolved, but maybe there is a better way to handle this?
>>> - In GetFunctionAddress, we will fail if a function cannot be resolved in the target.
>>> 
>>> For the case of calling a function that the program hasn't called yet, this should be fine, as long as the symbol can be found and resolved in the inferior/linked libraries.
>>> 
>>> For instance, test/lang/c/strings/TestCString.py has this test:
>>> 
>>>      self.expect("expression -- (int)strlen(\"hello\")",
>>>                  startstr = "(int) $2 = 5")
>>> 
>>> The binary generated for this test doesn't have a strlen call (strlen is an indirect function) and the test above works as expected.
>>> 
>>> I'm not sure what else lldb could do to force resolution. This is probably the simplest way. I think this is something that should be done on an as-needed basis rather than, for instance, resolving all the indirect function addresses when the symbol table is generated.
>>> 
>>> Does this address your concerns?
>>> 
>>> Thanks,
>>> Matt
>>> ________________________________________
>>> From: Sean Callanan [scallanan at apple.com]
>>> Sent: Friday, January 25, 2013 2:21 PM
>>> To: Kopec, Matt
>>> Cc: lldb-commits at cs.uiuc.edu
>>> Subject: Re: [Lldb-commits] [Patch][Please Review] Add support for GNU indirect functions
>>> 
>>> Sorry, I should probably have addressed you as "Matt."  I looked at the headers wrong  :)
>>> 
>>> Sean
>>> 
>>> On Jan 25, 2013, at 11:19 AM, Sean Callanan <scallanan at apple.com> wrote:
>>> 
>>>> Kopec,
>>>> 
>>>> one thing I noticed is that if the indirect symbol hasn't been resolved yet in the underlying process, it looks like we just give up.  What if the user wants to call a function that the program hasn't yet called?  Are there cases where this would blow up?  Is there something more that LLDB should be able to do to force resolution?
>>>> 
>>>> Sean
>>>> 
>>>> On Jan 18, 2013, at 2:33 PM, "Kopec, Matt" <matt.kopec at intel.com> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> This patch allows indirect functions to work on Linux when evaluating expressions. I'm not sure if this is a feature supported on Mac OS X.
>>>>> 
>>>>> This is needed to support some library functions such as strcat and strlen when used in an expression. If one of these functions is used, the function will run once in the context of the inferior and, on completion, return the address of the actual implementation of the function.
>>>>> 
>>>>> Some more details on indirect functions if interested: http://www.agner.org/optimize/blog/read.php?i=167
>>>>> 
>>>>> Thanks,
>>>>> Matt
>>>>> <exprfix.patch>
>>>>> _______________________________________________
>>>>> lldb-commits mailing list
>>>>> lldb-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>> 
>>> 
>> 
>> 
>