[Lldb-commits] [PATCH] Patch for LLDB demangler for demangling upon actual language

Fri Jan 30 16:11:21 PST 2015

I also want to stress that I believe all the places that manually check for "_Z" and the like should switch over to using methods on Mangled, so we can avoid the issues you are seeing.
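
To make the idea concrete, here is a minimal sketch of a single prefix check that callers could ask instead of testing for "_Z" or "?" themselves. The names ManglingScheme, GuessManglingScheme and IsMangled are hypothetical and only for illustration; this is not the actual Mangled API, and it assumes the prefix alone is enough to tell the schemes apart.

```cpp
#include <string>

// Hypothetical names for illustration only; not the real lldb_private::Mangled API.
enum class ManglingScheme { None, Itanium, MSVC };

// Guess the mangling scheme from the name's prefix alone.
// Assumption: "_Z" marks Itanium C++ names and a leading '?' marks
// Microsoft C++ names; anything else is treated as unmangled (e.g. C).
inline ManglingScheme GuessManglingScheme(const std::string &name) {
  if (name.empty())
    return ManglingScheme::None;
  if (name[0] == '?')
    return ManglingScheme::MSVC;
  if (name.compare(0, 2, "_Z") == 0)
    return ManglingScheme::Itanium;
  return ManglingScheme::None;
}

// Convenience query so callers never inspect prefixes by hand.
inline bool IsMangled(const std::string &name) {
  return GuessManglingScheme(name) != ManglingScheme::None;
}
```

With something like this in one place, supporting another scheme would mean adding one more prefix test here rather than touching every caller.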

> On Jan 30, 2015, at 4:03 PM, Greg Clayton <clayborg at gmail.com> wrote:
> 
> 
>> On Jan 30, 2015, at 3:50 PM, Zachary Turner <zturner at google.com> wrote:
>> 
>> I agree that if memory is important then we should use the opportunity to reduce memory usage rather than keeping it the same by changing stuff.  But the reason I asked leads into my next question.
>> 
>> I've been thinking about mangling and demangling for a while and how it relates to Windows.  I see a lot of code all over the place that manually inspects mangled names, and usually the code is all custom and handrolled.  (If you're interested I can point you to a bunch of examples).  I don't like this way of doing things and I think it's generally fragile.  There should be one place that's responsible for anything to do with mangling.  All these places that are inspecting strings for _Z or ? should just be calling some class to ask it about the properties of this string.
> 
> Exactly, why can't we just look at the mangled name and look for the prefix and return the language we calculate?
> 
>> The most sensible place to do that, to me, seems like the ABI.  So I'm imagining that there's a Mangler base class, and then from that there is an ItaniumCppMangler, a MsCppMangler, and let's say perhaps a JavaMangler for the purposes of this CL.  Maybe they share some code, but that's not the important part.
> 
> Doesn't Windows actually have 2 forms of mangling? Itanium + the $ mangling?
> 
>> 
>> ABI provides a method called getMangler().  It returns a singleton instance (which for Windows would be an MsCppMangler, and for everyone else would be an ItaniumCppMangler).
> 
> Again, why do we need to get so fancy? I would prefer to avoid this if we can just try demangling if it starts with one of the mangling prefixes.
> 
>> 
>> In the Symbol class, then, all you need to store is the mangled name.  
> 
> And you need to know if the name is mangled in the first place. C function names have no mangling, so if you store the name you can store the name + a flag to say is this mangled.
> 
>> Implement a method in Symbol called getMangler() which looks at m_comp_unit
> 
> With no debug info we have no compile unit and no way to figure out which compile unit a symbol came from. So you can't associate symbols with compile units. Symbols come from symbol tables in the object file; compile units, functions, blocks and variables come from debug info, which may or may not live in the object file. So whatever you do, just know symbols do not refer to compile units and won't store any compile unit info inside them. You can always take your symbol address and look it up in the debug info and then associate things that way, but there should be no direct reference.
> 
>> and either gets the ABI and calls getCppMangler (if Lang is C++) or a null mangler (if Lang is C)  or a java mangler (if Lang is Java), etc.
> 
> Again, you can't associate symbols with compile units. So we need something else. Again, can't we just look at the prefix and know how to demangle it?
> 
>> 
>> Then, just call the method on it.
>> 
>> All this seems complicated, but the advantage is that now this logic is abstracted for anyone else who wants to use it.  
> 
> It was abstracted before when we were relying on the prefix to be able to demangle. Are you saying this isn't possible now?
> 
>> The Mangler interface could provide such methods as IsGuardVariable() or IsFunction() that things like the interpreter could use by getting the correct mangler from the ABI, for example.  
> 
> Again, this is a question for the Mangled class to answer based solely on the mangled name itself. I would prefer to stick to looking at mangling prefixes if we can. If not, let me know why we can't.
> 
>> And all of the places in the code that currently have hardcoded mangler checks could be made to work in the presence of ABI differences and language differences easily.
> 
> This can be switched to asking the mangled name for its language which will be calculable from the mangled name prefix.
>> 
>> And this doesn't impose any memory penalty on Symbol (and actually reduces the footprint of each Symbol by the size of 1 pointer)
> 
> I would prefer to save this memory to make symbol tables more efficient. We can also change lldb_private::Symbol to store file addresses only and then convert them to lldb_private::Address values on the fly using the section list of the object file.
> 
> So you will need to prove that the Mangled class function that calculates the language is costly by showing it causing slowdowns in a sampling tool before we add the space to a class that is used all over.
> 
> Greg
>