[Lldb-commits] [PATCH] D50071: Use rich mangling information in Symtab::InitNameIndexes()

Thu Aug 2 11:22:23 PDT 2018

sgraenitz added a comment.

> (However if you are interested in something like this, then it might be interesting to look at whether this MethodName stuff couldn't be properly pluginified. Something like where a Language class registers a callback for a specific mangling type, and then accessing that through this)

Yes this sounds more like a plan. An isolated refactoring patch without would be appropriate I think. In contrast, making a quick fix just to avoid the `llvm::Any` seems less appealing to me.

================
Comment at: include/lldb/Core/RichManglingInfo.h:83-84
+public:
+  RichManglingInfo *SetItaniumInfo();
+  RichManglingInfo *SetLegacyCxxParserInfo(const ConstString &mangled);
+
----------------
labath wrote:
> sgraenitz wrote:
> > labath wrote:
> > > I find it odd that one of these functions is stateless (`SetLegacyCxxParserInfo` takes the string which initializes the object as an argument), while the other receives the state magically via the IPD accessor.
> > > 
> > > Would it be possible to change it so that the other option is stateless too (`SetItaniumInfo(ConstString mangled)`)?
> > > 
> > > We could change `RichManglingInfo` to return the full demangled name too, so that you have access to the demangled name via this API too (which /I think/ is the reason why you created a separate `GetItaniumRichDemangleInfo` function).
> > > Would it be possible to change it so that the other option is stateless too (`SetItaniumInfo(ConstString mangled)`)?
> > I think that would be misleading for the reader. Symmetry is always nice to indicate congruent behavior, but this is not the case here. The fallback C++ parser version mimics the previous behavior: demangle the name in step 1 and if that succeeds, parse it with `CPlusPlusLanguage::MethodName` to decode its properties. The IPD does that in a single step and so there is nothing to do in `SetItaniumInfo()`. `SetLegacyCxxParserInfo()` on the other hand has to create a `CPlusPlusLanguage::MethodName` from the mangled string.
> > 
> > > We could change `RichManglingInfo` to return the full demangled name too, so that you have access to the demangled name via this API too (which /I think/ is the reason why you created a separate GetItaniumRichDemangleInfo function).
> > The interface wants to provide access to the name's encoded properties. That doesn't really include the demangled "human readable" string representation (it's also not needed in `Symtab::RegisterMangledNameEntry()`).
> > 
> > Of course `Symtab::InitNameIndexes()` will ask for `GetDemangledName()` just after processing the mangled one. I think chances are good that for symbols that are not filtered out anyway, we reached `DemangleWithRichManglingInfo()` earlier and it will be in the string pool already (see Mangled.cpp:322). Otherwise, yes it will be demangled in with a new IPD instance.
> > 
> > I think it's not bad to keep these cases as they are. I would propose to investigate the performance of the current implementation in more detail and think about the benefits before mixing them. Not sure at all, if it would bring measurable gain.
> I don't see how this would be misleading. Both versions provide a way to access the name's properties. The fact that they take slightly different inputs (one takes a mangled name, one demangled) makes them slightly different, but I don't find that too troublesome. We could encode that difference in the method name to make it more obvious. What troubles me more is that you have two-step initialization for the IPD case. It's always nice to avoid that, and that's particularly important when it's asymmetric with the other method.
> 
> Let me try to elaborate on how I'd do this. `RichManglingContext` (ps: maybe it's not correct to call this a *mangling* context, because the second method does not use mangled names. `RichMethodContext` ?) would have two methods:
> `fromItaniumName(ConstString ItaniumMangled)` and `fromCxxMethodName(ConstString Name)`. Both would return `RichManglingInfo` (`RichMethodInfo` ?) as they do now.
> 
> Then the usage would be something like:
> ```
>   case eManglingSchemeItanium:
>     auto *info = context.fromItaniumName(m_mangled);
>     m_demangled.SetCStringWithMangledCounterPart(info->GetFullName(), m_mangled);
>     return info;
> 
>   case eManglingSchemeMSVC:
>     m_demangled.SetCStringWithMangledCounterpart(demangle_manually(m_mangled), m_mangled);
>     return context.fromCxxMethodName(m_demangled);
> ```
> The only change necessary for this is to include the full demangled name in the "name's encoded properties". While that may not be a *property* of the name by the strictest definition, I don't think it's much of a stretch to include it there, and it does not require making any compromises, as both methods already have access to this information.
> 
> So overall, I believe this should have no impact on performance, while improving the "symmetry" of the code (which is somewhat subjective, but I am hoping that we can agree that a having single method is better than having two methods which must be called in the right order). Is there anything I'm missing here?
Ok, sounds reasonable. Let's try that. I will prepare a patch.

================
Comment at: source/Core/RichManglingInfo.cpp:36-40
+RichManglingContext::SetLegacyCxxParserInfo(const ConstString &mangled) {
+  m_info.ResetProvider();
+  m_info.m_provider = RichManglingInfo::PluginCxxLanguage;
+  m_info.m_legacy_parser = new CPlusPlusLanguage::MethodName(mangled);
+  return &m_info;
----------------
labath wrote:
> sgraenitz wrote:
> > labath wrote:
> > > Is this really the **mangled** name?
> > Yes, it is.
> How is that possible? I see nothing in CPlusPlusLanguage::MethodName which would perform the demangling, and yet it clearly operates on demangled names. Also, the LHS of this patch creates MethodName instances like this
> `CPlusPlusLanguage::MethodName cxx_method(mangled.GetDemangledName(lldb::eLanguageTypeC_plus_plus));`
> 
> 
Oh right, I should revisit that.

While we are here: It's a pity that only MSVC-mangled names trigger the fallback case with the new implementation. Thus, my unit test has no coverage for it and I didn't recognize the mistake. I thought about integrating ObjCLanguage parsing here too. On the one hand this would improve test coverage. On the other hand it would cause a semantic change in name indexing (and my ObjC knowledge is quite limited).

What do you think?

https://reviews.llvm.org/D50071