[Lldb-commits] [PATCH] D50071: Use rich mangling information in Symtab::InitNameIndexes()

Thu Aug 2 04:00:21 PDT 2018

labath added a comment.

In https://reviews.llvm.org/D50071#1184960, @sgraenitz wrote:

> > If I understand things correctly, we could avoid circular deps and untyped pointers (or llvm::Any, which is essentially the same thing), by moving CPlusPlusLanguage::MethodName to a separate file.
>
> Banning public nested classes in general is a good practice as long as C++ is lacking rich forward declarations. This is the issue here. If we could forward declare the class, all was great. The problem is, of course, that there is a pattern behind `Language::MethodName` within the language plugins. So changing all of them? Hmm..
>
> > moving CPlusPlusLanguage::MethodName to a separate file.
>
> You mean putting it in a separate file and include that one in the `RichManglingInfo.h`? I am not so familiar with the code yet, but it looks like there are reasons for having `CPlusPlusLanguage` under `Plugins` and separate these from other things like `Core` or `Symbol`. I don't think we should include a `Plugins` header from a `Core` header..

I'd actually draw the line even higher, generic code shouldn't include Plugin code, ever (the SystemInitializer classes being the only exception, I guess). While having dependencies in the headers is more troublesome (e.g. for modules), it is not a line that can easily be held as refactorings (such as yours) which move functionality around (even with a single module) can easily cause a dep which was previously cpp-only to show up in the header too. However, it is true that we are very far from the line I wish to hold, while yours is almost there (I see only one exception where we include some clang stuff from `ClangASTContext.h`). I was not aware of this fact until now, so I do concede that it's good to not make this worse here.

> 
> 
>> Could we do that as a preparatory step for this patch?
> 
> Well, now I spent the time to make the hack nice haha.
>  To be honest, I am happy to talk about a follow-up task here, but doing this now and before finishing the demangling sounds like a big unrelated piece of work.

FWIW, I was not thinking of any major changes. Just moving the single c++ class to a separate file and that's it. We could even leave behind a typedef in CPlusPlusLanguage so that it's accessible under the old name too.

(However if you are interested in something like this, then it might be interesting to look at whether this MethodName stuff couldn't be properly pluginified. Something like where a Language class registers a callback for a specific mangling type, and then accessing that through this)

================
Comment at: include/lldb/Core/RichManglingInfo.h:83-84
+public:
+  RichManglingInfo *SetItaniumInfo();
+  RichManglingInfo *SetLegacyCxxParserInfo(const ConstString &mangled);
+
----------------
sgraenitz wrote:
> labath wrote:
> > I find it odd that one of these functions is stateless (`SetLegacyCxxParserInfo` takes the string which initializes the object as an argument), while the other receives the state magically via the IPD accessor.
> > 
> > Would it be possible to change it so that the other option is stateless too (`SetItaniumInfo(ConstString mangled)`)?
> > 
> > We could change `RichManglingInfo` to return the full demangled name too, so that you have access to the demangled name via this API too (which /I think/ is the reason why you created a separate `GetItaniumRichDemangleInfo` function).
> > Would it be possible to change it so that the other option is stateless too (`SetItaniumInfo(ConstString mangled)`)?
> I think that would be misleading for the reader. Symmetry is always nice to indicate congruent behavior, but this is not the case here. The fallback C++ parser version mimics the previous behavior: demangle the name in step 1 and if that succeeds, parse it with `CPlusPlusLanguage::MethodName` to decode its properties. The IPD does that in a single step and so there is nothing to do in `SetItaniumInfo()`. `SetLegacyCxxParserInfo()` on the other hand has to create a `CPlusPlusLanguage::MethodName` from the mangled string.
> 
> > We could change `RichManglingInfo` to return the full demangled name too, so that you have access to the demangled name via this API too (which /I think/ is the reason why you created a separate GetItaniumRichDemangleInfo function).
> The interface wants to provide access to the name's encoded properties. That doesn't really include the demangled "human readable" string representation (it's also not needed in `Symtab::RegisterMangledNameEntry()`).
> 
> Of course `Symtab::InitNameIndexes()` will ask for `GetDemangledName()` just after processing the mangled one. I think chances are good that for symbols that are not filtered out anyway, we reached `DemangleWithRichManglingInfo()` earlier and it will be in the string pool already (see Mangled.cpp:322). Otherwise, yes it will be demangled in with a new IPD instance.
> 
> I think it's not bad to keep these cases as they are. I would propose to investigate the performance of the current implementation in more detail and think about the benefits before mixing them. Not sure at all, if it would bring measurable gain.
I don't see how this would be misleading. Both versions provide a way to access the name's properties. The fact that they take slightly different inputs (one takes a mangled name, one demangled) makes them slightly different, but I don't find that too troublesome. We could encode that difference in the method name to make it more obvious. What troubles me more is that you have two-step initialization for the IPD case. It's always nice to avoid that, and that's particularly important when it's asymmetric with the other method.

Let me try to elaborate on how I'd do this. `RichManglingContext` (ps: maybe it's not correct to call this a *mangling* context, because the second method does not use mangled names. `RichMethodContext` ?) would have two methods:
`fromItaniumName(ConstString ItaniumMangled)` and `fromCxxMethodName(ConstString Name)`. Both would return `RichManglingInfo` (`RichMethodInfo` ?) as they do now.

Then the usage would be something like:
```
  case eManglingSchemeItanium:
    auto *info = context.fromItaniumName(m_mangled);
    m_demangled.SetCStringWithMangledCounterPart(info->GetFullName(), m_mangled);
    return info;

  case eManglingSchemeMSVC:
    m_demangled.SetCStringWithMangledCounterpart(demangle_manually(m_mangled), m_mangled);
    return context.fromCxxMethodName(m_demangled);
```
The only change necessary for this is to include the full demangled name in the "name's encoded properties". While that may not be a *property* of the name by the strictest definition, I don't think it's much of a stretch to include it there, and it does not require making any compromises, as both methods already have access to this information.

So overall, I believe this should have no impact on performance, while improving the "symmetry" of the code (which is somewhat subjective, but I am hoping that we can agree that a having single method is better than having two methods which must be called in the right order). Is there anything I'm missing here?

================
Comment at: source/Core/RichManglingInfo.cpp:36-40
+RichManglingContext::SetLegacyCxxParserInfo(const ConstString &mangled) {
+  m_info.ResetProvider();
+  m_info.m_provider = RichManglingInfo::PluginCxxLanguage;
+  m_info.m_legacy_parser = new CPlusPlusLanguage::MethodName(mangled);
+  return &m_info;
----------------
sgraenitz wrote:
> labath wrote:
> > Is this really the **mangled** name?
> Yes, it is.
How is that possible? I see nothing in CPlusPlusLanguage::MethodName which would perform the demangling, and yet it clearly operates on demangled names. Also, the LHS of this patch creates MethodName instances like this
`CPlusPlusLanguage::MethodName cxx_method(mangled.GetDemangledName(lldb::eLanguageTypeC_plus_plus));`

https://reviews.llvm.org/D50071