[cfe-dev] [RFC] Improving import failure checking strategy inside ASTImporter

Adrian Prantl via cfe-dev cfe-dev at lists.llvm.org
Fri Apr 27 16:08:06 PDT 2018

Adding lldb-dev to the thread, since LLDB is a primary user of ASTImporter.

> On Apr 27, 2018, at 8:54 AM, Aleksei Sidorin via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> Hello Gabor,
> Thank you for the reply!
> 26.04.2018 12:39, Gábor Márton пишет:
>> Hi Aleksei,
>> It is a great and welcoming initiative to identify the weaknesses in
>> the ASTImporter.
>> About, handling the error case after a call:
>> I couldn't agree more that this is a burden to always have a check at
>> the call site, but this is the price we pay for not having exceptions.
>> By using macros I have a feeling that the code would be even less
>> readable because we would hide control flow modifications inside the
>> macro, also because finally we'd need to create so many different
>> kinds of macros.
>> As far as I know, there are cases where an `Import` can return a
>> `nullptr` or a default even in case of non-error situations, please
>> correct me if I am wrong. Thus, introducing an optional like return
>> type would require to discover all these cases. Still, I think this
>> would worth it, because the enforcement of error checking (as you
>> mentioned), plus it would make it more explicit which situation is
>> really an error.
> Yes, this is what I described as problem (2).
>> Actually, I think this should be done with a few large commits, e.g.
>> once we change the return type of `ASTImporter::Import(Decl*)` to
>> `Optional<Decl*>` then we have to change all call sites.
>> By introducing an `ImportOrError` and then doing the changes gradually
>> would result the same final state as if we had done it in one larger
>> step where we changed all call sites, but until all call sites are
>> changed we have to maintain both `Import` and `ImportOrError`.
>> Beside the problem of handling the error case after an `Import` call
>> we have observed several other weaknesses, and would like to hear your
>> opinion about them as well.
>> So, the list of problems so far:
>> - Do the same things after and before a new AST node is created (::Create)
>> The original import logic should be really simple, we create the node,
>> then we mark that as imported, then we recursively import the parts
>> for that node and then set them on that node.
> That sounds pretty good, and this is what I was thinking about some time ago. You mean something like CreateDeserialized?
> The problem here are Types: their creation requires all dependencies to be imported before the Type is created. I don't know how to deal with it.
>> However, the AST is actually a graph, so we have to handle circles.
>> If we mark something as imported (Imported()) then we return with the
>> corresponding To whenever we want to import that node again, this way
>> circles are handled.
>> In order to make this algorithm work we must ensure things, which are
>> not enforced in the current ASTImporter:
>> * There are no Import() calls in between any node creation (::Create)
>> and the Imported() call.
>> * There are no VisitXXX() calls in any other VisitYYY() function, only
>> call of Import() allowed.
>> * Before actually creating an AST node (::Create), we must check if
>> the Node had been imported already, if yes then return with that one.
>> One very important case for this is connected to templates: we may
>> start an import both from the templated decl of a template and from
>> the template itself.
>> There are good news, the first and third points can be enforced by
>> providing a variadic forwarding template for the creation, we are
>> currently preparing to upstream this change from our fork
>> (https://github.com/Ericsson/clang/pull/354/files <https://github.com/Ericsson/clang/pull/354/files>) but this is going
>> to be a huge change.
>> Still, I have no idea how to enforce the second point. (Maybe a checker?)
> If I remember correctly, this pattern is not widely used and such code can be refactored easily.
>> - Decl chains are not imported
>> Currently we import declarations only if there is no available definition.
>> If in the "From" TU there are both a declaration and a definition we
>> import only the definition.
>> Thus we do not import all information from the original AST.
>> One example:
>>     struct B { virtual void f(); };
>>     void B::f() {}
>> Here, when we import the definition, then the virtual flag is not
>> getting imported, so finally we import a function as a non-virtual
>> function.
>> An other problem is with recursive functions:
>>     void f(); void f() { f(); }
>> When we import the prototype then we end up having two identical
>> definitions for f(), and no prototype.
>> We again preparing a patch for this, but this solves the problem only
>> in case of functions.
> Yes, I have met this problem before. I tried to import all redeclaration chain but haven't completed the work. One of problems here is how to stick the redeclaration chain to an existing declaration.
>> By not importing the whole decl chain we loose all attributes which
>> may be present only in declarations.
>> - Structural Equivalency
>> There are several holes here. E.g. we do not check the member
>> functions in a class for equivalency, only members.
>> Thus, we may end up stating that a forward decl is equivalent with a
>> class definition if the class does not have data members.
>> There are no tests at all for structural equivalency.
>> In our fork we have some tests, and planning to upstream them too.
>> - Minimal import
>> This feature is used by LLDB, but not at all in CTU. This feature
>> makes the code bigger and more complex.
>> There are 0 unit tests for minimal import, whatever information we
>> have about this we can get only from the lldb repo.
> I think we need to discuss this with LLDB folks (lldb-dev?). Unfortunately, Sean Callanan (who was an active reviewer and committer) left the project and Jim Ingham never participated in ASTImporter reviews.
>> - Lookup
>> The way we lookup names during an import of a decl is getting really
>> complex, especially with respect to friends.
> What is even worse is that the lookup (which is pretty harmless itself) requires import of DeclContexts - possibly, including the declaration we are going to search for (and that makes its logic much more complicated). 
>> We find an already imported function from an unnamed namespace, thus
>> we simply skip the import of a function in an other TU if that has the
>> same name and that is also in an unnamed namespace
>> (https://github.com/Ericsson/clang/issues/335 <https://github.com/Ericsson/clang/issues/335>).
> Never noticed this problem before, thank you. I'm also not sure if import of TU-local data structures with same names is performed correctly now.
>> We still use `localUncachedLookup` which can be very inefficient, we
>> should be using `noload_lookup`.
> This can be true. Do you have any numbers? For me, import correctness has greater priority now than speed optimizations, but this can be useful for future improvements.
>> Thanks,
>> Gabor
>> On Wed, Apr 25, 2018 at 5:52 PM, Aleksei Sidorin via cfe-dev
>> <cfe-dev at lists.llvm.org> <mailto:cfe-dev at lists.llvm.org> wrote:
>>> Hello,
>>> I'd like to discuss the idea of changing the way how we check for import
>>> issues  in ASTImporter and ASTNodeImporter.
>>> 1. Introduction
>>> ASTImporter is now extensively used in Cross-TU analysis and is in active
>>> development. However, we find (and introduce) many errors related to how
>>> import failures are handled now.
>>> Currently, ASTImporter::Import() and ASTNodeImporter::Visit*() methods
>>> return nullptr for pointers and default value in case of conflict. And this
>>> approach brings many issues due to the following reasons.
>>> 1. We have to check the return result and take actions explicitly.
>>> The common strategy of failure handling is very simple - just return. While
>>> it is very simple, almost all the code of ASTImporter consists of such
>>> explicit checks making it much harder to read.
>>> 2. Both nullptr and default value are valid by itself so we have to place
>>> explicit checks to check if a failure happened. Example:
>>> Decl *To = Importer.Import(From);
>>> if (From && !To)
>>>   return nullptr;
>>> This doesn't look well and it's easy to forget where From can be nullptr and
>>> where it is impossible.
>>> 3. Error checking is not enforced.
>>> We have to check for import errors every time Import() is called. And it is
>>> extremely easy to forget placing a check after a call. This error pattern is
>>> so common in ASTImporter that I was going to write a CSA check for this.
>>> 2. Possible solutions.
>>> The problem with enforcing checks is the most annoying one but also looks to
>>> be the simplest - we can just replace the return value with Optional or
>>> ErrorOr so clients will be enforced to perform a failure checking. This
>>> replacement can include two changes.
>>> 1. A number of large commits that change the return type of
>>> ASTNodeImporter::Visit*() methods. And I'd like to hear the comments from
>>> existing users that have their own improvements of ASTImporter because they
>>> have to change their code not touched by the commit as well - there be some
>>> pain.
>>> 2. Introducing an alternative method in ASTImporter that returns ErrorOr.
>>> So, we can migrate to the new, say, ImportOrError(), by small patches.
>>> This replacement also solves the problem (2) because value and error are now
>>> separated.
>>> The problem of improving readability of these checks is a bit harder. The
>>> most straightforward approach is to create a macro like:
>>> #define IMPORT_PTR_INTO_VAR(VarName, Type, From) \
>>>   Type *VarName = nullptr;                                 \
>>>   {                                                        \
>>>     auto MacroImportRes = Importer.Import(From);           \
>>>     if (!MacroImportRes)                                   \
>>>       return MacroImportRes.getError();                    \
>>>     VarName = *MacroImportRes;                             \
>>>   }
>>> Same for non-pointer type. So, the final code will look like this:
>>> ErrorOr<Expr *> ASTNodeImporter::VisitCXXMemberCallExpr(CXXMemberCallExpr
>>> *E) {
>>>   IMPORT_QUAL_TYPE_INTO_VAR(T, E->getType())
>>>   IMPORT_PTR_INTO_VAR(ToFn, Expr, E->getCallee())
>>>   SmallVector<Expr *, 4> ToArgs(E->getNumArgs());
>>>   IMPORT_CONTAINER(E->arguments(), ToArgs);
>>>   return new (Importer.getToContext()) CXXMemberCallExpr(
>>>         Importer.getToContext(), ToFn, ToArgs, T, E->getValueKind(),
>>>         Importer.Import(E->getRParenLoc()));
>>> }
>>> But macros don't look clean. Unfortunately, I still haven't found a better
>>> option so any suggestions are very welcome.
>>> PS. The error handling pattern we wish to get is very close to exception
>>> handling in C++ because usually we don't want to perform local error
>>> handling but centralized instead. With a possibility to just throw an
>>> exception and catch it in the ASTImporter entry point (returning an
>>> ErrorOr), error handling in ASTImporter becomes almost trivial and pretty
>>> clean. But we don't use exceptions in LLVM and clang and I don't know a
>>> similar functionality in LLVM utilities.
>>> --
>>> Best regards,
>>> Aleksei Sidorin,
>>> SRR, Samsung Electronics
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
> -- 
> Best regards,
> Aleksei Sidorin,
> SRR, Samsung Electronics
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180427/c47a0ed2/attachment.html>

More information about the cfe-dev mailing list