[cfe-dev] [RFC] Improving import failure checking strategy inside ASTImporter

Fri Apr 27 08:54:48 PDT 2018

Hello Gabor,

Thank you for the reply!

26.04.2018 12:39, Gábor Márton пишет:
> Hi Aleksei,
>
> It is a great and welcoming initiative to identify the weaknesses in
> the ASTImporter.
>
> About, handling the error case after a call:
> I couldn't agree more that this is a burden to always have a check at
> the call site, but this is the price we pay for not having exceptions.
> By using macros I have a feeling that the code would be even less
> readable because we would hide control flow modifications inside the
> macro, also because finally we'd need to create so many different
> kinds of macros.
> As far as I know, there are cases where an `Import` can return a
> `nullptr` or a default even in case of non-error situations, please
> correct me if I am wrong. Thus, introducing an optional like return
> type would require to discover all these cases. Still, I think this
> would worth it, because the enforcement of error checking (as you
> mentioned), plus it would make it more explicit which situation is
> really an error.
Yes, this is what I described as problem (2).

> Actually, I think this should be done with a few large commits, e.g.
> once we change the return type of `ASTImporter::Import(Decl*)` to
> `Optional<Decl*>` then we have to change all call sites.
> By introducing an `ImportOrError` and then doing the changes gradually
> would result the same final state as if we had done it in one larger
> step where we changed all call sites, but until all call sites are
> changed we have to maintain both `Import` and `ImportOrError`.
>
>
> Beside the problem of handling the error case after an `Import` call
> we have observed several other weaknesses, and would like to hear your
> opinion about them as well.
> So, the list of problems so far:
>
> - Do the same things after and before a new AST node is created (::Create)
> The original import logic should be really simple, we create the node,
> then we mark that as imported, then we recursively import the parts
> for that node and then set them on that node.
That sounds pretty good, and this is what I was thinking about some time 
ago. You mean something like CreateDeserialized?
The problem here are Types: their creation requires all dependencies to 
be imported before the Type is created. I don't know how to deal with it.

> However, the AST is actually a graph, so we have to handle circles.
> If we mark something as imported (Imported()) then we return with the
> corresponding To whenever we want to import that node again, this way
> circles are handled.
> In order to make this algorithm work we must ensure things, which are
> not enforced in the current ASTImporter:
> * There are no Import() calls in between any node creation (::Create)
> and the Imported() call.
> * There are no VisitXXX() calls in any other VisitYYY() function, only
> call of Import() allowed.
> * Before actually creating an AST node (::Create), we must check if
> the Node had been imported already, if yes then return with that one.
> One very important case for this is connected to templates: we may
> start an import both from the templated decl of a template and from
> the template itself.
> There are good news, the first and third points can be enforced by
> providing a variadic forwarding template for the creation, we are
> currently preparing to upstream this change from our fork
> (https://github.com/Ericsson/clang/pull/354/files) but this is going
> to be a huge change.
> Still, I have no idea how to enforce the second point. (Maybe a checker?)
If I remember correctly, this pattern is not widely used and such code 
can be refactored easily.

> - Decl chains are not imported
> Currently we import declarations only if there is no available definition.
> If in the "From" TU there are both a declaration and a definition we
> import only the definition.
> Thus we do not import all information from the original AST.
> One example:
>      struct B { virtual void f(); };
>      void B::f() {}
> Here, when we import the definition, then the virtual flag is not
> getting imported, so finally we import a function as a non-virtual
> function.
> An other problem is with recursive functions:
>      void f(); void f() { f(); }
> When we import the prototype then we end up having two identical
> definitions for f(), and no prototype.
> We again preparing a patch for this, but this solves the problem only
> in case of functions.
Yes, I have met this problem before. I tried to import all redeclaration 
chain but haven't completed the work. One of problems here is how to 
stick the redeclaration chain to an existing declaration.

> By not importing the whole decl chain we loose all attributes which
> may be present only in declarations.
>
> - Structural Equivalency
> There are several holes here. E.g. we do not check the member
> functions in a class for equivalency, only members.
> Thus, we may end up stating that a forward decl is equivalent with a
> class definition if the class does not have data members.
> There are no tests at all for structural equivalency.
> In our fork we have some tests, and planning to upstream them too.
>
> - Minimal import
> This feature is used by LLDB, but not at all in CTU. This feature
> makes the code bigger and more complex.
> There are 0 unit tests for minimal import, whatever information we
> have about this we can get only from the lldb repo.
I think we need to discuss this with LLDB folks (lldb-dev?). 
Unfortunately, Sean Callanan (who was an active reviewer and committer) 
left the project and Jim Ingham never participated in ASTImporter reviews.

> - Lookup
> The way we lookup names during an import of a decl is getting really
> complex, especially with respect to friends.
What is even worse is that the lookup (which is pretty harmless itself) 
requires import of DeclContexts - possibly, including the declaration we 
are going to search for (and that makes its logic much more complicated).

> We find an already imported function from an unnamed namespace, thus
> we simply skip the import of a function in an other TU if that has the
> same name and that is also in an unnamed namespace
> (https://github.com/Ericsson/clang/issues/335).
Never noticed this problem before, thank you. I'm also not sure if 
import of TU-local data structures with same names is performed 
correctly now.
> We still use `localUncachedLookup` which can be very inefficient, we
> should be using `noload_lookup`.
This can be true. Do you have any numbers? For me, import correctness 
has greater priority now than speed optimizations, but this can be 
useful for future improvements.
> Thanks,
> Gabor
>
> On Wed, Apr 25, 2018 at 5:52 PM, Aleksei Sidorin via cfe-dev
> <cfe-dev at lists.llvm.org>  wrote:
>> Hello,
>>
>> I'd like to discuss the idea of changing the way how we check for import
>> issues  in ASTImporter and ASTNodeImporter.
>>
>> 1. Introduction
>>
>> ASTImporter is now extensively used in Cross-TU analysis and is in active
>> development. However, we find (and introduce) many errors related to how
>> import failures are handled now.
>>
>> Currently, ASTImporter::Import() and ASTNodeImporter::Visit*() methods
>> return nullptr for pointers and default value in case of conflict. And this
>> approach brings many issues due to the following reasons.
>>
>> 1. We have to check the return result and take actions explicitly.
>>
>> The common strategy of failure handling is very simple - just return. While
>> it is very simple, almost all the code of ASTImporter consists of such
>> explicit checks making it much harder to read.
>>
>> 2. Both nullptr and default value are valid by itself so we have to place
>> explicit checks to check if a failure happened. Example:
>>
>> Decl *To = Importer.Import(From);
>> if (From && !To)
>>    return nullptr;
>>
>> This doesn't look well and it's easy to forget where From can be nullptr and
>> where it is impossible.
>>
>> 3. Error checking is not enforced.
>>
>> We have to check for import errors every time Import() is called. And it is
>> extremely easy to forget placing a check after a call. This error pattern is
>> so common in ASTImporter that I was going to write a CSA check for this.
>>
>> 2. Possible solutions.
>>
>> The problem with enforcing checks is the most annoying one but also looks to
>> be the simplest - we can just replace the return value with Optional or
>> ErrorOr so clients will be enforced to perform a failure checking. This
>> replacement can include two changes.
>>
>> 1. A number of large commits that change the return type of
>> ASTNodeImporter::Visit*() methods. And I'd like to hear the comments from
>> existing users that have their own improvements of ASTImporter because they
>> have to change their code not touched by the commit as well - there be some
>> pain.
>>
>> 2. Introducing an alternative method in ASTImporter that returns ErrorOr.
>> So, we can migrate to the new, say, ImportOrError(), by small patches.
>>
>> This replacement also solves the problem (2) because value and error are now
>> separated.
>>
>> The problem of improving readability of these checks is a bit harder. The
>> most straightforward approach is to create a macro like:
>>
>> #define IMPORT_PTR_INTO_VAR(VarName, Type, From) \
>>    Type *VarName = nullptr;                                 \
>>    {                                                        \
>>      auto MacroImportRes = Importer.Import(From);           \
>>      if (!MacroImportRes)                                   \
>>        return MacroImportRes.getError();                    \
>>      VarName = *MacroImportRes;                             \
>>    }
>>
>> Same for non-pointer type. So, the final code will look like this:
>>
>> ErrorOr<Expr *> ASTNodeImporter::VisitCXXMemberCallExpr(CXXMemberCallExpr
>> *E) {
>>    IMPORT_QUAL_TYPE_INTO_VAR(T, E->getType())
>>    IMPORT_PTR_INTO_VAR(ToFn, Expr, E->getCallee())
>>
>>    SmallVector<Expr *, 4> ToArgs(E->getNumArgs());
>>    IMPORT_CONTAINER(E->arguments(), ToArgs);
>>
>>    return new (Importer.getToContext()) CXXMemberCallExpr(
>>          Importer.getToContext(), ToFn, ToArgs, T, E->getValueKind(),
>>          Importer.Import(E->getRParenLoc()));
>> }
>>
>> But macros don't look clean. Unfortunately, I still haven't found a better
>> option so any suggestions are very welcome.
>>
>> PS. The error handling pattern we wish to get is very close to exception
>> handling in C++ because usually we don't want to perform local error
>> handling but centralized instead. With a possibility to just throw an
>> exception and catch it in the ASTImporter entry point (returning an
>> ErrorOr), error handling in ASTImporter becomes almost trivial and pretty
>> clean. But we don't use exceptions in LLVM and clang and I don't know a
>> similar functionality in LLVM utilities.
>>
>> --
>> Best regards,
>> Aleksei Sidorin,
>> SRR, Samsung Electronics
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>

-- 
Best regards,
Aleksei Sidorin,
SRR, Samsung Electronics

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180427/fa2719a3/attachment.html>