[cfe-dev] RFC: AST Matcher Sub-expression references implementation straw man

Thu Apr 18 06:00:17 PDT 2013

If we don't have to worry about unresolved sub-expression references then the implementation of a sub-ex ref matcher is pretty straight-forward. However, the 'look up node in the BoundNodesTree' statement is where all the trickiness is. Using existing machinery to do this could be computationally expensive. I'm going to do a bit of profiling to check this out. I've seen some mention of profiling stubs in the code already. Are there any tips or best known methods I should be aware of?

From: Manuel Klimek [mailto:klimek at google.com]
Sent: Wednesday, April 17, 2013 3:43 PM
To: Vane, Edwin
Cc: Daniel Jasper; Alexander Kornienko; Clang Dev List (cfe-dev at cs.uiuc.edu)
Subject: Re: RFC: AST Matcher Sub-expression references implementation straw man

On Wed, Apr 17, 2013 at 7:47 PM, Vane, Edwin <edwin.vane at intel.com<mailto:edwin.vane at intel.com>> wrote:

From: Manuel Klimek [mailto:klimek at google.com<mailto:klimek at google.com>]
Sent: Wednesday, April 17, 2013 3:45 AM
To: Vane, Edwin; Daniel Jasper; Alexander Kornienko
Cc: Clang Dev List (cfe-dev at cs.uiuc.edu<mailto:cfe-dev at cs.uiuc.edu>)
Subject: Re: RFC: AST Matcher Sub-expression references implementation straw man

On Tue, Apr 16, 2013 at 8:47 PM, Manuel Klimek <klimek at google.com<mailto:klimek at google.com>> wrote:
+djasper, alexfh

On Tue, Apr 16, 2013 at 7:15 PM, Vane, Edwin <edwin.vane at intel.com<mailto:edwin.vane at intel.com>> wrote:
Hi all,

I've been looking at implementing sub-expression references in AST Matchers. I'd like to propose an implementation and get feedback, especially if I've overlooked something.

======
Quick motivation first:
The goal of sub-expression references is to write matchers that can refer to nodes tagged by the matcher and use these nodes in more tests. Consider the following simple example that wants to find variable declarations whose type exactly matches the initializer type (for simplicity, lets ignore all the implicit cast stuff that will exist around the initializer):

varDecl(hasType(qualType(...).bind("declType")),
               hasInitializer(hasType(sameAs("declType")))

Minor point: I'd guess we'll need to specify the type here explicitly, like sameAs<QualType>("declType"). Also, I'm not very happy with the color of the bike shed called "sameAs" yet :)

[Edwin] I had thought the type could be inferred from which matcher the sub-ex ref is passed to à la PolymorphicMatchWithParam*. Ambiguities would be solved the same way we currently have to resolve ambiguities in the polymorphic matchers; by explicitly inserting a dyncast matcher:
hasType(qualType(sameAs("blah")))

You're right...

I have no attachments to the name. I'm open to suggestions.

Daniel usually has good naming ideas :)

)

There's potential for other sub-expression reference matchers beyond the "sameAs()" matcher proposed by this example. Name comparisons for example.
======

Proposed changes to implement sub-expression references:
- In a sub-ex matcher, look-up in the provided BoundNodesTreeBuilder for the given id.
  - If the id exists perform matcher logic, return truth value.
  - If an id doesn't exist, record an unresolved sub-expression reference in the Builder and return true.
    - Info recorded: id we're looking for and some sort of matcher "future": a callback or functor for executing matcher logic at a later time along with all necessary data matcher needs.
- When control returns to the top level finder logic (MatchASTVisitor::match()) we tend to unresolved sub-ex refs. For each recorded unresolved sub-ex ref, do a search through the BoundNodesTree for a matching id. If one is found, execute matcher future. If the future returns false, the match fails and the Callback that would normally get called will not be. If no id is found, match also fails. If the matcher future returns true, proceed to calling Callback for the match. The handling of unresolved sub-ex refs could be handled by modifying MatchVisitor.

I'm currently opposed to unresolved subexpression matcher handling (but happy to be convinced otherwise).
Currently, we have:
a) a defined order of execution
b) a well-defined cut-off for logical expressions

That is, if you say anyOf(a(), b()), b() will not be matched if a() matches. Now a() could contain a sub-expression matcher referring to something that might be bound in b(); how would you propose to solve this?

And on a more principled note:
Which things can we solve with unresolved subexpression matchers that cannot also be solved by re-arranging the matcher so that the bind() to the id must always happen before the sameAs match, otherwise sameAs will not match?

[Edwin] The unresolved sub-ex refs were meant to handle what I thought was an un-defined order of execution actually. But now that I think about it, the un-defined part is just the order of *creation* of matchers, not the calling of their match() functions. I think you're right that we could just re-arrange a matcher so the tagging happens before the ref. Given that, I don't think unresolved sub-ex refs are necessary. I don't think it's that big of onus on users of AST Matchers to organize their matchers this way.

Outstanding questions:
- Where should unresolved sub-ex refs be stored? Top-most level of the BoundNodesTree or just in the level at which the ref happens. Does it matter? Sub-ex refs should be able to refer to nodes anywhere else in the matcher and not have specific scope. Even if refs are stored at the current level, they should be collected and handled all at once when matching is complete.

--
Edwin Vane
  Software Developer
  Intel of Canada, Inc.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130418/c4520d00/attachment.html>