[cfe-dev] For loop conversion (was: C++11 migration tools)

Sam Panzer panzer at google.com
Thu Jul 12 14:31:15 PDT 2012


Here are three kinds of operations that I would want to completely avoid
any kind of recursive AST traversal:
 - Optional matchers: In order to ignore certain ignorable parts of the
AST, I often want to create
 - Right-recursive matchers:

  I have a function that hunts for the expression which is actually passed
to a single-parameter constructor, which sheds ImplicitCastExprs,
single-parameter CXXConstructExprs, and MaterializeTemporaryExprs until it
finds something that is not a single-parameter constructor. Essentially,
the matcher would look like
  StmtMatcher =
expression(rightRecursion(ignoreImplicitCasts(constructorCall(argumentCountIs(1),
hasArgument(0, optional(materializeTemporaryExpr(RecursionPoint))),
withBaseCase(id(expresion))))

 - The ability to run the equivalent of a MatchFinder on an arbitrary AST
(e.g. in the callback of another matcher)


On Thu, Jul 12, 2012 at 1:06 AM, Manuel Klimek <klimek at google.com> wrote:

> On Wed, Jul 11, 2012 at 11:17 PM, Sam Panzer <panzer at google.com> wrote:
> >
> >
> >
> > On Wed, Jul 11, 2012 at 11:20 AM, Manuel Klimek <klimek at google.com>
> wrote:
> >>
> >> On Wed, Jul 11, 2012 at 8:08 PM, Sam Panzer <panzer at google.com> wrote:
> >> > Good point. I currently have a matcher that finds for loops that
> appear
> >> > to
> >> > have the right shape for conversion, and I do some extra checking
> along
> >> > with
> >> > the conversion work in the callback. I can completely handle
> array-style
> >> > loops with a Preprocessor, which is used when generating the names of
> >> > new
> >>
> >> I don't understand yet what the Preprocessor is needed for here...
> >
> >
> > I use Preprocessor in exactly one place right now, namely to create an
> > IdentifierResolver. When I am generating the name of a new loop
> variable, I
> > want to make sure that the identifier is unique, which is an
> approximation
> > to checking that the identifier neither shadows an existing definition
> nor
> > is shadowed in an inner scope. If there's a better way to do this, I
> would
> > be happy to change it.
>
> The ASTContext has the identifier table. (ASTContext::Idents).
>
> >> > variables. For iterator-style loops, I use Sema to determine the
> return
> >> > type
> >> > of operator*, which will be the type of the loop variable. Finally
> >> > (though
> >>
> >> Isn't the return type of the oeprator* already in the AST?
> >
> >
> > It's actually possible that operator* is never called. My initial
> > implementation tried to infer the type from  expressions involving the
> > iterator, but this didn't work correctly for CXXMemberCallExpr. I can
> > conceivably work around this without checking operator*, but
>
> did you want to finish that
>
> >> > this isn't implemented yet), containers that are used as if they're
> >> > arrays
> >> > will require Sema to check if suitable begin() and end() functions
> >> > exist.
> >>
> >> Ah, I see - basically you want to trigger overload resolution to see
> >> whether the conversion would work. That makes sense. I'm not sure we
> >> want to get all of Sema available though - it's an insanely huge
> >> interface, and it's rather hard to figure out what's safe to do and
> >> what's not.
> >>
> >> Perhaps we can create a class that encapsulates Sema for those
> >> "what-if" type questions we'll need to answer for refactoring tools?
> >
> >
> > This sounds like a good idea. I can try to identify exactly the features
> I
> > need (most likely just overload resolution).
>
> That would sound like most of what we need. It has come up a few
> times. Let me know if you identify more things we need.
>
> Cheers,
> /Manuel
>
> >
> >>
> >>
> >> > For now, I can force the use of auto rather than an explicit type
> >> > listing
> >> > for iterator-based loops and get most of the work done without Sema.
> The
> >> > question then becomes the correct way to avoid repeatedly creating a
> >> > Preprocessor object, though if this is a cheap operation, I can create
> >> > one
> >> > in the callback.
> >> >
> >> > Does this make the use clearer?
> >>
> >> Yep, thx.
> >>
> >> Cheers,
> >> /Manuel
> >>
> >> > Thanks!
> >> >
> >> >
> >> > On Wed, Jul 11, 2012 at 12:29 AM, Manuel Klimek <klimek at google.com>
> >> > wrote:
> >> >>
> >> >> On Wed, Jul 11, 2012 at 6:03 AM, Sam Panzer <panzer at google.com>
> wrote:
> >> >> > I'm trying to port my code to take advantage of matchers, now that
> >> >> > they
> >> >> > are
> >> >> > in mainline. Some of the work I want to do involves semantic
> analysis
> >> >> > of
> >> >> > the
> >> >> > results (i.e. in the callback). What would be the best way to get a
> >> >> > Sema
> >> >> > or
> >> >> > CompilerInstance out of either RefactoringTool or MatchResult? I'm
> >> >> > currently
> >> >> > playing with changing MatchASTConsumer to inherit from
> SemaConsumer,
> >> >> > so
> >> >> > that
> >> >> > MatchFinder can track a Sema object the same way it does an
> >> >> > ASTContext.
> >> >>
> >> >> I'd be interested in what the use cases for the semantic analysis are
> >> >> first. Often it seems like it would be better to have the information
> >> >> available in the AST instead of rerunning (potentially expensive)
> >> >> semantic analysis.
> >> >>
> >> >> Cheers,
> >> >> /Manuel
> >> >>
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> >
> >> >> > On Mon, Jul 2, 2012 at 7:22 AM, Manuel Klimek <klimek at google.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Mon, Jul 2, 2012 at 4:16 PM, Sam Panzer <panzer at google.com>
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> On Sun, Jul 1, 2012 at 10:45 PM, Manuel Klimek <
> klimek at google.com>
> >> >> >>> wrote:
> >> >> >>>>
> >> >> >>>> On Fri, Jun 29, 2012 at 8:17 PM, Sam Panzer <panzer at google.com>
> >> >> >>>> wrote:
> >> >> >>>>>
> >> >> >>>>> Thanks for the input!
> >> >> >>>>>
> >> >> >>>>> Tooling/Refactoring is definitely the right way to go - dumping
> >> >> >>>>> to
> >> >> >>>>> stdout was just a holdover from the example on LibTooling. I'll
> >> >> >>>>> change it
> >> >> >>>>> once I figure out how it works - and a clean way to arrange the
> >> >> >>>>> tests.
> >> >> >>>>>
> >> >> >>>>> As for the use of matchers vs. visitors, I decided to use a
> >> >> >>>>> visitor
> >> >> >>>>> because this is a relatively complex transformation. I would
> >> >> >>>>> happily
> >> >> >>>>> use
> >> >> >>>>> matchers if I thought I could - and I think that some other
> c++11
> >> >> >>>>> migrations
> >> >> >>>>> can easily be written with matchers - but I think the for loop
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> I'm not claiming that the matchers needed to match those
> >> >> >>>> constructs
> >> >> >>>> are
> >> >> >>>> all already written - but if we write the questions you need to
> >> >> >>>> ask
> >> >> >>>> into
> >> >> >>>> matchers, other people who want to match similar things can
> reuse
> >> >> >>>> them, thus
> >> >> >>>> amplifying the impact of the code you write ;)
> >> >> >>>>
> >> >> >>>>>
> >> >> >>>>> checks need some features that matchers don't have (correct me
> if
> >> >> >>>>> I'm
> >> >> >>>>> wrong!). For example, the check for statically allocated
> >> >> >>>>> array-based
> >> >> >>>>> loops
> >> >> >>>>> does this:
> >> >> >>>>> Given a for loop, determine if:
> >> >> >>>>>  - The increment statement increments (via ++) exactly one
> >> >> >>>>> integral
> >> >> >>>>> index variable, such that the variable is declared and
> >> >> >>>>> initialized
> >> >> >>>>> to zero
> >> >> >>>>> in the init portion of the loop, and that this variable's value
> >> >> >>>>> is a
> >> >> >>>>> compared (via <, > or !=) to a nonnegative compile-time
> constant
> >> >> >>>>> N
> >> >> >>>>> in the
> >> >> >>>>> compare portion.
> >> >> >>>>>  - The index variable is only ever used in an
> ArrayIndexExpession
> >> >> >>>>> indexing a single, statically allocated array A.
> >> >> >>>>>  - The array A has exactly N elements.
> >> >> >>>>>  - Additionally, if the ArrayIndexExpression A[index] is ever
> >> >> >>>>> assigned,
> >> >> >>>>> passed to a function or copied as a non-const reference, or its
> >> >> >>>>> address
> >> >> >>>>> taken with & (I still need to add a check for calls to
> non-const
> >> >> >>>>> member
> >> >> >>>>> functions), the loop variable in the converted version needs to
> >> >> >>>>> be a
> >> >> >>>>> non-const reference so that the value will be correctly updated
> >> >> >>>>> (this step
> >> >> >>>>> adds the most complexity).
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> ... and the matcher I would want to write for this looks
> something
> >> >> >>>> like
> >> >> >>>> that:
> >> >> >>>> ForLoop(
> >> >> >>>>   HasInitialization(Declaration(Id("loopvar",
> >> >> >>>> HasType(IsIntegral())))),
> >> >> >>>>   HasCondition(BinaryOperator(
> >> >> >>>>     HasAnyOperand(DeclarationReference(Id("condref",
> >> >> >>>> To(Variable())))),
> >> >> >>>>     HasAnyOperand(IntegerLiteral()))),
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> HasIncrement(UnaryOperator(HasUnaryOperand(DeclarationReference(Id("incref",
> >> >> >>>> To(Variable()))))), ...),
> >> >> >>>> )
> >> >> >>>>
> >> >> >>>> In general, the complex stuff can stay complex, but the simple
> >> >> >>>> stuff
> >> >> >>>> shouldn't be lots of code.
> >> >> >>>
> >> >> >>>
> >> >> >>> Good point - and this is much easier to read than the equivalent
> >> >> >>> code
> >> >> >>> I
> >> >> >>> had written.
> >> >> >>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>>
> >> >> >>>>> The other types of loop (iterator-based, and array-like
> >> >> >>>>> container)
> >> >> >>>>> are
> >> >> >>>>> more complicated to detect, since there are more permitted ways
> >> >> >>>>> to
> >> >> >>>>> define
> >> >> >>>>> and use the index/iterators. What makes this difficult to do
> >> >> >>>>> entirely with
> >> >> >>>>> matchers is the number of back- and cross-references, as well
> as
> >> >> >>>>> the
> >> >> >>>>> different local behaviors based on semantic properties. On the
> >> >> >>>>> other
> >> >> >>>>> hand,
> >> >> >>>>> if there were some kind of backreference-enabled matcher that
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> The way to handle the back-refs is to bind the nodes you want,
> and
> >> >> >>>> then
> >> >> >>>> pull them out and compare them in the callback.
> >> >> >>>
> >> >> >>>
> >> >> >>> I see - make the matcher slightly more general, then filter the
> >> >> >>> results,
> >> >> >>> perhaps with a visitor.
> >> >> >>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> Thoughts?
> >> >> >>>> /Manuel
> >> >> >>>
> >> >> >>>
> >> >> >>> This sounds like it would make at least half the work much
> easier,
> >> >> >>> so
> >> >> >>> I
> >> >> >>> think it would definitely be worth it to try switching to a
> >> >> >>> matcher-based
> >> >> >>> solution. When are matchers supposed to hit mainline (or some
> extra
> >> >> >>> cloneable repo) :) ?
> >> >> >>
> >> >> >>
> >> >> >> Matchers are currently in
> >> >> >> ^cfe/branches/tooling/include/clang/ASTMatchers/...
> >> >> >>
> >> >> >> I'm currently working on renaming them to camelCase from
> CamelCase;
> >> >> >> there's a Tool to help with the conversion though, so no problem
> in
> >> >> >> starting
> >> >> >> now ...
> >> >> >>
> >> >> >> Cheers,
> >> >> >> /Manuel
> >> >> >>
> >> >> >>>
> >> >> >>>
> >> >> >>> -Sam
> >> >> >>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>>
> >> >> >>>>> allowed me to locate all matches in a given Stmt, it could be
> >> >> >>>>> *much*
> >> >> >>>>> easier to express some parts of the logic, such as the first
> step
> >> >> >>>>> in
> >> >> >>>>> the
> >> >> >>>>> above list. I also suspect that a single-Stmt matcher would
> >> >> >>>>> better
> >> >> >>>>> way to
> >> >> >>>>> handle the last step; currently I track  whether the visitor is
> >> >> >>>>> looking at a
> >> >> >>>>> statement or expression which fits any of the
> const-disqualifying
> >> >> >>>>> conditions, and a note is made if I run into A[index].
> >> >> >>>>>
> >> >> >>>>> Does this make the use case clearer? I don't really see a
> better
> >> >> >>>>> way
> >> >> >>>>> to
> >> >> >>>>> approach this problem, but you guys know the available toolkit
> >> >> >>>>> far
> >> >> >>>>> better
> >> >> >>>>> than I do.
> >> >> >>>>>
> >> >> >>>>> On Fri, Jun 29, 2012 at 2:48 AM, Manuel Klimek
> >> >> >>>>> <klimek at google.com>
> >> >> >>>>> wrote:
> >> >> >>>>>>
> >> >> >>>>>> On Fri, Jun 29, 2012 at 11:45 AM, Chandler Carruth
> >> >> >>>>>> <chandlerc at google.com> wrote:
> >> >> >>>>>>>
> >> >> >>>>>>> I tend to agree that this should use the Tooling/Refactoring
> >> >> >>>>>>> stuff.
> >> >> >>>>>>>
> >> >> >>>>>>> However, I'm curious about the best way to structure the
> >> >> >>>>>>> location
> >> >> >>>>>>> of
> >> >> >>>>>>> migration candidates: AST matchers vs. visitor.
> >> >> >>>>>>>
> >> >> >>>>>>> I can almost see the visitor pattern working really well here
> >> >> >>>>>>> as
> >> >> >>>>>>> each
> >> >> >>>>>>> different construct can have a pile of migration logic
> dropped
> >> >> >>>>>>> in.... But if
> >> >> >>>>>>> there is a need to connect dots between more distant
> >> >> >>>>>>> constructs,
> >> >> >>>>>>> that
> >> >> >>>>>>> wouldn't work so well.... Not at all sure what would be best
> >> >> >>>>>>> here.
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>> I've used a combination before - use matchers for the stuff
> >> >> >>>>>> where
> >> >> >>>>>> they
> >> >> >>>>>> work well, then write a very small easy-to-understand visitor
> if
> >> >> >>>>>> you need
> >> >> >>>>>> more. I think that brings down code size by quite a bit -
> >> >> >>>>>> obviously
> >> >> >>>>>> just a
> >> >> >>>>>> gut feeling here.
> >> >> >>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>> On Fri, Jun 29, 2012 at 1:37 AM, Manuel Klimek
> >> >> >>>>>>> <klimek at google.com>
> >> >> >>>>>>> wrote:
> >> >> >>>>>>>>
> >> >> >>>>>>>> On Fri, Jun 29, 2012 at 4:06 AM, Sam Panzer
> >> >> >>>>>>>> <panzer at google.com>
> >> >> >>>>>>>> wrote:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> In case anyone wanted to take a look, the attached patch
> >> >> >>>>>>>>> includes
> >> >> >>>>>>>>> the tool I've been working on. I create a new binary,
> >> >> >>>>>>>>> c++migrate, which
> >> >> >>>>>>>>> attempts to convert for loops in the source files given to
> >> >> >>>>>>>>> it.
> >> >> >>>>>>>>> Most of my
> >> >> >>>>>>>>> focus has been on the FrontedAction, so I skirted all of
> the
> >> >> >>>>>>>>> issues
> >> >> >>>>>>>>> mentioned above by keeping the frontend interaction minimal
> >> >> >>>>>>>>> (i.e. I just
> >> >> >>>>>>>>> call Tooling::ClangTool::run), and the changes are just
> >> >> >>>>>>>>> reported
> >> >> >>>>>>>>> on standard
> >> >> >>>>>>>>> output, if there are any to be made.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> The tool can currently convert for loops that range over
> (1)
> >> >> >>>>>>>>> statically allocated arrays, and (2) Clang-style
> >> >> >>>>>>>>> iterator-based
> >> >> >>>>>>>>> loops (with
> >> >> >>>>>>>>> begin and end iterators defined). All loop variables need
> to
> >> >> >>>>>>>>> be
> >> >> >>>>>>>>> declared
> >> >> >>>>>>>>> within the loop's initialization step in order for it to be
> >> >> >>>>>>>>> converted,
> >> >> >>>>>>>>> though this requirement can potentially be eliminated. I'm
> >> >> >>>>>>>>> working on
> >> >> >>>>>>>>> converting iterator-based loops that call
> someContainer.end()
> >> >> >>>>>>>>> on
> >> >> >>>>>>>>> each
> >> >> >>>>>>>>> iteration, since they're probably the common case in many
> >> >> >>>>>>>>> codebases.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Just for fun, I ran the tool over the 41 .cpp files in
> >> >> >>>>>>>>> lib/Sema,
> >> >> >>>>>>>>> and my tool found 71 convertible loops in 17 files. There
> is
> >> >> >>>>>>>>> plenty more
> >> >> >>>>>>>>> work to go, because it clearly missed some easy ones.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Any input or feedback is welcome!
> >> >> >>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>> High-level observations:
> >> >> >>>>>>>> 1. the handling of the rewrites; any reason not to use the
> >> >> >>>>>>>> Tooling/Refactoring stuff? Currently in the patch it looks
> to
> >> >> >>>>>>>> me
> >> >> >>>>>>>> like the
> >> >> >>>>>>>> files are not rewritten, but dumped to stdout
> >> >> >>>>>>>> 2. is the reason not to use the matchers here that they're
> not
> >> >> >>>>>>>> landed in mainline yet?
> >> >> >>>>>>>>
> >> >> >>>>>>>> Cheers,
> >> >> >>>>>>>> /Manuel
> >> >> >>>>>>>>
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> -Sam
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> On Thu, Jun 28, 2012 at 10:50 AM, Sam Panzer
> >> >> >>>>>>>>> <panzer at google.com>
> >> >> >>>>>>>>> wrote:
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> I'm that intern :)
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> -Sam
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> On Wed, Jun 27, 2012 at 9:48 PM, John Wiegley
> >> >> >>>>>>>>>> <johnw at boostpro.com>
> >> >> >>>>>>>>>> wrote:
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> >>>>> Sam Panzer <panzer at google.com> writes:
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> > In particular, I am working on a tool to convert
> existing
> >> >> >>>>>>>>>>> > C++
> >> >> >>>>>>>>>>> > for loops to
> >> >> >>>>>>>>>>> > take advantage of the new C++11 range-based syntax. I
> can
> >> >> >>>>>>>>>>> > imagine similar
> >> >> >>>>>>>>>>> > tools to replace const with constexpr, macro hacks with
> >> >> >>>>>>>>>>> > static_assert, and
> >> >> >>>>>>>>>>> > potentially other common refactorings.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> > Thoughts? Suggestions?
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> You really must watch this presentation, if you haven't
> >> >> >>>>>>>>>>> already:
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>     http://www.youtube.com/watch?v=yuIOGfcOH0k
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> --
> >> >> >>>>>>>>>>> John Wiegley
> >> >> >>>>>>>>>>> BoostPro Computing
> >> >> >>>>>>>>>>> http://www.boostpro.com
> >> >> >>>>>>>>>>> _______________________________________________
> >> >> >>>>>>>>>>> cfe-dev mailing list
> >> >> >>>>>>>>>>> cfe-dev at cs.uiuc.edu
> >> >> >>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> _______________________________________________
> >> >> >>>>>>>>> cfe-dev mailing list
> >> >> >>>>>>>>> cfe-dev at cs.uiuc.edu
> >> >> >>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >> >>>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>> _______________________________________________
> >> >> >>>>>>>> cfe-dev mailing list
> >> >> >>>>>>>> cfe-dev at cs.uiuc.edu
> >> >> >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >> >>>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > cfe-dev mailing list
> >> >> > cfe-dev at cs.uiuc.edu
> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >> >
> >> >
> >> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120712/ea56a663/attachment.html>


More information about the cfe-dev mailing list