[cfe-dev] For loop conversion (was: C++11 migration tools)

Sam Panzer panzer at google.com
Wed Jul 11 14:17:34 PDT 2012


On Wed, Jul 11, 2012 at 11:20 AM, Manuel Klimek <klimek at google.com> wrote:

> On Wed, Jul 11, 2012 at 8:08 PM, Sam Panzer <panzer at google.com> wrote:
> > Good point. I currently have a matcher that finds for loops that appear
> to
> > have the right shape for conversion, and I do some extra checking along
> with
> > the conversion work in the callback. I can completely handle array-style
> > loops with a Preprocessor, which is used when generating the names of new
>
> I don't understand yet what the Preprocessor is needed for here...
>

I use Preprocessor in exactly one place right now, namely to create an
IdentifierResolver. When I am generating the name of a new loop variable, I
want to make sure that the identifier is unique, which is an approximation
to checking that the identifier neither shadows an existing definition nor
is shadowed in an inner scope. If there's a better way to do this, I would
be happy to change it.


>
> > variables. For iterator-style loops, I use Sema to determine the return
> type
> > of operator*, which will be the type of the loop variable. Finally
> (though
>
> Isn't the return type of the oeprator* already in the AST?
>

It's actually possible that operator* is never called. My initial
implementation tried to infer the type from  expressions involving the
iterator, but this didn't work correctly for CXXMemberCallExpr. I can
conceivably work around this without checking operator*, but


> > this isn't implemented yet), containers that are used as if they're
> arrays
> > will require Sema to check if suitable begin() and end() functions exist.
>
> Ah, I see - basically you want to trigger overload resolution to see
> whether the conversion would work. That makes sense. I'm not sure we
> want to get all of Sema available though - it's an insanely huge
> interface, and it's rather hard to figure out what's safe to do and
> what's not.
>
> Perhaps we can create a class that encapsulates Sema for those
> "what-if" type questions we'll need to answer for refactoring tools?
>

This sounds like a good idea. I can try to identify exactly the features I
need (most likely just overload resolution).


>
> > For now, I can force the use of auto rather than an explicit type listing
> > for iterator-based loops and get most of the work done without Sema. The
> > question then becomes the correct way to avoid repeatedly creating a
> > Preprocessor object, though if this is a cheap operation, I can create
> one
> > in the callback.
> >
> > Does this make the use clearer?
>
> Yep, thx.
>
> Cheers,
> /Manuel
>
> > Thanks!
> >
> >
> > On Wed, Jul 11, 2012 at 12:29 AM, Manuel Klimek <klimek at google.com>
> wrote:
> >>
> >> On Wed, Jul 11, 2012 at 6:03 AM, Sam Panzer <panzer at google.com> wrote:
> >> > I'm trying to port my code to take advantage of matchers, now that
> they
> >> > are
> >> > in mainline. Some of the work I want to do involves semantic analysis
> of
> >> > the
> >> > results (i.e. in the callback). What would be the best way to get a
> Sema
> >> > or
> >> > CompilerInstance out of either RefactoringTool or MatchResult? I'm
> >> > currently
> >> > playing with changing MatchASTConsumer to inherit from SemaConsumer,
> so
> >> > that
> >> > MatchFinder can track a Sema object the same way it does an
> ASTContext.
> >>
> >> I'd be interested in what the use cases for the semantic analysis are
> >> first. Often it seems like it would be better to have the information
> >> available in the AST instead of rerunning (potentially expensive)
> >> semantic analysis.
> >>
> >> Cheers,
> >> /Manuel
> >>
> >> >
> >> > Thanks!
> >> >
> >> >
> >> > On Mon, Jul 2, 2012 at 7:22 AM, Manuel Klimek <klimek at google.com>
> wrote:
> >> >>
> >> >> On Mon, Jul 2, 2012 at 4:16 PM, Sam Panzer <panzer at google.com>
> wrote:
> >> >>>
> >> >>> On Sun, Jul 1, 2012 at 10:45 PM, Manuel Klimek <klimek at google.com>
> >> >>> wrote:
> >> >>>>
> >> >>>> On Fri, Jun 29, 2012 at 8:17 PM, Sam Panzer <panzer at google.com>
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> Thanks for the input!
> >> >>>>>
> >> >>>>> Tooling/Refactoring is definitely the right way to go - dumping to
> >> >>>>> stdout was just a holdover from the example on LibTooling. I'll
> >> >>>>> change it
> >> >>>>> once I figure out how it works - and a clean way to arrange the
> >> >>>>> tests.
> >> >>>>>
> >> >>>>> As for the use of matchers vs. visitors, I decided to use a
> visitor
> >> >>>>> because this is a relatively complex transformation. I would
> happily
> >> >>>>> use
> >> >>>>> matchers if I thought I could - and I think that some other c++11
> >> >>>>> migrations
> >> >>>>> can easily be written with matchers - but I think the for loop
> >> >>>>
> >> >>>>
> >> >>>> I'm not claiming that the matchers needed to match those constructs
> >> >>>> are
> >> >>>> all already written - but if we write the questions you need to ask
> >> >>>> into
> >> >>>> matchers, other people who want to match similar things can reuse
> >> >>>> them, thus
> >> >>>> amplifying the impact of the code you write ;)
> >> >>>>
> >> >>>>>
> >> >>>>> checks need some features that matchers don't have (correct me if
> >> >>>>> I'm
> >> >>>>> wrong!). For example, the check for statically allocated
> array-based
> >> >>>>> loops
> >> >>>>> does this:
> >> >>>>> Given a for loop, determine if:
> >> >>>>>  - The increment statement increments (via ++) exactly one
> integral
> >> >>>>> index variable, such that the variable is declared and initialized
> >> >>>>> to zero
> >> >>>>> in the init portion of the loop, and that this variable's value
> is a
> >> >>>>> compared (via <, > or !=) to a nonnegative compile-time constant N
> >> >>>>> in the
> >> >>>>> compare portion.
> >> >>>>>  - The index variable is only ever used in an ArrayIndexExpession
> >> >>>>> indexing a single, statically allocated array A.
> >> >>>>>  - The array A has exactly N elements.
> >> >>>>>  - Additionally, if the ArrayIndexExpression A[index] is ever
> >> >>>>> assigned,
> >> >>>>> passed to a function or copied as a non-const reference, or its
> >> >>>>> address
> >> >>>>> taken with & (I still need to add a check for calls to non-const
> >> >>>>> member
> >> >>>>> functions), the loop variable in the converted version needs to
> be a
> >> >>>>> non-const reference so that the value will be correctly updated
> >> >>>>> (this step
> >> >>>>> adds the most complexity).
> >> >>>>
> >> >>>>
> >> >>>> ... and the matcher I would want to write for this looks something
> >> >>>> like
> >> >>>> that:
> >> >>>> ForLoop(
> >> >>>>   HasInitialization(Declaration(Id("loopvar",
> >> >>>> HasType(IsIntegral())))),
> >> >>>>   HasCondition(BinaryOperator(
> >> >>>>     HasAnyOperand(DeclarationReference(Id("condref",
> >> >>>> To(Variable())))),
> >> >>>>     HasAnyOperand(IntegerLiteral()))),
> >> >>>>
> >> >>>>
> >> >>>>
> HasIncrement(UnaryOperator(HasUnaryOperand(DeclarationReference(Id("incref",
> >> >>>> To(Variable()))))), ...),
> >> >>>> )
> >> >>>>
> >> >>>> In general, the complex stuff can stay complex, but the simple
> stuff
> >> >>>> shouldn't be lots of code.
> >> >>>
> >> >>>
> >> >>> Good point - and this is much easier to read than the equivalent
> code
> >> >>> I
> >> >>> had written.
> >> >>>
> >> >>>>
> >> >>>>
> >> >>>>>
> >> >>>>> The other types of loop (iterator-based, and array-like container)
> >> >>>>> are
> >> >>>>> more complicated to detect, since there are more permitted ways to
> >> >>>>> define
> >> >>>>> and use the index/iterators. What makes this difficult to do
> >> >>>>> entirely with
> >> >>>>> matchers is the number of back- and cross-references, as well as
> the
> >> >>>>> different local behaviors based on semantic properties. On the
> other
> >> >>>>> hand,
> >> >>>>> if there were some kind of backreference-enabled matcher that
> >> >>>>
> >> >>>>
> >> >>>> The way to handle the back-refs is to bind the nodes you want, and
> >> >>>> then
> >> >>>> pull them out and compare them in the callback.
> >> >>>
> >> >>>
> >> >>> I see - make the matcher slightly more general, then filter the
> >> >>> results,
> >> >>> perhaps with a visitor.
> >> >>>
> >> >>>>
> >> >>>>
> >> >>>> Thoughts?
> >> >>>> /Manuel
> >> >>>
> >> >>>
> >> >>> This sounds like it would make at least half the work much easier,
> so
> >> >>> I
> >> >>> think it would definitely be worth it to try switching to a
> >> >>> matcher-based
> >> >>> solution. When are matchers supposed to hit mainline (or some extra
> >> >>> cloneable repo) :) ?
> >> >>
> >> >>
> >> >> Matchers are currently in
> >> >> ^cfe/branches/tooling/include/clang/ASTMatchers/...
> >> >>
> >> >> I'm currently working on renaming them to camelCase from CamelCase;
> >> >> there's a Tool to help with the conversion though, so no problem in
> >> >> starting
> >> >> now ...
> >> >>
> >> >> Cheers,
> >> >> /Manuel
> >> >>
> >> >>>
> >> >>>
> >> >>> -Sam
> >> >>>
> >> >>>>
> >> >>>>
> >> >>>>>
> >> >>>>> allowed me to locate all matches in a given Stmt, it could be
> *much*
> >> >>>>> easier to express some parts of the logic, such as the first step
> in
> >> >>>>> the
> >> >>>>> above list. I also suspect that a single-Stmt matcher would better
> >> >>>>> way to
> >> >>>>> handle the last step; currently I track  whether the visitor is
> >> >>>>> looking at a
> >> >>>>> statement or expression which fits any of the const-disqualifying
> >> >>>>> conditions, and a note is made if I run into A[index].
> >> >>>>>
> >> >>>>> Does this make the use case clearer? I don't really see a better
> way
> >> >>>>> to
> >> >>>>> approach this problem, but you guys know the available toolkit far
> >> >>>>> better
> >> >>>>> than I do.
> >> >>>>>
> >> >>>>> On Fri, Jun 29, 2012 at 2:48 AM, Manuel Klimek <klimek at google.com
> >
> >> >>>>> wrote:
> >> >>>>>>
> >> >>>>>> On Fri, Jun 29, 2012 at 11:45 AM, Chandler Carruth
> >> >>>>>> <chandlerc at google.com> wrote:
> >> >>>>>>>
> >> >>>>>>> I tend to agree that this should use the Tooling/Refactoring
> >> >>>>>>> stuff.
> >> >>>>>>>
> >> >>>>>>> However, I'm curious about the best way to structure the
> location
> >> >>>>>>> of
> >> >>>>>>> migration candidates: AST matchers vs. visitor.
> >> >>>>>>>
> >> >>>>>>> I can almost see the visitor pattern working really well here as
> >> >>>>>>> each
> >> >>>>>>> different construct can have a pile of migration logic dropped
> >> >>>>>>> in.... But if
> >> >>>>>>> there is a need to connect dots between more distant constructs,
> >> >>>>>>> that
> >> >>>>>>> wouldn't work so well.... Not at all sure what would be best
> here.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> I've used a combination before - use matchers for the stuff where
> >> >>>>>> they
> >> >>>>>> work well, then write a very small easy-to-understand visitor if
> >> >>>>>> you need
> >> >>>>>> more. I think that brings down code size by quite a bit -
> obviously
> >> >>>>>> just a
> >> >>>>>> gut feeling here.
> >> >>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Fri, Jun 29, 2012 at 1:37 AM, Manuel Klimek <
> klimek at google.com>
> >> >>>>>>> wrote:
> >> >>>>>>>>
> >> >>>>>>>> On Fri, Jun 29, 2012 at 4:06 AM, Sam Panzer <panzer at google.com
> >
> >> >>>>>>>> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> In case anyone wanted to take a look, the attached patch
> >> >>>>>>>>> includes
> >> >>>>>>>>> the tool I've been working on. I create a new binary,
> >> >>>>>>>>> c++migrate, which
> >> >>>>>>>>> attempts to convert for loops in the source files given to it.
> >> >>>>>>>>> Most of my
> >> >>>>>>>>> focus has been on the FrontedAction, so I skirted all of the
> >> >>>>>>>>> issues
> >> >>>>>>>>> mentioned above by keeping the frontend interaction minimal
> >> >>>>>>>>> (i.e. I just
> >> >>>>>>>>> call Tooling::ClangTool::run), and the changes are just
> reported
> >> >>>>>>>>> on standard
> >> >>>>>>>>> output, if there are any to be made.
> >> >>>>>>>>>
> >> >>>>>>>>> The tool can currently convert for loops that range over (1)
> >> >>>>>>>>> statically allocated arrays, and (2) Clang-style
> iterator-based
> >> >>>>>>>>> loops (with
> >> >>>>>>>>> begin and end iterators defined). All loop variables need to
> be
> >> >>>>>>>>> declared
> >> >>>>>>>>> within the loop's initialization step in order for it to be
> >> >>>>>>>>> converted,
> >> >>>>>>>>> though this requirement can potentially be eliminated. I'm
> >> >>>>>>>>> working on
> >> >>>>>>>>> converting iterator-based loops that call someContainer.end()
> on
> >> >>>>>>>>> each
> >> >>>>>>>>> iteration, since they're probably the common case in many
> >> >>>>>>>>> codebases.
> >> >>>>>>>>>
> >> >>>>>>>>> Just for fun, I ran the tool over the 41 .cpp files in
> lib/Sema,
> >> >>>>>>>>> and my tool found 71 convertible loops in 17 files. There is
> >> >>>>>>>>> plenty more
> >> >>>>>>>>> work to go, because it clearly missed some easy ones.
> >> >>>>>>>>>
> >> >>>>>>>>> Any input or feedback is welcome!
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> High-level observations:
> >> >>>>>>>> 1. the handling of the rewrites; any reason not to use the
> >> >>>>>>>> Tooling/Refactoring stuff? Currently in the patch it looks to
> me
> >> >>>>>>>> like the
> >> >>>>>>>> files are not rewritten, but dumped to stdout
> >> >>>>>>>> 2. is the reason not to use the matchers here that they're not
> >> >>>>>>>> landed in mainline yet?
> >> >>>>>>>>
> >> >>>>>>>> Cheers,
> >> >>>>>>>> /Manuel
> >> >>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> -Sam
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, Jun 28, 2012 at 10:50 AM, Sam Panzer <
> panzer at google.com>
> >> >>>>>>>>> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>> I'm that intern :)
> >> >>>>>>>>>>
> >> >>>>>>>>>> -Sam
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Wed, Jun 27, 2012 at 9:48 PM, John Wiegley
> >> >>>>>>>>>> <johnw at boostpro.com>
> >> >>>>>>>>>> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> >>>>> Sam Panzer <panzer at google.com> writes:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> > In particular, I am working on a tool to convert existing
> >> >>>>>>>>>>> > C++
> >> >>>>>>>>>>> > for loops to
> >> >>>>>>>>>>> > take advantage of the new C++11 range-based syntax. I can
> >> >>>>>>>>>>> > imagine similar
> >> >>>>>>>>>>> > tools to replace const with constexpr, macro hacks with
> >> >>>>>>>>>>> > static_assert, and
> >> >>>>>>>>>>> > potentially other common refactorings.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> > Thoughts? Suggestions?
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> You really must watch this presentation, if you haven't
> >> >>>>>>>>>>> already:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>     http://www.youtube.com/watch?v=yuIOGfcOH0k
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> --
> >> >>>>>>>>>>> John Wiegley
> >> >>>>>>>>>>> BoostPro Computing
> >> >>>>>>>>>>> http://www.boostpro.com
> >> >>>>>>>>>>> _______________________________________________
> >> >>>>>>>>>>> cfe-dev mailing list
> >> >>>>>>>>>>> cfe-dev at cs.uiuc.edu
> >> >>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> _______________________________________________
> >> >>>>>>>>> cfe-dev mailing list
> >> >>>>>>>>> cfe-dev at cs.uiuc.edu
> >> >>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> _______________________________________________
> >> >>>>>>>> cfe-dev mailing list
> >> >>>>>>>> cfe-dev at cs.uiuc.edu
> >> >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >> >
> >> > _______________________________________________
> >> > cfe-dev mailing list
> >> > cfe-dev at cs.uiuc.edu
> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120711/926882b6/attachment.html>


More information about the cfe-dev mailing list