[cfe-dev] For loop conversion (was: C++11 migration tools)

Wed Jul 11 11:20:56 PDT 2012

On Wed, Jul 11, 2012 at 8:08 PM, Sam Panzer <panzer at google.com> wrote:
> Good point. I currently have a matcher that finds for loops that appear to
> have the right shape for conversion, and I do some extra checking along with
> the conversion work in the callback. I can completely handle array-style
> loops with a Preprocessor, which is used when generating the names of new

I don't understand yet what the Preprocessor is needed for here...

> variables. For iterator-style loops, I use Sema to determine the return type
> of operator*, which will be the type of the loop variable. Finally (though

Isn't the return type of the oeprator* already in the AST?

> this isn't implemented yet), containers that are used as if they're arrays
> will require Sema to check if suitable begin() and end() functions exist.

Ah, I see - basically you want to trigger overload resolution to see
whether the conversion would work. That makes sense. I'm not sure we
want to get all of Sema available though - it's an insanely huge
interface, and it's rather hard to figure out what's safe to do and
what's not.

Perhaps we can create a class that encapsulates Sema for those
"what-if" type questions we'll need to answer for refactoring tools?

> For now, I can force the use of auto rather than an explicit type listing
> for iterator-based loops and get most of the work done without Sema. The
> question then becomes the correct way to avoid repeatedly creating a
> Preprocessor object, though if this is a cheap operation, I can create one
> in the callback.
>
> Does this make the use clearer?

Yep, thx.

Cheers,
/Manuel

> Thanks!
>
>
> On Wed, Jul 11, 2012 at 12:29 AM, Manuel Klimek <klimek at google.com> wrote:
>>
>> On Wed, Jul 11, 2012 at 6:03 AM, Sam Panzer <panzer at google.com> wrote:
>> > I'm trying to port my code to take advantage of matchers, now that they
>> > are
>> > in mainline. Some of the work I want to do involves semantic analysis of
>> > the
>> > results (i.e. in the callback). What would be the best way to get a Sema
>> > or
>> > CompilerInstance out of either RefactoringTool or MatchResult? I'm
>> > currently
>> > playing with changing MatchASTConsumer to inherit from SemaConsumer, so
>> > that
>> > MatchFinder can track a Sema object the same way it does an ASTContext.
>>
>> I'd be interested in what the use cases for the semantic analysis are
>> first. Often it seems like it would be better to have the information
>> available in the AST instead of rerunning (potentially expensive)
>> semantic analysis.
>>
>> Cheers,
>> /Manuel
>>
>> >
>> > Thanks!
>> >
>> >
>> > On Mon, Jul 2, 2012 at 7:22 AM, Manuel Klimek <klimek at google.com> wrote:
>> >>
>> >> On Mon, Jul 2, 2012 at 4:16 PM, Sam Panzer <panzer at google.com> wrote:
>> >>>
>> >>> On Sun, Jul 1, 2012 at 10:45 PM, Manuel Klimek <klimek at google.com>
>> >>> wrote:
>> >>>>
>> >>>> On Fri, Jun 29, 2012 at 8:17 PM, Sam Panzer <panzer at google.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Thanks for the input!
>> >>>>>
>> >>>>> Tooling/Refactoring is definitely the right way to go - dumping to
>> >>>>> stdout was just a holdover from the example on LibTooling. I'll
>> >>>>> change it
>> >>>>> once I figure out how it works - and a clean way to arrange the
>> >>>>> tests.
>> >>>>>
>> >>>>> As for the use of matchers vs. visitors, I decided to use a visitor
>> >>>>> because this is a relatively complex transformation. I would happily
>> >>>>> use
>> >>>>> matchers if I thought I could - and I think that some other c++11
>> >>>>> migrations
>> >>>>> can easily be written with matchers - but I think the for loop
>> >>>>
>> >>>>
>> >>>> I'm not claiming that the matchers needed to match those constructs
>> >>>> are
>> >>>> all already written - but if we write the questions you need to ask
>> >>>> into
>> >>>> matchers, other people who want to match similar things can reuse
>> >>>> them, thus
>> >>>> amplifying the impact of the code you write ;)
>> >>>>
>> >>>>>
>> >>>>> checks need some features that matchers don't have (correct me if
>> >>>>> I'm
>> >>>>> wrong!). For example, the check for statically allocated array-based
>> >>>>> loops
>> >>>>> does this:
>> >>>>> Given a for loop, determine if:
>> >>>>>  - The increment statement increments (via ++) exactly one integral
>> >>>>> index variable, such that the variable is declared and initialized
>> >>>>> to zero
>> >>>>> in the init portion of the loop, and that this variable's value is a
>> >>>>> compared (via <, > or !=) to a nonnegative compile-time constant N
>> >>>>> in the
>> >>>>> compare portion.
>> >>>>>  - The index variable is only ever used in an ArrayIndexExpession
>> >>>>> indexing a single, statically allocated array A.
>> >>>>>  - The array A has exactly N elements.
>> >>>>>  - Additionally, if the ArrayIndexExpression A[index] is ever
>> >>>>> assigned,
>> >>>>> passed to a function or copied as a non-const reference, or its
>> >>>>> address
>> >>>>> taken with & (I still need to add a check for calls to non-const
>> >>>>> member
>> >>>>> functions), the loop variable in the converted version needs to be a
>> >>>>> non-const reference so that the value will be correctly updated
>> >>>>> (this step
>> >>>>> adds the most complexity).
>> >>>>
>> >>>>
>> >>>> ... and the matcher I would want to write for this looks something
>> >>>> like
>> >>>> that:
>> >>>> ForLoop(
>> >>>>   HasInitialization(Declaration(Id("loopvar",
>> >>>> HasType(IsIntegral())))),
>> >>>>   HasCondition(BinaryOperator(
>> >>>>     HasAnyOperand(DeclarationReference(Id("condref",
>> >>>> To(Variable())))),
>> >>>>     HasAnyOperand(IntegerLiteral()))),
>> >>>>
>> >>>>
>> >>>> HasIncrement(UnaryOperator(HasUnaryOperand(DeclarationReference(Id("incref",
>> >>>> To(Variable()))))), ...),
>> >>>> )
>> >>>>
>> >>>> In general, the complex stuff can stay complex, but the simple stuff
>> >>>> shouldn't be lots of code.
>> >>>
>> >>>
>> >>> Good point - and this is much easier to read than the equivalent code
>> >>> I
>> >>> had written.
>> >>>
>> >>>>
>> >>>>
>> >>>>>
>> >>>>> The other types of loop (iterator-based, and array-like container)
>> >>>>> are
>> >>>>> more complicated to detect, since there are more permitted ways to
>> >>>>> define
>> >>>>> and use the index/iterators. What makes this difficult to do
>> >>>>> entirely with
>> >>>>> matchers is the number of back- and cross-references, as well as the
>> >>>>> different local behaviors based on semantic properties. On the other
>> >>>>> hand,
>> >>>>> if there were some kind of backreference-enabled matcher that
>> >>>>
>> >>>>
>> >>>> The way to handle the back-refs is to bind the nodes you want, and
>> >>>> then
>> >>>> pull them out and compare them in the callback.
>> >>>
>> >>>
>> >>> I see - make the matcher slightly more general, then filter the
>> >>> results,
>> >>> perhaps with a visitor.
>> >>>
>> >>>>
>> >>>>
>> >>>> Thoughts?
>> >>>> /Manuel
>> >>>
>> >>>
>> >>> This sounds like it would make at least half the work much easier, so
>> >>> I
>> >>> think it would definitely be worth it to try switching to a
>> >>> matcher-based
>> >>> solution. When are matchers supposed to hit mainline (or some extra
>> >>> cloneable repo) :) ?
>> >>
>> >>
>> >> Matchers are currently in
>> >> ^cfe/branches/tooling/include/clang/ASTMatchers/...
>> >>
>> >> I'm currently working on renaming them to camelCase from CamelCase;
>> >> there's a Tool to help with the conversion though, so no problem in
>> >> starting
>> >> now ...
>> >>
>> >> Cheers,
>> >> /Manuel
>> >>
>> >>>
>> >>>
>> >>> -Sam
>> >>>
>> >>>>
>> >>>>
>> >>>>>
>> >>>>> allowed me to locate all matches in a given Stmt, it could be *much*
>> >>>>> easier to express some parts of the logic, such as the first step in
>> >>>>> the
>> >>>>> above list. I also suspect that a single-Stmt matcher would better
>> >>>>> way to
>> >>>>> handle the last step; currently I track  whether the visitor is
>> >>>>> looking at a
>> >>>>> statement or expression which fits any of the const-disqualifying
>> >>>>> conditions, and a note is made if I run into A[index].
>> >>>>>
>> >>>>> Does this make the use case clearer? I don't really see a better way
>> >>>>> to
>> >>>>> approach this problem, but you guys know the available toolkit far
>> >>>>> better
>> >>>>> than I do.
>> >>>>>
>> >>>>> On Fri, Jun 29, 2012 at 2:48 AM, Manuel Klimek <klimek at google.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> On Fri, Jun 29, 2012 at 11:45 AM, Chandler Carruth
>> >>>>>> <chandlerc at google.com> wrote:
>> >>>>>>>
>> >>>>>>> I tend to agree that this should use the Tooling/Refactoring
>> >>>>>>> stuff.
>> >>>>>>>
>> >>>>>>> However, I'm curious about the best way to structure the location
>> >>>>>>> of
>> >>>>>>> migration candidates: AST matchers vs. visitor.
>> >>>>>>>
>> >>>>>>> I can almost see the visitor pattern working really well here as
>> >>>>>>> each
>> >>>>>>> different construct can have a pile of migration logic dropped
>> >>>>>>> in.... But if
>> >>>>>>> there is a need to connect dots between more distant constructs,
>> >>>>>>> that
>> >>>>>>> wouldn't work so well.... Not at all sure what would be best here.
>> >>>>>>
>> >>>>>>
>> >>>>>> I've used a combination before - use matchers for the stuff where
>> >>>>>> they
>> >>>>>> work well, then write a very small easy-to-understand visitor if
>> >>>>>> you need
>> >>>>>> more. I think that brings down code size by quite a bit - obviously
>> >>>>>> just a
>> >>>>>> gut feeling here.
>> >>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Jun 29, 2012 at 1:37 AM, Manuel Klimek <klimek at google.com>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> On Fri, Jun 29, 2012 at 4:06 AM, Sam Panzer <panzer at google.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> In case anyone wanted to take a look, the attached patch
>> >>>>>>>>> includes
>> >>>>>>>>> the tool I've been working on. I create a new binary,
>> >>>>>>>>> c++migrate, which
>> >>>>>>>>> attempts to convert for loops in the source files given to it.
>> >>>>>>>>> Most of my
>> >>>>>>>>> focus has been on the FrontedAction, so I skirted all of the
>> >>>>>>>>> issues
>> >>>>>>>>> mentioned above by keeping the frontend interaction minimal
>> >>>>>>>>> (i.e. I just
>> >>>>>>>>> call Tooling::ClangTool::run), and the changes are just reported
>> >>>>>>>>> on standard
>> >>>>>>>>> output, if there are any to be made.
>> >>>>>>>>>
>> >>>>>>>>> The tool can currently convert for loops that range over (1)
>> >>>>>>>>> statically allocated arrays, and (2) Clang-style iterator-based
>> >>>>>>>>> loops (with
>> >>>>>>>>> begin and end iterators defined). All loop variables need to be
>> >>>>>>>>> declared
>> >>>>>>>>> within the loop's initialization step in order for it to be
>> >>>>>>>>> converted,
>> >>>>>>>>> though this requirement can potentially be eliminated. I'm
>> >>>>>>>>> working on
>> >>>>>>>>> converting iterator-based loops that call someContainer.end() on
>> >>>>>>>>> each
>> >>>>>>>>> iteration, since they're probably the common case in many
>> >>>>>>>>> codebases.
>> >>>>>>>>>
>> >>>>>>>>> Just for fun, I ran the tool over the 41 .cpp files in lib/Sema,
>> >>>>>>>>> and my tool found 71 convertible loops in 17 files. There is
>> >>>>>>>>> plenty more
>> >>>>>>>>> work to go, because it clearly missed some easy ones.
>> >>>>>>>>>
>> >>>>>>>>> Any input or feedback is welcome!
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> High-level observations:
>> >>>>>>>> 1. the handling of the rewrites; any reason not to use the
>> >>>>>>>> Tooling/Refactoring stuff? Currently in the patch it looks to me
>> >>>>>>>> like the
>> >>>>>>>> files are not rewritten, but dumped to stdout
>> >>>>>>>> 2. is the reason not to use the matchers here that they're not
>> >>>>>>>> landed in mainline yet?
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> /Manuel
>> >>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> -Sam
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Jun 28, 2012 at 10:50 AM, Sam Panzer <panzer at google.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> I'm that intern :)
>> >>>>>>>>>>
>> >>>>>>>>>> -Sam
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Jun 27, 2012 at 9:48 PM, John Wiegley
>> >>>>>>>>>> <johnw at boostpro.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> >>>>> Sam Panzer <panzer at google.com> writes:
>> >>>>>>>>>>>
>> >>>>>>>>>>> > In particular, I am working on a tool to convert existing
>> >>>>>>>>>>> > C++
>> >>>>>>>>>>> > for loops to
>> >>>>>>>>>>> > take advantage of the new C++11 range-based syntax. I can
>> >>>>>>>>>>> > imagine similar
>> >>>>>>>>>>> > tools to replace const with constexpr, macro hacks with
>> >>>>>>>>>>> > static_assert, and
>> >>>>>>>>>>> > potentially other common refactorings.
>> >>>>>>>>>>>
>> >>>>>>>>>>> > Thoughts? Suggestions?
>> >>>>>>>>>>>
>> >>>>>>>>>>> You really must watch this presentation, if you haven't
>> >>>>>>>>>>> already:
>> >>>>>>>>>>>
>> >>>>>>>>>>>     http://www.youtube.com/watch?v=yuIOGfcOH0k
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>> John Wiegley
>> >>>>>>>>>>> BoostPro Computing
>> >>>>>>>>>>> http://www.boostpro.com
>> >>>>>>>>>>> _______________________________________________
>> >>>>>>>>>>> cfe-dev mailing list
>> >>>>>>>>>>> cfe-dev at cs.uiuc.edu
>> >>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> _______________________________________________
>> >>>>>>>>> cfe-dev mailing list
>> >>>>>>>>> cfe-dev at cs.uiuc.edu
>> >>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> _______________________________________________
>> >>>>>>>> cfe-dev mailing list
>> >>>>>>>> cfe-dev at cs.uiuc.edu
>> >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>> >
>> > _______________________________________________
>> > cfe-dev mailing list
>> > cfe-dev at cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >
>
>