[cfe-dev] For loop conversion (was: C++11 migration tools)
    Manuel Klimek 
    klimek at google.com
       
    Thu Jul 12 01:06:03 PDT 2012
    
    
  
On Wed, Jul 11, 2012 at 11:17 PM, Sam Panzer <panzer at google.com> wrote:
>
>
>
> On Wed, Jul 11, 2012 at 11:20 AM, Manuel Klimek <klimek at google.com> wrote:
>>
>> On Wed, Jul 11, 2012 at 8:08 PM, Sam Panzer <panzer at google.com> wrote:
>> > Good point. I currently have a matcher that finds for loops that appear
>> > to
>> > have the right shape for conversion, and I do some extra checking along
>> > with
>> > the conversion work in the callback. I can completely handle array-style
>> > loops with a Preprocessor, which is used when generating the names of
>> > new
>>
>> I don't understand yet what the Preprocessor is needed for here...
>
>
> I use Preprocessor in exactly one place right now, namely to create an
> IdentifierResolver. When I am generating the name of a new loop variable, I
> want to make sure that the identifier is unique, which is an approximation
> to checking that the identifier neither shadows an existing definition nor
> is shadowed in an inner scope. If there's a better way to do this, I would
> be happy to change it.
The ASTContext has the identifier table. (ASTContext::Idents).
>> > variables. For iterator-style loops, I use Sema to determine the return
>> > type
>> > of operator*, which will be the type of the loop variable. Finally
>> > (though
>>
>> Isn't the return type of the oeprator* already in the AST?
>
>
> It's actually possible that operator* is never called. My initial
> implementation tried to infer the type from  expressions involving the
> iterator, but this didn't work correctly for CXXMemberCallExpr. I can
> conceivably work around this without checking operator*, but
did you want to finish that
>> > this isn't implemented yet), containers that are used as if they're
>> > arrays
>> > will require Sema to check if suitable begin() and end() functions
>> > exist.
>>
>> Ah, I see - basically you want to trigger overload resolution to see
>> whether the conversion would work. That makes sense. I'm not sure we
>> want to get all of Sema available though - it's an insanely huge
>> interface, and it's rather hard to figure out what's safe to do and
>> what's not.
>>
>> Perhaps we can create a class that encapsulates Sema for those
>> "what-if" type questions we'll need to answer for refactoring tools?
>
>
> This sounds like a good idea. I can try to identify exactly the features I
> need (most likely just overload resolution).
That would sound like most of what we need. It has come up a few
times. Let me know if you identify more things we need.
Cheers,
/Manuel
>
>>
>>
>> > For now, I can force the use of auto rather than an explicit type
>> > listing
>> > for iterator-based loops and get most of the work done without Sema. The
>> > question then becomes the correct way to avoid repeatedly creating a
>> > Preprocessor object, though if this is a cheap operation, I can create
>> > one
>> > in the callback.
>> >
>> > Does this make the use clearer?
>>
>> Yep, thx.
>>
>> Cheers,
>> /Manuel
>>
>> > Thanks!
>> >
>> >
>> > On Wed, Jul 11, 2012 at 12:29 AM, Manuel Klimek <klimek at google.com>
>> > wrote:
>> >>
>> >> On Wed, Jul 11, 2012 at 6:03 AM, Sam Panzer <panzer at google.com> wrote:
>> >> > I'm trying to port my code to take advantage of matchers, now that
>> >> > they
>> >> > are
>> >> > in mainline. Some of the work I want to do involves semantic analysis
>> >> > of
>> >> > the
>> >> > results (i.e. in the callback). What would be the best way to get a
>> >> > Sema
>> >> > or
>> >> > CompilerInstance out of either RefactoringTool or MatchResult? I'm
>> >> > currently
>> >> > playing with changing MatchASTConsumer to inherit from SemaConsumer,
>> >> > so
>> >> > that
>> >> > MatchFinder can track a Sema object the same way it does an
>> >> > ASTContext.
>> >>
>> >> I'd be interested in what the use cases for the semantic analysis are
>> >> first. Often it seems like it would be better to have the information
>> >> available in the AST instead of rerunning (potentially expensive)
>> >> semantic analysis.
>> >>
>> >> Cheers,
>> >> /Manuel
>> >>
>> >> >
>> >> > Thanks!
>> >> >
>> >> >
>> >> > On Mon, Jul 2, 2012 at 7:22 AM, Manuel Klimek <klimek at google.com>
>> >> > wrote:
>> >> >>
>> >> >> On Mon, Jul 2, 2012 at 4:16 PM, Sam Panzer <panzer at google.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> On Sun, Jul 1, 2012 at 10:45 PM, Manuel Klimek <klimek at google.com>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> On Fri, Jun 29, 2012 at 8:17 PM, Sam Panzer <panzer at google.com>
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> Thanks for the input!
>> >> >>>>>
>> >> >>>>> Tooling/Refactoring is definitely the right way to go - dumping
>> >> >>>>> to
>> >> >>>>> stdout was just a holdover from the example on LibTooling. I'll
>> >> >>>>> change it
>> >> >>>>> once I figure out how it works - and a clean way to arrange the
>> >> >>>>> tests.
>> >> >>>>>
>> >> >>>>> As for the use of matchers vs. visitors, I decided to use a
>> >> >>>>> visitor
>> >> >>>>> because this is a relatively complex transformation. I would
>> >> >>>>> happily
>> >> >>>>> use
>> >> >>>>> matchers if I thought I could - and I think that some other c++11
>> >> >>>>> migrations
>> >> >>>>> can easily be written with matchers - but I think the for loop
>> >> >>>>
>> >> >>>>
>> >> >>>> I'm not claiming that the matchers needed to match those
>> >> >>>> constructs
>> >> >>>> are
>> >> >>>> all already written - but if we write the questions you need to
>> >> >>>> ask
>> >> >>>> into
>> >> >>>> matchers, other people who want to match similar things can reuse
>> >> >>>> them, thus
>> >> >>>> amplifying the impact of the code you write ;)
>> >> >>>>
>> >> >>>>>
>> >> >>>>> checks need some features that matchers don't have (correct me if
>> >> >>>>> I'm
>> >> >>>>> wrong!). For example, the check for statically allocated
>> >> >>>>> array-based
>> >> >>>>> loops
>> >> >>>>> does this:
>> >> >>>>> Given a for loop, determine if:
>> >> >>>>>  - The increment statement increments (via ++) exactly one
>> >> >>>>> integral
>> >> >>>>> index variable, such that the variable is declared and
>> >> >>>>> initialized
>> >> >>>>> to zero
>> >> >>>>> in the init portion of the loop, and that this variable's value
>> >> >>>>> is a
>> >> >>>>> compared (via <, > or !=) to a nonnegative compile-time constant
>> >> >>>>> N
>> >> >>>>> in the
>> >> >>>>> compare portion.
>> >> >>>>>  - The index variable is only ever used in an ArrayIndexExpession
>> >> >>>>> indexing a single, statically allocated array A.
>> >> >>>>>  - The array A has exactly N elements.
>> >> >>>>>  - Additionally, if the ArrayIndexExpression A[index] is ever
>> >> >>>>> assigned,
>> >> >>>>> passed to a function or copied as a non-const reference, or its
>> >> >>>>> address
>> >> >>>>> taken with & (I still need to add a check for calls to non-const
>> >> >>>>> member
>> >> >>>>> functions), the loop variable in the converted version needs to
>> >> >>>>> be a
>> >> >>>>> non-const reference so that the value will be correctly updated
>> >> >>>>> (this step
>> >> >>>>> adds the most complexity).
>> >> >>>>
>> >> >>>>
>> >> >>>> ... and the matcher I would want to write for this looks something
>> >> >>>> like
>> >> >>>> that:
>> >> >>>> ForLoop(
>> >> >>>>   HasInitialization(Declaration(Id("loopvar",
>> >> >>>> HasType(IsIntegral())))),
>> >> >>>>   HasCondition(BinaryOperator(
>> >> >>>>     HasAnyOperand(DeclarationReference(Id("condref",
>> >> >>>> To(Variable())))),
>> >> >>>>     HasAnyOperand(IntegerLiteral()))),
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> HasIncrement(UnaryOperator(HasUnaryOperand(DeclarationReference(Id("incref",
>> >> >>>> To(Variable()))))), ...),
>> >> >>>> )
>> >> >>>>
>> >> >>>> In general, the complex stuff can stay complex, but the simple
>> >> >>>> stuff
>> >> >>>> shouldn't be lots of code.
>> >> >>>
>> >> >>>
>> >> >>> Good point - and this is much easier to read than the equivalent
>> >> >>> code
>> >> >>> I
>> >> >>> had written.
>> >> >>>
>> >> >>>>
>> >> >>>>
>> >> >>>>>
>> >> >>>>> The other types of loop (iterator-based, and array-like
>> >> >>>>> container)
>> >> >>>>> are
>> >> >>>>> more complicated to detect, since there are more permitted ways
>> >> >>>>> to
>> >> >>>>> define
>> >> >>>>> and use the index/iterators. What makes this difficult to do
>> >> >>>>> entirely with
>> >> >>>>> matchers is the number of back- and cross-references, as well as
>> >> >>>>> the
>> >> >>>>> different local behaviors based on semantic properties. On the
>> >> >>>>> other
>> >> >>>>> hand,
>> >> >>>>> if there were some kind of backreference-enabled matcher that
>> >> >>>>
>> >> >>>>
>> >> >>>> The way to handle the back-refs is to bind the nodes you want, and
>> >> >>>> then
>> >> >>>> pull them out and compare them in the callback.
>> >> >>>
>> >> >>>
>> >> >>> I see - make the matcher slightly more general, then filter the
>> >> >>> results,
>> >> >>> perhaps with a visitor.
>> >> >>>
>> >> >>>>
>> >> >>>>
>> >> >>>> Thoughts?
>> >> >>>> /Manuel
>> >> >>>
>> >> >>>
>> >> >>> This sounds like it would make at least half the work much easier,
>> >> >>> so
>> >> >>> I
>> >> >>> think it would definitely be worth it to try switching to a
>> >> >>> matcher-based
>> >> >>> solution. When are matchers supposed to hit mainline (or some extra
>> >> >>> cloneable repo) :) ?
>> >> >>
>> >> >>
>> >> >> Matchers are currently in
>> >> >> ^cfe/branches/tooling/include/clang/ASTMatchers/...
>> >> >>
>> >> >> I'm currently working on renaming them to camelCase from CamelCase;
>> >> >> there's a Tool to help with the conversion though, so no problem in
>> >> >> starting
>> >> >> now ...
>> >> >>
>> >> >> Cheers,
>> >> >> /Manuel
>> >> >>
>> >> >>>
>> >> >>>
>> >> >>> -Sam
>> >> >>>
>> >> >>>>
>> >> >>>>
>> >> >>>>>
>> >> >>>>> allowed me to locate all matches in a given Stmt, it could be
>> >> >>>>> *much*
>> >> >>>>> easier to express some parts of the logic, such as the first step
>> >> >>>>> in
>> >> >>>>> the
>> >> >>>>> above list. I also suspect that a single-Stmt matcher would
>> >> >>>>> better
>> >> >>>>> way to
>> >> >>>>> handle the last step; currently I track  whether the visitor is
>> >> >>>>> looking at a
>> >> >>>>> statement or expression which fits any of the const-disqualifying
>> >> >>>>> conditions, and a note is made if I run into A[index].
>> >> >>>>>
>> >> >>>>> Does this make the use case clearer? I don't really see a better
>> >> >>>>> way
>> >> >>>>> to
>> >> >>>>> approach this problem, but you guys know the available toolkit
>> >> >>>>> far
>> >> >>>>> better
>> >> >>>>> than I do.
>> >> >>>>>
>> >> >>>>> On Fri, Jun 29, 2012 at 2:48 AM, Manuel Klimek
>> >> >>>>> <klimek at google.com>
>> >> >>>>> wrote:
>> >> >>>>>>
>> >> >>>>>> On Fri, Jun 29, 2012 at 11:45 AM, Chandler Carruth
>> >> >>>>>> <chandlerc at google.com> wrote:
>> >> >>>>>>>
>> >> >>>>>>> I tend to agree that this should use the Tooling/Refactoring
>> >> >>>>>>> stuff.
>> >> >>>>>>>
>> >> >>>>>>> However, I'm curious about the best way to structure the
>> >> >>>>>>> location
>> >> >>>>>>> of
>> >> >>>>>>> migration candidates: AST matchers vs. visitor.
>> >> >>>>>>>
>> >> >>>>>>> I can almost see the visitor pattern working really well here
>> >> >>>>>>> as
>> >> >>>>>>> each
>> >> >>>>>>> different construct can have a pile of migration logic dropped
>> >> >>>>>>> in.... But if
>> >> >>>>>>> there is a need to connect dots between more distant
>> >> >>>>>>> constructs,
>> >> >>>>>>> that
>> >> >>>>>>> wouldn't work so well.... Not at all sure what would be best
>> >> >>>>>>> here.
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> I've used a combination before - use matchers for the stuff
>> >> >>>>>> where
>> >> >>>>>> they
>> >> >>>>>> work well, then write a very small easy-to-understand visitor if
>> >> >>>>>> you need
>> >> >>>>>> more. I think that brings down code size by quite a bit -
>> >> >>>>>> obviously
>> >> >>>>>> just a
>> >> >>>>>> gut feeling here.
>> >> >>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> On Fri, Jun 29, 2012 at 1:37 AM, Manuel Klimek
>> >> >>>>>>> <klimek at google.com>
>> >> >>>>>>> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>> On Fri, Jun 29, 2012 at 4:06 AM, Sam Panzer
>> >> >>>>>>>> <panzer at google.com>
>> >> >>>>>>>> wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>> In case anyone wanted to take a look, the attached patch
>> >> >>>>>>>>> includes
>> >> >>>>>>>>> the tool I've been working on. I create a new binary,
>> >> >>>>>>>>> c++migrate, which
>> >> >>>>>>>>> attempts to convert for loops in the source files given to
>> >> >>>>>>>>> it.
>> >> >>>>>>>>> Most of my
>> >> >>>>>>>>> focus has been on the FrontedAction, so I skirted all of the
>> >> >>>>>>>>> issues
>> >> >>>>>>>>> mentioned above by keeping the frontend interaction minimal
>> >> >>>>>>>>> (i.e. I just
>> >> >>>>>>>>> call Tooling::ClangTool::run), and the changes are just
>> >> >>>>>>>>> reported
>> >> >>>>>>>>> on standard
>> >> >>>>>>>>> output, if there are any to be made.
>> >> >>>>>>>>>
>> >> >>>>>>>>> The tool can currently convert for loops that range over (1)
>> >> >>>>>>>>> statically allocated arrays, and (2) Clang-style
>> >> >>>>>>>>> iterator-based
>> >> >>>>>>>>> loops (with
>> >> >>>>>>>>> begin and end iterators defined). All loop variables need to
>> >> >>>>>>>>> be
>> >> >>>>>>>>> declared
>> >> >>>>>>>>> within the loop's initialization step in order for it to be
>> >> >>>>>>>>> converted,
>> >> >>>>>>>>> though this requirement can potentially be eliminated. I'm
>> >> >>>>>>>>> working on
>> >> >>>>>>>>> converting iterator-based loops that call someContainer.end()
>> >> >>>>>>>>> on
>> >> >>>>>>>>> each
>> >> >>>>>>>>> iteration, since they're probably the common case in many
>> >> >>>>>>>>> codebases.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Just for fun, I ran the tool over the 41 .cpp files in
>> >> >>>>>>>>> lib/Sema,
>> >> >>>>>>>>> and my tool found 71 convertible loops in 17 files. There is
>> >> >>>>>>>>> plenty more
>> >> >>>>>>>>> work to go, because it clearly missed some easy ones.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Any input or feedback is welcome!
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>> High-level observations:
>> >> >>>>>>>> 1. the handling of the rewrites; any reason not to use the
>> >> >>>>>>>> Tooling/Refactoring stuff? Currently in the patch it looks to
>> >> >>>>>>>> me
>> >> >>>>>>>> like the
>> >> >>>>>>>> files are not rewritten, but dumped to stdout
>> >> >>>>>>>> 2. is the reason not to use the matchers here that they're not
>> >> >>>>>>>> landed in mainline yet?
>> >> >>>>>>>>
>> >> >>>>>>>> Cheers,
>> >> >>>>>>>> /Manuel
>> >> >>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> -Sam
>> >> >>>>>>>>>
>> >> >>>>>>>>> On Thu, Jun 28, 2012 at 10:50 AM, Sam Panzer
>> >> >>>>>>>>> <panzer at google.com>
>> >> >>>>>>>>> wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> I'm that intern :)
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> -Sam
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On Wed, Jun 27, 2012 at 9:48 PM, John Wiegley
>> >> >>>>>>>>>> <johnw at boostpro.com>
>> >> >>>>>>>>>> wrote:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> >>>>> Sam Panzer <panzer at google.com> writes:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> > In particular, I am working on a tool to convert existing
>> >> >>>>>>>>>>> > C++
>> >> >>>>>>>>>>> > for loops to
>> >> >>>>>>>>>>> > take advantage of the new C++11 range-based syntax. I can
>> >> >>>>>>>>>>> > imagine similar
>> >> >>>>>>>>>>> > tools to replace const with constexpr, macro hacks with
>> >> >>>>>>>>>>> > static_assert, and
>> >> >>>>>>>>>>> > potentially other common refactorings.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> > Thoughts? Suggestions?
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> You really must watch this presentation, if you haven't
>> >> >>>>>>>>>>> already:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>     http://www.youtube.com/watch?v=yuIOGfcOH0k
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> --
>> >> >>>>>>>>>>> John Wiegley
>> >> >>>>>>>>>>> BoostPro Computing
>> >> >>>>>>>>>>> http://www.boostpro.com
>> >> >>>>>>>>>>> _______________________________________________
>> >> >>>>>>>>>>> cfe-dev mailing list
>> >> >>>>>>>>>>> cfe-dev at cs.uiuc.edu
>> >> >>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> _______________________________________________
>> >> >>>>>>>>> cfe-dev mailing list
>> >> >>>>>>>>> cfe-dev at cs.uiuc.edu
>> >> >>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >>>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>> _______________________________________________
>> >> >>>>>>>> cfe-dev mailing list
>> >> >>>>>>>> cfe-dev at cs.uiuc.edu
>> >> >>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > cfe-dev mailing list
>> >> > cfe-dev at cs.uiuc.edu
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >
>> >
>> >
>
>
    
    
More information about the cfe-dev
mailing list