[cfe-commits] [PATCH] X-TU Refactoring support

Manuel Klimek klimek at google.com
Thu May 10 06:55:15 PDT 2012


After pondering about this some more, my concrete proposal would
probably be to have 2 methods on the Lexer:
makeLargestFullMatchingFileCharRange(...)
makeSmallestFullMatchingFileCharRange(...)

(names are prolly crap, suggestiongs welcome :)

both taking a range; the first trying to find the outermost range that
exactly covers the given range, the second trying to find the
innermost range that exactly covers the given range.h

The first one is basically makeFileCharRange, plus being able to get
the spelling location,
the second one is basically getting the spelling location, plus some
magic when the AST node is created by from multiple different macro
expansions.

Thoughts?
/Manuel

On Thu, May 10, 2012 at 3:35 PM, Manuel Klimek <klimek at google.com> wrote:
> On Fri, May 4, 2012 at 7:56 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
>> On May 3, 2012, at 11:10 AM, Manuel Klimek wrote:
>>
>>> On Fri, Apr 27, 2012 at 7:49 PM, Douglas Gregor <dgregor at apple.com> wrote:
>>>> On Apr 19, 2012, at 4:30 AM, Manuel Klimek <klimek at google.com> wrote:
>>>>
>>>>> The attached patch adds a layer on top of the tooling support for
>>>>> cross-TU refactorings.
>>>>
>>>>
>>>> +static const char * const InvalidLocation = "invalid-location";
>>>> +
>>>> +Replacement::Replacement()
>>>> +  : FilePath(InvalidLocation), Offset(0), Length(0) {}
>>>> +
>>>>
>>>> If we have to pick a sentinel, can it at least be an empty FilePath?
>>>>
>>>> +bool saveRewrittenFiles(Rewriter &Rewrite) {
>>>> +  for (Rewriter::buffer_iterator I = Rewrite.buffer_begin(),
>>>> +                                 E = Rewrite.buffer_end();
>>>> +       I != E; ++I) {
>>>> +    // FIXME: This code is copied from the FixItRewriter.cpp - I think it should
>>>> +    // go into directly into Rewriter (there we also have the Diagnostics to
>>>> +    // handle the error cases better).
>>>> +    const FileEntry *Entry =
>>>> +        Rewrite.getSourceMgr().getFileEntryForID(I->first);
>>>> +    std::string ErrorInfo;
>>>> +    llvm::raw_fd_ostream FileStream(
>>>> +        Entry->getName(), ErrorInfo, llvm::raw_fd_ostream::F_Binary);
>>>> +    if (!ErrorInfo.empty())
>>>> +      return false;
>>>> +    I->second.write(FileStream);
>>>> +    FileStream.flush();
>>>> +  }
>>>> +  return true;
>>>> +}
>>>>
>>>> I agree that this should go into the Rewriter. Go for it :)
>>>>
>>>> +int getRangeSize(SourceManager &Sources, const CharSourceRange &Range) {
>>>> +  SourceLocation SpellingBegin = Sources.getSpellingLoc(Range.getBegin());
>>>> +  SourceLocation SpellingEnd = Sources.getSpellingLoc(Range.getEnd());
>>>> +  std::pair<FileID, unsigned> Start = Sources.getDecomposedLoc(SpellingBegin);
>>>> +  std::pair<FileID, unsigned> End = Sources.getDecomposedLoc(SpellingEnd);
>>>> +  if (Start.first != End.first) return -1;
>>>> +  if (Range.isTokenRange())
>>>> +    // FIXME: Bring in the correct LangOptions.
>>>> +    End.second += Lexer::MeasureTokenLength(SpellingEnd, Sources,
>>>> +                                            LangOptions());
>>>> +  return End.second - Start.second;
>>>> +}
>>>>
>>>> This should probably return an 'unsigned'. It belongs in the Lexer, IMO.
>>>
>>> So return bool and use and unsigned reference to return the value?
>>>
>>> My main concern with moving that right now is that we recently
>>> discovered that this might not be the right solution yet to create a
>>> replacement from an AST node - we'll probably want to drill through
>>> macros and find the right point inside the expansion closest to the
>>> spelling where the full node is expressed.
>>> I've looked at makeFileCharRange in the Lexer, which seemed related at
>>> a first glance, but I have no idea what it really does :)
>>
>> Ok, I consider that a failure in documenting makeFileCharRange.
>>
>> That function is trying to deal with macros and return a range based on file locations, which I assume is what you are trying to do as well.
>> Dealing with macros in general is of course an open problem, note that the function won't give you a range inside the macro definition.
>> The cases where it can successfully handle macros are:
>>
>> -begin or end range lies at the start or end of a macro expansion, in which case the location will be set to the expansion point, e.g:
>>
>> #define M 1 2
>> a M
>>
>> If you have a range [a, 2] (where 2 came from the macro), the function will return a range for "a M"
>> if you have range [a, 1], the function will fail because the range overlaps with only a part of the macro
>>
>> -The macro is a function macro and the range can be mapped to the macro arguments, e.g:
>>
>> #define M 1 2
>> #define FM(x) x
>>
>> FM(a b M)
>>
>> if you have range [b, 2], the function will return the file range "b M" inside the macro arguments
>> if you have range [a, 2], the function will return the file range "FM(a b M)" since the range includes all of the macro expansion.
>>
>> Interested in thoughts on improving this. I guess we could allow conditionally mapping inside the macro definition if the range is part of the macro definition, and leave it to the caller to make sure changing the definition will be valid for all uses of it.
>
> So, for the case where I want to generate Replacements based on an AST
> nodes, I think there are 2 cases where makeFileCharRange does not what
> I need.
>
> One case is mapping inside the macro definition - in this case I agree
> with your assessment; it is definitely cool when you can rename a
> function and all calls inside macros are automatically updated.
>
> The second case is that I mostly want the innermost location, where
> the full node is in the text. Consider:
> #define A(X) X
> A(class Y)
> In this case when I want to replace the CXXRecordDecl, I usually want
> to replace the text inside the macro argument, *unless* I want to
> completely delete the code.
>
> I realize that those might be pretty special cases. Not sure we can
> find a common abstraction to pull out, but if we can, that would
> definitely help.
>
> Also, I think there's no good way to represent source manager
> independent character accurate ranges - am I missing something? If
> not, would we want to introduce a class for that?
>
> Thanks,
> /Manuel
>
>
>
>
>>> My problem is currently that I could need some ideas on what to do
>>> here, as the general case seems really hard to implement and I'm a
>>> little lost.
>>>
>>> On the other hand, I think using the spelling location is a good first
>>> approximation (as it actually works most of the time), so I think this
>>> is a good first step, and I don't think keeping the implementation of
>>> how to figure out what range is covered by a node a detail of the
>>> refactoring lib is a big problem right now.
>>>
>>> Thoughts?
>>>
>>>> I'm not sure whether you've looked at the Edit library or not, but its functionality fits well with what you're doing here, because it can order rewrites such that a given set of rewrite operations is more likely to succeed. For example, if one rewrite edits part of a function body and another rewrite removes that function entirely, the edit library will deal with it whereas the rewriter by itself could fail. There's no specific action item here, other than a suggestion that there's some potential goodness in combining these features.
>>>
>>> I've looked and found that it didn't completely fit what we're trying
>>> to do with the refactoring library (mainly creating intermediate
>>> representations that are not depending on a sourcemanager to then
>>> apply them later in bulk over different TUs). I'll take a deeper look
>>> to figure out how we might get some of the benefits you mentioned out
>>> of it.
>>
>> That library is mainly useful for composing multiple rewrites in a single translation-unit, making sure the rewrites won't mess up the source:
>>
>> -Keeps track of what is getting removed so that if rewrites inadvertently remove something more than once, the rewrites compose correctly
>> -Makes use of the aforementioned makeFileCharRange to deal with macros
>> -Makes sure a rewrite won't mess up the source because of #if/#endif conditional blocks (the preprocessor must be enabled to keep track of those)
>> -Allows for "transaction" rewrites, in that either all of the rewrites in a "transaction" can be applied correctly (no messing up the source) or none of the rewrites will be applied.
>>
>> At the end it can give a list of changes with file ranges for that translation unit, but composing multiple rewrites across different TUs still needs to be handled one level higher than that.
>>
>>>
>>> Thanks for the review!
>>> /Manuel
>>>
>>> _______________________________________________
>>> cfe-commits mailing list
>>> cfe-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>




More information about the cfe-commits mailing list