[cfe-dev] FW: RFC: YAML as an intermediate format for clang::tooling::Replacement data on disk

Wed Jul 31 10:27:54 PDT 2013

On Wed, Jul 31, 2013 at 7:20 PM, Du Toit, Stefanus <
stefanus.du.toit at intel.com> wrote:

> From:  Manuel Klimek <klimek at google.com>:
> >
> > On Wed, Jul 31, 2013 at 6:33 PM, Du Toit, Stefanus
> > <stefanus.du.toit at intel.com> wrote:
> >
> > > On Wed, Jul 31, 2013 at 5:40 PM, Vane, Edwin
> ><edwin.vane at intel.com<mailto:edwin.vane at intel.com>> wrote:
> > > -----Original Message-----
> > > From: Vane, Edwin
> > > Sent: Wednesday, July 31, 2013 11:40 AM
> > >
> > > To: Clang Dev List (cfe-dev at cs.uiuc.edu<mailto:cfe-dev at cs.uiuc.edu>)
> > > Subject: RFC: YAML as an intermediate format for
> >clang::tooling::Replacement data on disk
> > >
> > > > Hi all,
> > > >
> > > > This discussion began on cfe-commits as the result of a commit
> >(Tareq's poor header replacement patch that keeps getting reverted due to
> >Windows build-bot issues). The start of the thread is here if you want
> >background info:
> > > >
> > > >
> >
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130729/084881
> >.html
> ><
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130729/08488
> >1.html>.
> > > >
> > > > The proposal: The C++11 Migrator has a need to write Replacement
> >data: offset, length, and replacement text, to disk. The replacement data
> >describes changes made to a header while transforming one or more TU's.
> >All the replacement data would be gathered up
> > > > after an entire code-base is transformed by a separate tool and
> >merged together to produce actual changes to headers. So the point is to
> >serialize Replacement data as a form of inter-process communication using
> >the file system as the communication link. Real
> > > > inter-process communication is a possibility but not portable.
> >
> > > I have to wonder whether it's not easier to just ensure that headers
> >are only transformed once.
> > >
> > > I understand there's the issue of deciding what compiler flags to use
> >when processing a header. My thoughts on that:
> > > * For some projects, there aren't any per-file compiler flags, so it
> >would be sufficient to just pass a general set of flags to the tool on
> >the command line (e.g., with made up parameter syntax, something like
> >'cpp11-migrate *.h ‹compile-flags="-I../include
> > > DFOO"'Š)
> > > * For other projects, a simple heuristic of matching
> >"foo.{cpp,cc,cxx,Š}" to "foo.{h,hh,hpp,Š}" might be enough (lots of
> >details to sort out here, like how to specify the directory structure,
> >but hopefully you get the idea)
> > > * For more complicated cases, one could add (whether manually or using
> >some tool) entries to the compilation database for header files
> > >
> > > With that in mind, why not treat header files like source files and
> >process them separately?
> >
> > How do you propose to treat template instantiations?
> >
> > For example:
> > a.h:
> > template <typename T> class A { void x(T t) { t.y(); }}
> >
> > x.cc:
> > A<C> a; a.x();
> >
> > Imagine we want to change C::y -> C::z. Now depending on which types A
> >is instantiated with, it might be totally safe to refactor t.y() in A or
> >not. So there needs to be a postprocessing step that figures that out
> >anyway.
>
> Given a class like:
>
> template<typename T>
> class MyVector {
>   MyVector() : m_begin(0), m_end(0) {}
>   MyVector(std::size_t size) : m_begin(new T[size]), m_end(m_begin + size)
> {}
>
>   // ...
>
> private:
>   T* m_begin;
>   T* m_end;
> };
>
> I would love for cpp11-migrate to be able to turn that into something like:
>
> template<typename T>
> class MyVector {
>   MyVector() = default;
>   MyVector(std::size_t size) : m_begin(new T[size]), m_end(m_begin + size)
> {}
>
>   // ...
>
>
> private:
>   T* m_begin = nullptr;
>   T* m_end = nullptr;
> };
>
> And I contend that it doesn't need to know what MyVector could be
> instantiated with in order to do that.
>

I'm not sure what an example of a change that doesn't need to know the
instantiations has to do with it. I fully agree that those exist :) I'm
merely proposing that the others exist, too.

> Now, I totally understand that I may be asking for something very
> difficult to do in Clang today. In which case, I'll accept whatever
> limitations there are. But I don't think that assuming we'll see the full
> set of instantiations is a great solution either.
>

I think it'll be a necessary solution for many refactoring tools.

> > > If the issue is parallel compilation, deferring the replacements makes
> >perfect sense as a way to resolve any read-write conflicts (transforming
> >one header while it's being parsed as part of another TU). However, if
> >you ensure that a header isn't touched by
> > > multiple transformations, and generally ensure that transformations
> >don't clobber each other by design, there's no need to merge anything.
> > >
> > > Personally I would even accept a slightly more limited set of
> >transformations in exchange for never having to worry about merging.
> >
> >
> > "Merging" is usually merely deduplication, which is not hard. I have the
> >feeling that you think there's lots of complexity where it isn't. I'd
> >definitely say that the heuristics you propose in order to be able to
> >process headers on their own are much higher
> > than the issue of deduplicating edits.
>
> If it's just deduplication, it's no big deal - totally agreed.
>
> I may have been thrown off by this: https://github.com/revane/migmerge_git
>
> That tool seems to (aim to) handle a lot more than deduplication.
>
> More importantly: I certainly hope it's not going to be necessary to
> download a separate "merge tool" in order to use cpp11-migrate.
>

I actually agree that it makes sense to have cpp11-migrate work on single
TUs (or multiple TUs) without the need for an extra tool, by just running
in process over all the TUs and applying all the changes.

> To me, cpp11-migrate is the kind of tool I'm just not going to use if it's
> not dead simple to use. As a user, I don't really care how it works
> internally, and I don't want to care :). I _am_ willing to tell it (a
> reasonable amount of) things about my code base that it can't easily infer
> itself.
>

cpp11-migrate is something that I'd expect most users to run *once* in
their life, over a whole code-base. I don't see that the extra overhead of
running a special tool would be too much effort. I also think it'd be great
if it still worked without the extra tool (as noted above), but I don't
think it's a necessity.

Cheers,
/Manuel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130731/00c85cdb/attachment.html>