[cfe-dev] FW: RFC: YAML as an intermediate format for clang::tooling::Replacement data on disk

Wed Jul 31 10:20:17 PDT 2013

From:  Manuel Klimek <klimek at google.com>:
> 
> On Wed, Jul 31, 2013 at 6:33 PM, Du Toit, Stefanus
> <stefanus.du.toit at intel.com> wrote:
> 
> > On Wed, Jul 31, 2013 at 5:40 PM, Vane, Edwin
><edwin.vane at intel.com<mailto:edwin.vane at intel.com>> wrote:
> > -----Original Message-----
> > From: Vane, Edwin
> > Sent: Wednesday, July 31, 2013 11:40 AM
> > 
> > To: Clang Dev List (cfe-dev at cs.uiuc.edu<mailto:cfe-dev at cs.uiuc.edu>)
> > Subject: RFC: YAML as an intermediate format for
>clang::tooling::Replacement data on disk
> > 
> > > Hi all,
> > > 
> > > This discussion began on cfe-commits as the result of a commit
>(Tareq's poor header replacement patch that keeps getting reverted due to
>Windows build-bot issues). The start of the thread is here if you want
>background info:
> > > 
> > > 
>http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130729/084881
>.html 
><http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130729/08488
>1.html>.
> > > 
> > > The proposal: The C++11 Migrator has a need to write Replacement
>data: offset, length, and replacement text, to disk. The replacement data
>describes changes made to a header while transforming one or more TU's.
>All the replacement data would be gathered up
> > > after an entire code-base is transformed by a separate tool and
>merged together to produce actual changes to headers. So the point is to
>serialize Replacement data as a form of inter-process communication using
>the file system as the communication link. Real
> > > inter-process communication is a possibility but not portable.
> 
> > I have to wonder whether it's not easier to just ensure that headers
>are only transformed once.
> > 
> > I understand there's the issue of deciding what compiler flags to use
>when processing a header. My thoughts on that:
> > * For some projects, there aren't any per-file compiler flags, so it
>would be sufficient to just pass a general set of flags to the tool on
>the command line (e.g., with made up parameter syntax, something like
>'cpp11-migrate *.h ‹compile-flags="-I../include
> > DFOO"'Š)
> > * For other projects, a simple heuristic of matching
>"foo.{cpp,cc,cxx,Š}" to "foo.{h,hh,hpp,Š}" might be enough (lots of
>details to sort out here, like how to specify the directory structure,
>but hopefully you get the idea)
> > * For more complicated cases, one could add (whether manually or using
>some tool) entries to the compilation database for header files
> > 
> > With that in mind, why not treat header files like source files and
>process them separately?
> 
> How do you propose to treat template instantiations?
> 
> For example:
> a.h:
> template <typename T> class A { void x(T t) { t.y(); }}
> 
> x.cc:
> A<C> a; a.x();
> 
> Imagine we want to change C::y -> C::z. Now depending on which types A
>is instantiated with, it might be totally safe to refactor t.y() in A or
>not. So there needs to be a postprocessing step that figures that out
>anyway.

Given a class like:

template<typename T>
class MyVector {
  MyVector() : m_begin(0), m_end(0) {}
  MyVector(std::size_t size) : m_begin(new T[size]), m_end(m_begin + size)
{}

  // ...

private:
  T* m_begin;
  T* m_end;
};

I would love for cpp11-migrate to be able to turn that into something like:

template<typename T>
class MyVector {
  MyVector() = default;
  MyVector(std::size_t size) : m_begin(new T[size]), m_end(m_begin + size)
{}

  // ...

private:
  T* m_begin = nullptr;
  T* m_end = nullptr;
};

And I contend that it doesn't need to know what MyVector could be
instantiated with in order to do that.

Now, I totally understand that I may be asking for something very
difficult to do in Clang today. In which case, I'll accept whatever
limitations there are. But I don't think that assuming we'll see the full
set of instantiations is a great solution either.

> > If the issue is parallel compilation, deferring the replacements makes
>perfect sense as a way to resolve any read-write conflicts (transforming
>one header while it's being parsed as part of another TU). However, if
>you ensure that a header isn't touched by
> > multiple transformations, and generally ensure that transformations
>don't clobber each other by design, there's no need to merge anything.
> > 
> > Personally I would even accept a slightly more limited set of
>transformations in exchange for never having to worry about merging.
> 
>
> "Merging" is usually merely deduplication, which is not hard. I have the
>feeling that you think there's lots of complexity where it isn't. I'd
>definitely say that the heuristics you propose in order to be able to
>process headers on their own are much higher
> than the issue of deduplicating edits.

If it's just deduplication, it's no big deal - totally agreed.

I may have been thrown off by this: https://github.com/revane/migmerge_git

That tool seems to (aim to) handle a lot more than deduplication.

More importantly: I certainly hope it's not going to be necessary to
download a separate "merge tool" in order to use cpp11-migrate.

To me, cpp11-migrate is the kind of tool I'm just not going to use if it's
not dead simple to use. As a user, I don't really care how it works
internally, and I don't want to care :). I _am_ willing to tell it (a
reasonable amount of) things about my code base that it can't easily infer
itself.

Stefanus

--
Stefanus Du Toit <stefanus.du.toit at intel.com>
Intel Waterloo