On Tue, Jun 14, 2011 at 10:46 PM, Chris Lattner <span dir="ltr">&lt;<a href="mailto:clattner@apple.com">clattner@apple.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div style="word-wrap:break-word"><div><div class="im"><div>On Jun 14, 2011, at 10:10 PM, Manuel Klimek wrote:</div><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<div style="word-wrap:break-word"><div><div><blockquote type="cite"><div>The library we implemented based on those findings allows a very declarative way of expressing interesting parts of the AST that you want to work on, with a callback mechanism to have an imperative part that does the actual transformation. The way this library uses templates to build up a reverse hierarchy of the clang AST allows us to expand the library of matchers without needing to manually copy the structure of the AST while still providing full compile time type checking.</div>


<div>We have used this library for many internal transformations (one of the more broadly applicable ones is included as an example tool), and Nico may be able to explain in more detail how he's using the infrastructure for Chromium.</div>


</blockquote><br></div></div><div>My primary concern with this work is that it is built as a series of *compile-time* code constructs.  This means that someone working on a rewriter needs to rebuild the clang executable to do this.  Instead of doing this as a compile-time construct, have you considered building this as an extensible pattern matching tool where the pattern and replacement can be specified at runtime?  I'm envisioning something like a "regular expression for an AST".  You don't need to rebuild grep every time you want to search for something new in text.</div>


<div><br></div><div>Even in the case where compiled-in code is required, having a more dynamic matcher would greatly simplify the code doing the matching.  Have you considered this approach?</div></div></blockquote><div>

<br>

</div><div>Yes, we considered this approach early on. We looked into some Java- and C-based solutions (see for example <a href="http://nighthacks.com/roller/jag/" target="_blank">http://nighthacks.com/roller/jag/</a>; for Java there are some really bad examples that map Java to XML and run XPath queries on it). The problem is that building that pattern matcher language would not be straightforward (simply writing C++ with a few globs would not be enough for our current use cases, since C++ has a lot of implicit stuff going on which we want to match, and just creating an arbitrary new language doesn't necessarily seem better than the in-language DSL).</div>

</div></blockquote><div><br></div></div><div>I'm not suggesting that you parse "C++ with holes in it" as the pattern matching language.  It would be perfectly acceptable to represent the patterns as S expressions for example.  You currently use this sort of thing at compile time:</div>

<div><br></div><div><span style="font-family:Times"><pre>ConstructorCall(HasDeclaration(Method(HasName(StringConstructor))),

                ArgumentCountIs(2),...</pre></span><div><br></div><div>This is basically already S-expressions; you could either use this sort of thing unmodified if you think it looks nice, or convert it to proper S-expressions, as in:</div>

<div><br></div><div>(constructorcall (hasdeclaration (method (hasname stringconstructor)))</div><div>                             (argumentcountis 2) ...</div></div><div><br></div><div><br></div><div>The point is to make the matching language have really simple and trivial syntax, but syntax that is usable without rebuilding the compiler.  There are tons of examples of tree pattern matches (including things like Burg) to draw inspiration from.</div>

</div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div class="im"><br><blockquote type="cite"><div class="gmail_quote">


<div>Considering that we want to eventually get a dynamic pattern matching language, but we also want to get it right, we are currently spending our time on the in-language DSL, and especially for the large-scale stuff the developers we work with need surprisingly little help (the included example for replacing redundant c_str() calls was created by a contributor who has not worked on the implementation of the matcher library).</div>

</div></blockquote><div><br></div></div><div>I'm not sure I get this logic.  You're saying that you're afraid you won't get the matching language right, so you're avoiding it and doing something you know is wrong ;-).  I expect much iteration on this, but all that requires is to tell people to expect breakage as you get experience with it and evolve things.</div>

</div></div></blockquote><div><br></div><div>I don't think our current code is "wrong". I think it is useful for a certain class of tools, and as the basis for more dynamic tools. The main problem with building a dynamic language is that it's significantly more effort, as it needs to be a lot more complete before it becomes useful. In the C++ world the user can always fall back to using the AST methods available.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div class="im"><br><blockquote type="cite"><div class="gmail_quote">


<div>And when we get to the higher-level refactoring tools, the dynamic aspect will be parameters to the refactoring, so the non-dynamic nature of the AST matchers does not matter for that case.</div></div></blockquote><div>

<br></div></div><div>Well, the second step is to be able to specify rewrites dynamically the same way you specify predicates.  In the case of a "real" refactoring engine, you'll probably want the power of a scripting language or something to write your predicates in.</div>

</div></div></blockquote><div><br></div><div>Again, a lot more effort, and I think we can base a more dynamic solution on what we currently develop.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div style="word-wrap:break-word"><div><div class="im"><blockquote type="cite"><div class="gmail_quote"><div>When we look at the actual transformations, being in C++ again provides the benefit that we can just work with the AST nodes we matched instead of having to define some new way of dynamically specifying the transformations - and re-binding the AST in a dynamic language is definitely out-of-scope for us...</div>

</div></blockquote><div><br></div></div><div>I see the convenience in this, but still think it is the wrong way to go.</div><div class="im"><br><blockquote type="cite"><div class="gmail_quote">

<div>In the end, I agree that the vision to have a really nice dynamic description of the matches is the ultimate goal, but for us this is currently still a few quarters out. The C++ code provides really useful abstractions to quickly describe matches and transformations on the AST, with little code (as we can use C++ to provide the type safety and thus the error checking on the AST nodes). The cost of a link step while writing the tools has so far not been a big obstacle, especially considering that our main target users currently are a) C++ experts doing large-scale code transformations and b) people writing refactoring tools that end users can use without any knowledge about the AST.</div>


</div></blockquote><br></div></div><div>Beyond being "the wrong way to go" IMHO, there are several other problems with the code as proposed:</div><div><br></div><div>1. It doesn't follow the LLVM coding standards, particularly around naming, using std::map&lt;std::string, using C headers like &lt;assert.h&gt;, and a bunch of other stuff.</div>

</div></blockquote><div><br></div><div>I tried to make it llvm coding style conforming, and had Chandler and Zhanyong review it - I'd be happy to change whatever we missed.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div style="word-wrap:break-word"><div><br></div><div>2. You're building substantial new functionality into clang.  The clang binary is already overly bloated with the static analyzer and other functionality that it keeps accreting.  It would be better to use (and improve) the clang plugin mechanism to build this as a plugin.  I'd also like the analyzer to move off to being a plugin as well.  One carrot that we can give for people to build things as plugins is that they can use C++0x features even though the clang compiler has to stay C++98 for the foreseeable future. </div>

</div></blockquote><div><br></div><div>The reason I don't want to run tools as a plugin is that the command-line syntax for tools can differ significantly from that of the compiler. I don't think this needs to be linked into the clang binary though, since nothing in clang depends on it (confused...)</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><br><div>3. The tooling infrastructure adds python stuff to do the rewrites.  This seems pretty half-baked to me.  If the whole reason to compile stuff in is to make things simpler, why do we need external scripts?</div>

</div></blockquote><div><br></div><div>Yes, this is an intermediate step that will not be necessary soon. This is work in progress...</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div style="word-wrap:break-word"><div><br></div><div>4. Building this as compile-time stuff requires things like VariadicFunctions.h, which (if generally useful) should be in LLVM, not clang.  It is better to define away the problem though by not doing this stuff at compile time.</div>

<div><br></div><div>5. Adding major new stuff like JSON parsing, etc. All of these (if they even make sense) should be independently reviewed and submitted, not taken as one mega patch.</div></div></blockquote><div><br></div>

<div>This was actually going in as multiple smaller patches. I just deleted everything at once, as it had all been submitted with Doug saying that we should go ahead and he'd review post-submit, but after you intervened I thought it didn't make sense to leave half of the changes in. Then I re-diffed the whole thing and put it up as one giant patch that would revert the delete.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div>Overall, this is exactly the sort of thing that happens when someone develops a large amount of code out of tree, without input from other contributors, and then tries to spring it on an open source project.  While I really laud your goals and really want to push refactoring forward, this is not the right direction to start from. Trying to push a huge patch in isn't the way to get to something that is truly great in the mainline tree.</div>

</div></blockquote><div><br></div><div>I certainly don't want to "spring it" on clang - and I'm sorry if I gave this impression. I really tried to make sure that each step I took was ok, trying to split it up into manageable pieces, confirming each check-in with Doug, and trying to get feedback - again, I'm sorry if that was not the right approach. If you don't think this is useful to host in clang, because you think it's not the right approach, then this is completely your call.</div>

<div><br></div><div>I definitely think what we have is useful for developers who use clang, and as such I think the clang code base would be a good place to host it, especially since I'd really appreciate feedback from you guys while developing the code. But if you decide that we should host this code elsewhere I'm fine with that, too.</div>

<div><br></div><div>Cheers,</div><div>/Manuel</div></div>