[cfe-commits] [PATCH] Change counter-intuitive behavior of isDerivedFrom

Sat Sep 8 02:09:36 PDT 2012

On Sat, Sep 8, 2012 at 1:37 AM, Daniel Jasper <djasper at google.com> wrote:

> On Fri, Sep 7, 2012 at 10:18 PM, Manuel Klimek <klimek at google.com> wrote:
>
>> On Fri, Sep 7, 2012 at 9:55 PM, Chandler Carruth <chandlerc at google.com>
>> wrote:
>> > On Fri, Sep 7, 2012 at 8:06 AM, Daniel Jasper
>> > <reviews at llvm-reviews.chandlerc.com> wrote:
>> >>
>> >> Hi klimek,
>> >>
>> >> Change the behavior of the isDerivedFrom-matcher to not match on the
>> class
>> >> itself. This caused some confusion (intuitively, a class is not
>> derived from
>> >> itself) and makes it hard to write certain matchers, e.g. "match and
>> bind
>> >> any pair of base and subclass".
>> >>
>> >> The original behavior can be achieved with a new isA-matcher.  Similar
>> to
>> >> all other matchers, this matcher has the same behavior and name as the
>> >> corresponding AST-entity - in this case the isa<>() function.
>> >
>> >
>> > This is not a new idea. In fact, when we first built the isDerivedFrom
>> > matcher over a year ago Manuel and others argued for what you propose. I
>> > argued firmly against it, and my feelings have not changed. Manuel
>> should
>> > have remembered that.
>>
>> I have remembered that; I also think the situation has changed - the
>> argument that isA would conflict with other names seems not to hold in
>> the light of the matchers reflecting functions that you would use with
>> other matchers - there is a duplication of domain naming inherent in
>> what the matchers are nowadays.
>>
>
I can't really parse this sentence, but I don't think I agree with where
you're going here...

But mostly I'm confused by this: the discussion we had involved more than
the name "isA". It also involved the semantics themselves. So no naming
change alone is sufficient.

>
> Additionally, it is not easy to confuse the two, I think. First of, isA
> lives in clang::ast_matchers. So you would either have to work in that
> namespace or explicitly import it with using directives.
>

Which I expect roughly everyone to do. In fact, even if they don't the
namespaces won't be necessary in most cases except for the
inner-most-matcher.

> Why is the similarity in name (with one letter being different in case)
> desirable for all other AST matchers, but not for isA? Is there an inherent
> difference between isA <-> isa and expr <-> Expr?
>

Yes, a massive one.

The 'isa' template used throughout LLVM and Clang's implementation is an
implementation detail. It part of a more efficient RTTI system built around
the presupposition of total knowledge of all members of the type hierarchy.
See include/llvm/Support/Casting.h for the details here. It has absolutely
nothing to do with the AST, Clang, or the C++ code being parsed. It most
certainly has no relationship with establishing 'isa' relationships between
AST nodes of the parsed program, it is used to establish 'isa'
relationships between Clang's internal type system.

On the other hand, the mirroring between the matchers and the AST nodes are
quite different. The matchers are functions, and the AST nodes are
universally types. The AST node type names would never be written in the
same context as a function realistically, and in fact the AST node names
themselves occur relatively infrequently. Most importantly, you *want* the
cognitive overloading between the two because they represent the same
construct. The matcher form matches the node form. There is no 'isa' node,
and so having an 'isA' matcher just makes no sense at all.

I want to be clear: I view the discussion of whether "isA" is a good name
for a matcher as entirely orthogonal to the rest. I don't think that it is
a good name for any matcher under any circumstances, regardless of the
semantic model we end up with...

Now, on to the semantic debate.

>
>
>> > See Sebastien Redl's comment for why I firmly believe that the current
>> > behavior is the correct behavior: std::is_base_of behaves the exact same
>> > way.
>>
>
> This I did not know. I still don't think isDerivedFrom matching the class
> itself is intuitive at all, but this sheds some doubt on it. I firmly
> believe that neither behavior can be called "correct" as there is no
> definition of what correct is. There is std::is_base_of, but there is also
> what the standard says (e.g. "A class B is a base class of a class D if it
> is a direct base class of D or a direct base class of one of D’s base
> classes") and what people (not essentially working on a C++ compiler)
> intuitively think.
>

I think the exported interface of the standard library, which is clearly
meant to be consumed by a "normal" C++ programmer, not by standards experts
or compiler engineers exclusively, is the most likely to work long term.
These type traits are things that the type of C++ programmer we expect to
use matchers and tools will be intimately familiar with: strong C++
programmers working with large and/or complex libraries. Frankly, the Boost
people, or their counterparts here at Google.

Another perspective is this: I would like to ensure that all of the C++11
standardized type trait predicates (like std::is_base_of, among others)
have analogous matchers because these type traits have already evolved over
a long period of people wanting to know these particular properties of C++
entities. They seems like perfect fodder for predicate matchers, in the
same way that the AST is the perfect fodder for node matchers.

Now, that would argue for us eventually having an "isBaseOf" matcher, which
I certainly hope behaves exactly the same way std::is_base_of behaves. To
do anything else would be hopelessly confusing I think... And the existence
of such a production makes me *really* want isDerviedFrom to be precisely
symmetrical to isBaseOf. Anything else, I again think leads to more
confusion than the initial confusion of "oh, wow, isDerivedFrom matches
more than I expected". The latter is a confusion users will likely hit once
and never again because the consistency of the rule will make it easy to
remember. On the other hand, the confusion that results from divergent
semantics, between standardized predicates and matchers, or between two
sets of matchers, seems more likely to be an ongoing problem.

>
>> > I don't see why "math and bind any pair of base and subclass" is that
>> hard
>> > to write. It only requires a not matcher that tests for type equality.
>> Can
>> > you give an example that is hard to write, and maybe there is a better
>> way
>> > to write it?
>>
>> You cannot test for type equality unless you know the type of what
>> you're looking for.
>>
>
> To give more concrete examples. I would like to match any parent->direct
> base relationship.
>

The first time around, we also came up with piles of examples. Most of the
obvious uses for isDerivedFrom we came up with actually wanted the
same-class-works behavior. When we wrote the GWS refactoring, we found
multiple places within the code where we implemented the exact logic of
isDerived From. I'm not saying your example is bad or invalid, I like it a
lot. I just don't want to neglect the use cases which actually wanted the
old behavior. I think *both* tasks are valid.

> With the new matcher, I could do:
>
>   recordDecl(isDerivedFrom(recordDecl.bind("base"))).bind("derived")
>
> With the old behavior, I can't implement this as part of the matcher
> language. And I think there is a more general case here. Basically, you
> can't match base classes if the derived class might also have the matched
> property.
>

Yes, this is the core problem, and it does indeed lead inevitably to this
conclusion:

> #1 just means we need the two matchers, and I think nobody is against that.
>

Correct, we need two matchers. None of what I have said argues against
supporting the use case you have in mind.

> #2 basically concerns the names of the matchers. I am open to discussion,
> but I quite strongly feel that this is the most intuitive interface.
>

I think there is ample evidence that at least *some* people do not find
this to be intuitive. Lets assume for a moment (and discuss above if there
is serious disagreement) that "isA" is not a viable name for either of
these matchers.

What seems more *consistent* even when it is not intuitive would be:

isDerivedFrom -- works the way std::is_base_of does, includes self.
isStrictlyDerivedFrom -- does not include self. uses the word "strictly" in
the same sense that set arithmetic uses it: isDerivedFrom is a superset
test, and isStrictlyDerivedFrom is a strict superset test.

This seems consistent with the standard library, and any mirroring matchers
we ever get. It seems to use nice distinct names. It seems to be consistent
and reasonably easy to explain. It may be surprising the first time you
read the description of the matcher, but is this likely to cause long-term
confusion?

Also, maybe there are still better names that we should consider. It'd be
worth getting further input from the community there...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120908/37e1c950/attachment.html>