[cfe-commits] [PATCH] Change counter-intuitive behavior of isDerivedFrom

Sat Sep 8 10:44:23 PDT 2012

On Sat, Sep 8, 2012 at 11:09 AM, Chandler Carruth <chandlerc at google.com>wrote:

> On Sat, Sep 8, 2012 at 1:37 AM, Daniel Jasper <djasper at google.com> wrote:
>
>> On Fri, Sep 7, 2012 at 10:18 PM, Manuel Klimek <klimek at google.com> wrote:
>>
>>> On Fri, Sep 7, 2012 at 9:55 PM, Chandler Carruth <chandlerc at google.com>
>>> wrote:
>>> > On Fri, Sep 7, 2012 at 8:06 AM, Daniel Jasper
>>> > <reviews at llvm-reviews.chandlerc.com> wrote:
>>> >>
>>> >> Hi klimek,
>>> >>
>>> >> Change the behavior of the isDerivedFrom-matcher to not match on the
>>> class
>>> >> itself. This caused some confusion (intuitively, a class is not
>>> derived from
>>> >> itself) and makes it hard to write certain matchers, e.g. "match and
>>> bind
>>> >> any pair of base and subclass".
>>> >>
>>> >> The original behavior can be achieved with a new isA-matcher.
>>>  Similar to
>>> >> all other matchers, this matcher has the same behavior and name as the
>>> >> corresponding AST-entity - in this case the isa<>() function.
>>> >
>>> >
>>> > This is not a new idea. In fact, when we first built the isDerivedFrom
>>> > matcher over a year ago Manuel and others argued for what you propose.
>>> I
>>> > argued firmly against it, and my feelings have not changed. Manuel
>>> should
>>> > have remembered that.
>>>
>>> I have remembered that; I also think the situation has changed - the
>>> argument that isA would conflict with other names seems not to hold in
>>> the light of the matchers reflecting functions that you would use with
>>> other matchers - there is a duplication of domain naming inherent in
>>> what the matchers are nowadays.
>>>
>>
> I can't really parse this sentence, but I don't think I agree with where
> you're going here...
>
> But mostly I'm confused by this: the discussion we had involved more than
> the name "isA". It also involved the semantics themselves. So no naming
> change alone is sufficient.
>
>
>>
>> Additionally, it is not easy to confuse the two, I think. First of, isA
>> lives in clang::ast_matchers. So you would either have to work in that
>> namespace or explicitly import it with using directives.
>>
>
> Which I expect roughly everyone to do. In fact, even if they don't the
> namespaces won't be necessary in most cases except for the
> inner-most-matcher.
>
>
>> Why is the similarity in name (with one letter being different in case)
>> desirable for all other AST matchers, but not for isA? Is there an inherent
>> difference between isA <-> isa and expr <-> Expr?
>>
>
> Yes, a massive one.
>
> The 'isa' template used throughout LLVM and Clang's implementation is an
> implementation detail. It part of a more efficient RTTI system built around
> the presupposition of total knowledge of all members of the type hierarchy.
> See include/llvm/Support/Casting.h for the details here. It has absolutely
> nothing to do with the AST, Clang, or the C++ code being parsed. It most
> certainly has no relationship with establishing 'isa' relationships between
> AST nodes of the parsed program, it is used to establish 'isa'
> relationships between Clang's internal type system.
>
> On the other hand, the mirroring between the matchers and the AST nodes
> are quite different. The matchers are functions, and the AST nodes are
> universally types. The AST node type names would never be written in the
> same context as a function realistically, and in fact the AST node names
> themselves occur relatively infrequently. Most importantly, you *want* the
> cognitive overloading between the two because they represent the same
> construct. The matcher form matches the node form. There is no 'isa' node,
> and so having an 'isA' matcher just makes no sense at all.
>
> I want to be clear: I view the discussion of whether "isA" is a good name
> for a matcher as entirely orthogonal to the rest. I don't think that it is
> a good name for any matcher under any circumstances, regardless of the
> semantic model we end up with...
>
> Now, on to the semantic debate.
>
>
>>
>>
>>> > See Sebastien Redl's comment for why I firmly believe that the current
>>> > behavior is the correct behavior: std::is_base_of behaves the exact
>>> same
>>> > way.
>>>
>>
>> This I did not know. I still don't think isDerivedFrom matching the class
>> itself is intuitive at all, but this sheds some doubt on it. I firmly
>> believe that neither behavior can be called "correct" as there is no
>> definition of what correct is. There is std::is_base_of, but there is also
>> what the standard says (e.g. "A class B is a base class of a class D if it
>> is a direct base class of D or a direct base class of one of D’s base
>> classes") and what people (not essentially working on a C++ compiler)
>> intuitively think.
>>
>
> I think the exported interface of the standard library, which is clearly
> meant to be consumed by a "normal" C++ programmer, not by standards experts
> or compiler engineers exclusively, is the most likely to work long term.
> These type traits are things that the type of C++ programmer we expect to
> use matchers and tools will be intimately familiar with: strong C++
> programmers working with large and/or complex libraries. Frankly, the Boost
> people, or their counterparts here at Google.
>
> Another perspective is this: I would like to ensure that all of the C++11
> standardized type trait predicates (like std::is_base_of, among others)
> have analogous matchers because these type traits have already evolved over
> a long period of people wanting to know these particular properties of C++
> entities. They seems like perfect fodder for predicate matchers, in the
> same way that the AST is the perfect fodder for node matchers.
>
> Now, that would argue for us eventually having an "isBaseOf" matcher,
> which I certainly hope behaves exactly the same way std::is_base_of
> behaves. To do anything else would be hopelessly confusing I think... And
> the existence of such a production makes me *really* want isDerviedFrom to
> be precisely symmetrical to isBaseOf. Anything else, I again think leads to
> more confusion than the initial confusion of "oh, wow, isDerivedFrom
> matches more than I expected". The latter is a confusion users will likely
> hit once and never again because the consistency of the rule will make it
> easy to remember. On the other hand, the confusion that results from
> divergent semantics, between standardized predicates and matchers, or
> between two sets of matchers, seems more likely to be an ongoing problem.
>
>
>>
>>> > I don't see why "math and bind any pair of base and subclass" is that
>>> hard
>>> > to write. It only requires a not matcher that tests for type equality.
>>> Can
>>> > you give an example that is hard to write, and maybe there is a better
>>> way
>>> > to write it?
>>>
>>> You cannot test for type equality unless you know the type of what
>>> you're looking for.
>>>
>>
>> To give more concrete examples. I would like to match any parent->direct
>> base relationship.
>>
>
> The first time around, we also came up with piles of examples. Most of the
> obvious uses for isDerivedFrom we came up with actually wanted the
> same-class-works behavior. When we wrote the GWS refactoring, we found
> multiple places within the code where we implemented the exact logic of
> isDerived From. I'm not saying your example is bad or invalid, I like it a
> lot. I just don't want to neglect the use cases which actually wanted the
> old behavior. I think *both* tasks are valid.
>
>
>> With the new matcher, I could do:
>>
>>   recordDecl(isDerivedFrom(recordDecl.bind("base"))).bind("derived")
>>
>> With the old behavior, I can't implement this as part of the matcher
>> language. And I think there is a more general case here. Basically, you
>> can't match base classes if the derived class might also have the matched
>> property.
>>
>
> Yes, this is the core problem, and it does indeed lead inevitably to this
> conclusion:
>
>
>> #1 just means we need the two matchers, and I think nobody is against
>> that.
>>
>
> Correct, we need two matchers. None of what I have said argues against
> supporting the use case you have in mind.
>
>
>> #2 basically concerns the names of the matchers. I am open to discussion,
>> but I quite strongly feel that this is the most intuitive interface.
>>
>
> I think there is ample evidence that at least *some* people do not find
> this to be intuitive. Lets assume for a moment (and discuss above if there
> is serious disagreement) that "isA" is not a viable name for either of
> these matchers.
>
> What seems more *consistent* even when it is not intuitive would be:
>
> isDerivedFrom -- works the way std::is_base_of does, includes self.
> isStrictlyDerivedFrom -- does not include self. uses the word "strictly"
> in the same sense that set arithmetic uses it: isDerivedFrom is a superset
> test, and isStrictlyDerivedFrom is a strict superset test.
>
> This seems consistent with the standard library, and any mirroring
> matchers we ever get. It seems to use nice distinct names. It seems to be
> consistent and reasonably easy to explain. It may be surprising the first
> time you read the description of the matcher, but is this likely to cause
> long-term confusion?
>
> Also, maybe there are still better names that we should consider. It'd be
> worth getting further input from the community there...
>

Ok, so there are two arguments:
1) We don't want to reuse "isa".
2) The "intuitive" behavior of "isDerivedFrom" is debatable.

As for #1, I don't see it as such a big problem, as I don't see users
confusing them. However, you are much more experienced within the clang
sources and I am happy to accept your judgement. However, we first need to
solve #2, because that decides whether we need to find an alternative name
for isA or for the new isDerivedFrom.

As for #2, I think the old behavior of isDerivedFrom is not intuitive from
the natural language point of view and we sort of try to make
matcher-constructs form natural sentences. However, I see the argument that
we should follow the established behavior of is_base_of. On the other hand:
If you go back to one of the examples where you used isDerivedFrom with its
old behavior, does the matcher really form an intuitively understandable
sentence? I tried to find good examples, but couldn't. I think in those
cases you actually don't care about class inheritance, you just want to
find all classes that "are"/"can act as" a certain class.

So, we need to decide #2 and only after that we need to argue about names.
I can see both sides of the argument and am torn. I still lean a bit
towards the without-itself-behavior.

Regarding your naming suggestions:
I still think this naming will give us some headaches. Lets assume we will
one day need more matchers (experience suggests we will ;-)). Lets assume
we need a matcher to only match the direct base class. Both natural
language and the terms used in the standard suggest that this should be
called "isDirectlyDerivedFrom" or "isDirectBaseOf". Then we would have
isDirectlyDerivedFrom, isStrictlyDerivedFrom and isDerivedFrom. They have a
valid superset relationship (isDirectlyDerivedFrom < isStrictlyDerivedFrom
< isDerivedFrom), but e.g. isDirectlyDerivedFrom and isStrictlyDerivedFrom
would be easily confused. I actually think isDerivedFromButNot might be an
ugly but less confusing alternative.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120908/a417891e/attachment.html>