[cfe-commits] [patch] TransformIterator (motivating use case in Clang)

Tue May 15 14:33:58 PDT 2012

On Tue, May 15, 2012 at 2:23 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Tue, May 15, 2012 at 3:11 PM, Douglas Gregor <dgregor at apple.com> wrote:
>>
>>
>> On May 15, 2012, at 11:57 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> > [-llvm-commits since it's not too relevant to that list at this point]
>> >
>> > On Tue, May 15, 2012 at 11:24 AM, Douglas Gregor <dgregor at apple.com>
>> > wrote:
>> >>
>> >> On May 15, 2012, at 9:31 AM, David Blaikie <dblaikie at gmail.com> wrote:
>> >>
>> >>> On Tue, May 15, 2012 at 9:19 AM, Douglas Gregor <dgregor at apple.com>
>> >>> wrote:
>> >>>>
>> >>>> On May 6, 2012, at 7:02 PM, David Blaikie wrote:
>> >>>>
>> >>>>> This patch adds an iterator much like boost's transform_iterator
>> >>>>> (without some extra bells & whistles) for use in some changes I'd
>> >>>>> like
>> >>>>> to make to Clang to correct some misimplemented iterators there.
>> >>>>>
>> >>>>> A few gotchas that could be improved/changed depending on opinions:
>> >>>>> * I may be playing a little loose with the return type of the
>> >>>>> functor
>> >>>>> (see the example/motivating functor, to_pointer) - the return type
>> >>>>> actually must be a reference, though the result_type provides the
>> >>>>> underlying value type, not the reference type. If this is violated
>> >>>>> Clang will emit a warning, but I could make it more robust with a
>> >>>>> compile time assertion in the TransformIterator that the result_type
>> >>>>> is actually a reference type, and strip that off to provide the
>> >>>>> value_type of the TransformIterator.
>> >>>>
>> >>>> It's much more correct for the value and reference types of the
>> >>>> iterator type to be, e.g.,
>> >>>>
>> >>>>  typedef typename Functor::result_type reference;
>> >>>>  typedef typename remove_reference<reference>::type value_type;
>> >>>>
>> >>>>> * I realize adding pointer-yness back onto an iterator over
>> >>>>> references
>> >>>>> when the underlying data structure (in the Clang patch you can see
>> >>>>> this situation) is a sequence of pointers may be a bit overkill -
>> >>>>> though the alternatives are writing different algorithms in the two
>> >>>>> cases where I've used TransformIterator, or having the decl_iterator
>> >>>>> iterate over pointers (in which case I should probably change the
>> >>>>> other iterators I've modified to iterate over pointers for
>> >>>>> symmetry).
>> >>>>
>> >>>> diff --git include/clang/AST/DeclBase.h include/clang/AST/DeclBase.h
>> >>>> index 6aef681..0a16ea5 100644
>> >>>> --- include/clang/AST/DeclBase.h
>> >>>> +++ include/clang/AST/DeclBase.h
>> >>>> @@ -1175,17 +1175,18 @@ public:
>> >>>>     Decl *Current;
>> >>>>
>> >>>>   public:
>> >>>> -    typedef Decl*                     value_type;
>> >>>> -    typedef Decl*                     reference;
>> >>>> -    typedef Decl*                     pointer;
>> >>>> +    typedef Decl                     value_type;
>> >>>> +    typedef value_type&              reference;
>> >>>> +    typedef value_type*              pointer;
>> >>>>
>> >>>> Since we tend to traffic in declaration pointers, not references,
>> >>>> it's really beneficial to have value_type be Decl*.
>> >>>
>> >>> Fair enough - I was hoping to reduce the amount of pointers handed
>> >>> around, but I realize in this case especially (where the underlying
>> >>> sequence is a sequence of pointers) it's really an awkward fit.
>> >>
>> >> Why is reducing the number of pointers a specific goal?
>> >
>> > Evidently not something everyone finds to be a useful goal - and on
>> > that basis perhaps I should just leave it alone.
>> >
>> > But for myself I find functions taking references (& locals that are
>> > references) easier to read/reason about because it's clear that
>> > they're not null. Trying to decide which functions are accepting
>> > optional values and which ones aren't makes me pause a little - though
>> > perhaps more out of habit than real curiosity about whether something
>> > is null. If switching things to references causes pointer-habitual
>> > people to have the same kind of pause then that's clearly not an
>> > outright improvement.
>>
>> Unless we're seeing lots of problems due to the pointer-centric style, I
>> don't see a motivation to make this kind of change in the code base.
>
>
> Since there was some mention of wanting other perspectives here, I wanted to
> chime in with my two cents...

Thanks

> This seems like a lot of work and a disruptive change in API patterning, and
> I'm really not clear what problem it's trying to solve. My suspicion is that
> there isn't one other than a preference for references instead of pointers.

Well the underlying problem was the inconsistency of the iterators in
Decl/DeclBase. I wanted to rationalize their value/reference/pointer
types so I could then extract the common filtering functionality out
of filtered/specific_decl_iterator as a proof-of-concept for a general
adapting/filtering iterator device which I could then re-use in my
efforts to improve the CFG fidelity and -Wunreachable-code diagnostic.
This particular path (generalizing the existing filtering iterators to
a reusable adapter before reusing that in the new use case) was
suggested by Ted Kremmenek & I was happy enough to have an excuse to
tidy up some existing code.

Then when I went to extract that filtering functionality I came across
the fact that they weren't quite meeting the iterator requirements
(specifically, value_type, reference, and pointer were all the same
type) & so I set about fixing that. It could go either way - pointer
value types or reference value types. I asked, I waited, and
eventually Richard Smith chimed in to say he was OK with the reference
version.

Yes, my preference is towards the reference version because it seems
to match the semantics (non-null-ness) and usage (iter->x) that's
desired.

> I actually don't even agree with this movement. While yes, it is nice to use
> simpler constructs when there is no possibility of a null pointer,
> references aren't strictly superior. We can't assign to them to re-point
> them to other entities, which is a fundamental pattern in Clang and LLVM. I
> don't see soft "non-null" constraints as a sufficient reason to lose this.

I'm not suggesting losing that flexibility where it's
desired/necessary. Indeed I think that helps readability to know which
things are repointing over some algorithm & which things aren't. It's
certainly not uncommon for algorithms/types to have pointers inside
them that are reassigned to based on some ongoing algorithm or
condition & those would have to remain pointers, certainly.

> What's more, it is trivial to add non-null constraints to pointers. In fact,
> dyn_cast/cast/isa all provide such constraints. We could teach Clang about
> [[nonnull]] for parameters and get much more than a reference gives us:
>
> - Optimizing based on non-null-ness

Clang doesn't optimize references on the basis of non-null-ness? For
those things that need to be reassigned, but are guaranteed to never
be null (I'm not sure we have (m)any of those - mostly if we're
reassigning them, they start out/may be null for some time)

> - Checking in non-optimized builds for non-null-ness

Granted, adding this to reference usage might be a bit harder -
knowing where checks are beneficial & where they aren't, but I imagine
if we went all out with non-null annotations we might have some
similar perf problems anyway.

> - Still able to overwrite the argument within the funciton

That usually seems to be a source of errors (as we've seen with people
thinking they have an out parameter & assigning to the parameter
directly, rather than the dereferenced parameter). Are there many
deliberate uses of this in Clang?

- David