[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

Mon Nov 16 18:44:50 PST 2020

On Mon, Nov 16, 2020 at 6:14 PM Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Mon, Nov 16, 2020 at 4:48 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>
>>
>>
>> On Mon, Nov 16, 2020 at 4:10 PM David Blaikie <dblaikie at gmail.com> wrote:
>>
>>> On Mon, Nov 16, 2020 at 2:44 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>> >
>>> >
>>> >
>>> > On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >>
>>> >> On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joker.eph at gmail.com>
>>> wrote:
>>> >> > On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >> >>
>>> >> >> I will say I'm not a huge fan of adding even more names for things
>>> in
>>> >> >> this fairly core/common use case (now we'll have even more vector
>>> >> >> names to pick from) - can we use default template arguments so we
>>> can
>>> >> >> write SmallVector<T> instead of SmallVector<T, N> and would that
>>> >> >> address some of the use cases proposed here?
>>> >> >
>>> >> >
>>> >> > I won't claim it is perfect, but the added names are a compromise
>>> over rounds of reviews with the folks in the revision. In particular I'm
>>> quite concerned that a default value for N on the SmallVector does not
>>> carry the intent the same way, and is too easy to miss in review (or while
>>> reading code).
>>> >>
>>> >> I'm not quite following here - what sort of problems do you anticipate
>>> >> by readers missing the default value for N?
>>> >
>>> >
>>> > Reading code `SmallVector<SomeType>` does not express that it is
>>> *intentional* to leave of the value for N. It can easily be just forgotten,
>>> and it could easily be implicitly `0`, and as a reviewer (or code reader
>>> later) I don't know if this is was intentional or not. This is why I am
>>> quite opposed to "loosen" the current requirement on SmallVector: a
>>> different name for a different intent is better fitting to me.
>>>
>>> I don't know why it being implicitly zero is noteworthy - anymore than
>>> it being implicitly 1 or 3, etc.
>>>
>>> As for whether it's intentional: I think if we're moving in this
>>> direction that's proposed, the idea is that by default we don't want
>>> to be explicit about the N - so in general we won't be. And sometimes
>>> someone will want to be explicit and add an N, but otherwise the
>>> reasonable default is not to have an N. I think the intent is clear
>>> enough - consider other default template parameters like customized
>>> deleters on a unique_ptr: I don't wonder if someone intended to have a
>>> customized deleter on every unique_ptr, one assumes that wasn't
>>> required/intended unless it's added. It seems like the idea is for
>>> non-explicit N to be a safe/common default, and explicit N to be
>>> noteworthy/require some justification or scrutiny.
>>>
>>
>> I get your point, but I don't share the conclusion here: I don't believe
>> that universally not specifying the N on non-trivial types is a good
>> default. It only makes sense to me for small types, hence it is error prone.
>>
>>
>>
>>>
>>> >> >  To me the drawbacks are outweighing the benefits too much.
>>> >> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly,
>>> without an easy way to figure it out that there is no actual inline storage
>>> while reading the code.
>>> >>
>>> >> Why would it be necessary to figure that out? If we generally feel the
>>> >> right default inline storage happens to be zero in that case and that
>>> >> SmallVector will pick a good default (including possibly 0) that seems
>>> >> like a good thing to me. (why would we call out zero specially, if we
>>> >> want to avoid calling out other values explicitly)
>>> >
>>> >
>>> > I think this is an API that is "easy to misuse", I don't see any
>>> advantage to it while it has drawbacks: why would we do that?
>>>
>>> What kind of misuse do you have in mind?
>>
>>
>> It does not convey the intent, it is too easy to misuse: i.e. you expect
>> a small storage but you don't have one. This is enough to me to outweigh
>> the benefits of the direction and prefer the status quo.
>>
>>
>>
>>> To me it seems like it would
>>> be consistent with the idea that we have built rules about what good
>>> default inline sizes are - and there's no need for us to think about
>>> it on every use, we just let the default do what it's meant to do.
>>>
>>> >> > An alternative was to reserve the default to only "small object" so
>>> that N isn't zero, but there isn't a platform independent way of doing that
>>> and keep the code portable I believe. So `SVec<T>` is really saying: "I am
>>> willing to pay for some limite inline storage if possible but I don't have
>>> a N in mind".
>>> >>
>>> >> That quoted statement still sounds like it could encapsulate the
>>> >> possibility of 0 too, though. "limited inline storage if possible"
>>> >> could be "and the answer to that constraint is zero in this case - but
>>> >> if you happen to make the T smaller, it could become non-zero
>>> >> organically/without the need to touch this code" (as the N could vary
>>> >> organically between non-zero values as struct sizes change too)
>>> >
>>> > Yes the quoted statement says exactly that!
>>>
>>> Sorry, it seems we're jumbling up the two issues: Whether the type
>>> name should be different when asking for implicit default inline
>>> storage length and, separately, whether explicit zero length storage
>>> should use a different name. The discussion above, I thought, was
>>> about the latter, not the former - but your response seeems to be
>>> about the former rather than the latter.
>>
>>
>> You're answering my quote about SVec, which is the type that is about
>> implicit N, Vec (no S!) is about the N=0 case. The quote is also about the
>> inline storage implicit N, it isn't clear to me how you link this to the 0
>> case (which does not have inline storage)?
>>
>>
>>
>>>
>>
>>
>>> Two separate discussions/threads/patches may help keep the
>>> communication more clear.
>>>
>>> >  This is why I don't like having this behavior on SmallVector: this is
>>> a different semantics / intent that the programmers have and this is
>>> something that I like being able to read differently.
>>> >  SVec does not accept a `N`: if I read SVec<Type> there can't be a
>>> possible accidental omission of N here. It has to be a choice between
>>> `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not
>>> `SmallVector<SomeType>`.
>>>
>>> To discuss this issue of whether accepting a default size versus
>>> having an explicit size is a different semantic - as above, I think
>>> it's worth considering/comparing this to other templates and their
>>> default template arguments. Things like std::vector's customizable
>>> allocators (you. don't have to use a different name for the container
>>> when you specify a custom allocation scheme for std::vector - but we
>>> don't wonder every time we see std::vector<T> whether the user meant
>>> to customize the allocator - we accept the default is usualy intended
>>> (as the default buffer size would be usually intended) unless it's
>>> specified - if during review we think a custom allocator (or
>>> non-default buffer size) might be merited, we could ask about it - we
>>> probably would even if the user had written SVec<T> the same way we
>>> ask about other data structures today "did you mean SmallVector<T,
>>> 3>"?), unique_ptr's custom deleters, etc.
>>>
>>
>> I don't necessarily connect with the custom allocator / custom delete
>> analogy right now, they seem too different to me.
>> I'd look instead at SmallDenseMap and DenseMap maybe, or `std::map` and
>> `std::unordered_map` which have different class names and aren't just
>> differentiated with a trait passed as template argument.
>>
>>
>>>
>>> >> > Finally the simple `llvm::Vector` case to replace `SmallVector<T,
>>> 0>` is because it is frequently preferable to `std::vector` but still isn't
>>> readable or immediately intuitive and so is rarely used in practice (see
>>> https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h
>>> for the documented points on N=0).
>>> >>
>>> >> I'm not sure llvm::Vector/Vec would necessarily make it more often
>>> >> used in practice. Maybe a bunch of cleanup and more specific wording
>>> >> in the style guide that would ban std::vector might help more there.
>>> >>
>>> >> Though I agree that it may be suitable to have a clear name for
>>> >> SmallVector<T, 0> since the "Small" is slightly misleading (though not
>>> >> entirely - it's important for the "SmallVectorImpl" benefits of
>>> >> SmallVector - so renaming it to Vector and still using it via
>>> >> SmallVectorImpl<T>& might also be confusing).
>>> >>
>>> >> >> Got any sense of the total value here? Major savings to be had?
>>> >> >>
>>> >> >> (if SmallVector<T> can do what your proposed SVec<T> does, that
>>> leaves
>>> >> >> the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
>>> >> >> over std::vector<T>? I guess the SmallVectorImpl generic algorithm
>>> >> >> opportunities? Though that's rarely needed compared to ArrayRef.)
>>> >> >> If SmallVector<T> would suffice then maybe Vec<T> could be
>>> >> >> ZeroSmallVector<T>? Not sure.
>>> >> >
>>> >> >
>>> >> > ZeroSmallVector<T> does not really address your "more vector names
>>> to pick from" concerns,
>>> >>
>>> >> Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
>>> >> have one fewer name and less convention to base the Vec<T> on (since
>>> >> it won't have the SVec<T> sibling), so picking a name closer to the
>>> >> existing SmallVector might be more viable/helpful.
>>> >>
>>> >> > and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the
>>> "default case" to be the easiest to reach / most intuitive to pick?
>>> `llvm::Vec` looks like "just a vector".
>>> >>
>>> >> Somewhat - but Vec is an abbreviation that isn't really used otherwise
>>> >> (if we consider SVec as well, I'd still push back on the introduction
>>> >> of the abbreviation for both cases)
>>> >
>>> > One aspect of the naming is to avoid one not being a prefix of the
>>> other (IDE completion, etc.) or them being too close to differentiate.
>>> >
>>> >> and llvm::Vector would be only a
>>> >> case separation away form a standard name (when considering the
>>> >> unqualified name - which is likely to come up a fair bit, as we'll see
>>> >> "Vector" used unqualified a lot).
>>> >
>>> > Note: this is why we converged with llvm::Vec and not llvm::Vector in
>>> the revision
>>>
>>> Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
>>> with standard names, etc). Perhaps a bit of a discussion of what other
>>> libraries do around this (no doubt there are a bunch of C++ libraries
>>> that have "here's the default vector you shuold use instead of
>>> std::vector for these reasons" situations).
>>>
>>
>> I had looked into this when the revision was sent, but I couldn't find a
>> library with a "vector with inlined storage" and a default N value.
>> Both Abseil and Folly have a SmallVector equivalent, Abseil does not have
>> a default:
>> https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71
>> Folly has a default hard-coded to 1:
>> https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430
>> Folly has also another class, FBVector, which is really an alternative to
>> std::vector with a different memory management strategy (growth factor,
>> etc.). I don't think Abseil has anything else.
>>
>> Do you have other ideas of related work to look for?
>>
>
> Actually, the reason I originally pursued this was that I used a
> SmallDenseMap and happily found that it had a default number of inline
> elements:
> https://github.com/llvm/llvm-project/blob/69cd776e1ee79e72ccbdad30749eac04579715ee/llvm/include/llvm/ADT/DenseMap.h#L877
> I thought "why not SmallVector, since I use that a lot more often"? But
> based on historical misuse of SmallVector, using a hardcoded number like
> "4" (like SmallDenseMap does) as the default doesn't make much sense, so I
> pursued a bounded-sizeof approach.
>
>
>>
>>
>>
>>>
>>> >> I wouldn't entirely object to SmallVector<T> getting a smart default
>>> >> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
>>> >> don't feel /super/ great about it, but yeah, if we're going to push
>>> >> the Programmer's Manual encouragement further in terms of
>>> >> reducing/removing use of std::vector, then maybe llvm::Vector isn't
>>> >> the worst way to do that.
>>> >>
>>> >> (it might be that this would benefit from being two separate
>>> >> discussions, though - one on how to provide smart defaults for
>>> >> SmallVector<T> and one on if/how to push std::vector deprecation in
>>> >> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
>>> >> spelled) further along)
>>> >
>>> >
>>> > For historical purpose: the review was actually originally only adding
>>> a default for SmallVector and nothing else :)
>>> > We only converged to the current proposal after a few rounds of
>>> reviews trying to tradeoff amongst folks in the review.
>>>
>>> Perhaps you could summarize some of those discussions/decisions in
>>> more detail here? As the RFC stands, these are my comments - I know
>>> it's always a tradeoff of which audiences are brought in at what
>>> stages (though often folks send ADT/Support changes my way during
>>> review - I should probably just set up a Herald rule for these
>>> things).
>>>
>>> Sounds like maybe I am in agreement with the original version of the
>>> review, then.
>>>
>>
>> Likely: the review evolved because I opposed to changing the status quo
>> in this direction I guess.
>>
>
> It sounds like there is pretty good consensus here to just let
> `SmallVector<T>` "just work" and choose a default N. The one exception
> seems to be that Mehdi has a strong objection having the default N possibly
> be 0 (to ensure the bounded-sizeof property). So let's shift the discussion
> with a laser focus to specifically that point.
>
> Responding to the arguments I've seen presented so far:
> 1. I'm not convinced that SmallVector<T> with defaulted N==0 is really
> error prone. Mehdi, I like your definition of "I am willing to pay for some
> limite inline storage if possible but I don't have a N in mind" and I think
> we can apply it to SmallVector<T>; I don't think adding SVec<T> is needed
> for that.
>

I understand your point, I still strongly disagree for the reason
previously discussed: in particular I don't understand why applying it to
SmallVector would be better than what we converged on with SVec.

> 2. I don't understand the concern about "forgetting the N". In the "new
> world" of `SmallVector<T>`, any `SmallVector<T, N>` in new code would be
> treated as "I profiled this code and this explicit N is the right choice".
> That is, adding an explicit `N` is going to be rare and deliberate
> (possibly with reviewers insisting on a comment!) -- this is not something
> that one can "forget".
>

> For point 2., I think we will have some transition period where reviewers
> will organically start to propagate the new advice (and patch authors
> enjoying not having to write N), but I think the direction is effectively
> what I stated.
>
> -- Sean Silva
>
>
>>
>> Best,
>>
>> --
>> Mehdi
>>
>>
>>
>>
>>>
>>> - Dave
>>>
>>>
>>> >> >
>>> >> > Cheers,
>>> >> >
>>> >> > --
>>> >> > Mehdi
>>> >> >
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> On Fri, Nov 13, 2020 at 2:06 PM Sean Silva via llvm-dev
>>> >> >> <llvm-dev at lists.llvm.org> wrote:
>>> >> >> >
>>> >> >> > We've pretty happy now with a patch that adds two wrappers
>>> around SmallVector that make it 1) more convenient to use and 2) will tend
>>> to mitigate misuse of SmallVector. We think it's ready for wider
>>> discussion: https://reviews.llvm.org/D90884
>>> >> >> >
>>> >> >> > SVec<T> is a convenience alias for SmallVector<T, N> with N
>>> chosen automatically to keep its size under 64 Bytes (that heuristic is
>>> easy to change though). The recommendation in the patch is to use this "on
>>> the stack, where a "small" number of elements are expected".
>>> >> >> >
>>> >> >> > Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us
>>> get the (little-known?) benefits of SmallVector even when it has no inline
>>> elements (see
>>> https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h).
>>> The recommendation in the patch is to use this when the SmallVector is on
>>> the heap.
>>> >> >> >
>>> >> >> > A lot of this is boiled out from the discussion in
>>> https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1
>>> >> >> >
>>> >> >> > The goals here are twofold:
>>> >> >> >
>>> >> >> > 1. convenience: not having to read/write "N", or do an extra
>>> edit/recompile cycle if you forgot it
>>> >> >> >
>>> >> >> > 2. avoiding pathological cases: The choice of N is usually
>>> semi-arbitrary in our experience, and if one isn't careful, can result in
>>> sizeof(SmallVector) becoming huge, especially in the case of nested
>>> SmallVectors. This patch avoids pathological cases in two ways:
>>> >> >> >   A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which
>>> prevents pathological size amplifications like in `SmallVector<struct
>>> {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply
>>> together. Of course, users can always write SmallVector<T, N> explicitly to
>>> bypass this, but the need for that seems rare.
>>> >> >> >   B. SmallVector<T, 0> feels "weird to write" for most folks,
>>> even though it is frequently the right choice. Vec<T> mitigates that by
>>> "looking natural".
>>> >> >> >
>>> >> >> > I'm surfacing this as an RFC to get feedback on a couple
>>> higher-level points:
>>> >> >> > - does everybody agree that SVec<T> and Vec<T> are useful to
>>> have?
>>> >> >> > - get wider consensus around suggesting these as "defaults" (see
>>> my updates to ProgrammersManual.rst in the patch)
>>> >> >> > - how much we want to bulk migrate code vs let it organically
>>> grow. Replacing SmallVector<T, 0> with Vec<T> should be completely
>>> mechanical. Replacing SmallVector<T, N> for general N would be a lot more
>>> work.
>>> >> >> > - of course: naming. SVector/Vector were floated in the patch as
>>> well and seem ok. SmallVec was rejected as it was a prefix of SmallVector
>>> (messes up autocomplete).
>>> >> >> >
>>> >> >> > Looking forward to a world with fewer guessed SmallVector sizes,
>>> >> >> >
>>> >> >> > -- Sean Silva
>>> >> >> > _______________________________________________
>>> >> >> > LLVM Developers mailing list
>>> >> >> > llvm-dev at lists.llvm.org
>>> >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201116/3b93a2ad/attachment.html>