[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Mon Nov 16 18:14:18 PST 2020


On Mon, Nov 16, 2020 at 4:48 PM Mehdi AMINI <joker.eph at gmail.com> wrote:

>
>
> On Mon, Nov 16, 2020 at 4:10 PM David Blaikie <dblaikie at gmail.com> wrote:
>
>> On Mon, Nov 16, 2020 at 2:44 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>> >
>> >
>> >
>> > On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dblaikie at gmail.com>
>> wrote:
>> >>
>> >> On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joker.eph at gmail.com>
>> wrote:
>> >> > On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dblaikie at gmail.com>
>> wrote:
>> >> >>
>> >> >> I will say I'm not a huge fan of adding even more names for things
>> in
>> >> >> this fairly core/common use case (now we'll have even more vector
>> >> >> names to pick from) - can we use default template arguments so we
>> can
>> >> >> write SmallVector<T> instead of SmallVector<T, N> and would that
>> >> >> address some of the use cases proposed here?
>> >> >
>> >> >
>> >> > I won't claim it is perfect, but the added names are a compromise
>> over rounds of reviews with the folks in the revision. In particular I'm
>> quite concerned that a default value for N on the SmallVector does not
>> carry the intent the same way, and is too easy to miss in review (or while
>> reading code).
>> >>
>> >> I'm not quite following here - what sort of problems do you anticipate
>> >> by readers missing the default value for N?
>> >
>> >
>> > Reading code `SmallVector<SomeType>` does not express that it is
>> *intentional* to leave of the value for N. It can easily be just forgotten,
>> and it could easily be implicitly `0`, and as a reviewer (or code reader
>> later) I don't know if this is was intentional or not. This is why I am
>> quite opposed to "loosen" the current requirement on SmallVector: a
>> different name for a different intent is better fitting to me.
>>
>> I don't know why it being implicitly zero is noteworthy - anymore than
>> it being implicitly 1 or 3, etc.
>>
>> As for whether it's intentional: I think if we're moving in this
>> direction that's proposed, the idea is that by default we don't want
>> to be explicit about the N - so in general we won't be. And sometimes
>> someone will want to be explicit and add an N, but otherwise the
>> reasonable default is not to have an N. I think the intent is clear
>> enough - consider other default template parameters like customized
>> deleters on a unique_ptr: I don't wonder if someone intended to have a
>> customized deleter on every unique_ptr, one assumes that wasn't
>> required/intended unless it's added. It seems like the idea is for
>> non-explicit N to be a safe/common default, and explicit N to be
>> noteworthy/require some justification or scrutiny.
>>
>
> I get your point, but I don't share the conclusion here: I don't believe
> that universally not specifying the N on non-trivial types is a good
> default. It only makes sense to me for small types, hence it is error prone.
>
>
>
>>
>> >> >  To me the drawbacks are outweighing the benefits too much.
>> >> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly,
>> without an easy way to figure it out that there is no actual inline storage
>> while reading the code.
>> >>
>> >> Why would it be necessary to figure that out? If we generally feel the
>> >> right default inline storage happens to be zero in that case and that
>> >> SmallVector will pick a good default (including possibly 0) that seems
>> >> like a good thing to me. (why would we call out zero specially, if we
>> >> want to avoid calling out other values explicitly)
>> >
>> >
>> > I think this is an API that is "easy to misuse", I don't see any
>> advantage to it while it has drawbacks: why would we do that?
>>
>> What kind of misuse do you have in mind?
>
>
> It does not convey the intent, it is too easy to misuse: i.e. you expect a
> small storage but you don't have one. This is enough to me to outweigh the
> benefits of the direction and prefer the status quo.
>
>
>
>> To me it seems like it would
>> be consistent with the idea that we have built rules about what good
>> default inline sizes are - and there's no need for us to think about
>> it on every use, we just let the default do what it's meant to do.
>>
>> >> > An alternative was to reserve the default to only "small object" so
>> that N isn't zero, but there isn't a platform independent way of doing that
>> and keep the code portable I believe. So `SVec<T>` is really saying: "I am
>> willing to pay for some limite inline storage if possible but I don't have
>> a N in mind".
>> >>
>> >> That quoted statement still sounds like it could encapsulate the
>> >> possibility of 0 too, though. "limited inline storage if possible"
>> >> could be "and the answer to that constraint is zero in this case - but
>> >> if you happen to make the T smaller, it could become non-zero
>> >> organically/without the need to touch this code" (as the N could vary
>> >> organically between non-zero values as struct sizes change too)
>> >
>> > Yes the quoted statement says exactly that!
>>
>> Sorry, it seems we're jumbling up the two issues: Whether the type
>> name should be different when asking for implicit default inline
>> storage length and, separately, whether explicit zero length storage
>> should use a different name. The discussion above, I thought, was
>> about the latter, not the former - but your response seeems to be
>> about the former rather than the latter.
>
>
> You're answering my quote about SVec, which is the type that is about
> implicit N, Vec (no S!) is about the N=0 case. The quote is also about the
> inline storage implicit N, it isn't clear to me how you link this to the 0
> case (which does not have inline storage)?
>
>
>
>>
>
>
>> Two separate discussions/threads/patches may help keep the
>> communication more clear.
>>
>> >  This is why I don't like having this behavior on SmallVector: this is
>> a different semantics / intent that the programmers have and this is
>> something that I like being able to read differently.
>> >  SVec does not accept a `N`: if I read SVec<Type> there can't be a
>> possible accidental omission of N here. It has to be a choice between
>> `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not
>> `SmallVector<SomeType>`.
>>
>> To discuss this issue of whether accepting a default size versus
>> having an explicit size is a different semantic - as above, I think
>> it's worth considering/comparing this to other templates and their
>> default template arguments. Things like std::vector's customizable
>> allocators (you. don't have to use a different name for the container
>> when you specify a custom allocation scheme for std::vector - but we
>> don't wonder every time we see std::vector<T> whether the user meant
>> to customize the allocator - we accept the default is usualy intended
>> (as the default buffer size would be usually intended) unless it's
>> specified - if during review we think a custom allocator (or
>> non-default buffer size) might be merited, we could ask about it - we
>> probably would even if the user had written SVec<T> the same way we
>> ask about other data structures today "did you mean SmallVector<T,
>> 3>"?), unique_ptr's custom deleters, etc.
>>
>
> I don't necessarily connect with the custom allocator / custom delete
> analogy right now, they seem too different to me.
> I'd look instead at SmallDenseMap and DenseMap maybe, or `std::map` and
> `std::unordered_map` which have different class names and aren't just
> differentiated with a trait passed as template argument.
>
>
>>
>> >> > Finally the simple `llvm::Vector` case to replace `SmallVector<T,
>> 0>` is because it is frequently preferable to `std::vector` but still isn't
>> readable or immediately intuitive and so is rarely used in practice (see
>> https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h
>> for the documented points on N=0).
>> >>
>> >> I'm not sure llvm::Vector/Vec would necessarily make it more often
>> >> used in practice. Maybe a bunch of cleanup and more specific wording
>> >> in the style guide that would ban std::vector might help more there.
>> >>
>> >> Though I agree that it may be suitable to have a clear name for
>> >> SmallVector<T, 0> since the "Small" is slightly misleading (though not
>> >> entirely - it's important for the "SmallVectorImpl" benefits of
>> >> SmallVector - so renaming it to Vector and still using it via
>> >> SmallVectorImpl<T>& might also be confusing).
>> >>
>> >> >> Got any sense of the total value here? Major savings to be had?
>> >> >>
>> >> >> (if SmallVector<T> can do what your proposed SVec<T> does, that
>> leaves
>> >> >> the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
>> >> >> over std::vector<T>? I guess the SmallVectorImpl generic algorithm
>> >> >> opportunities? Though that's rarely needed compared to ArrayRef.)
>> >> >> If SmallVector<T> would suffice then maybe Vec<T> could be
>> >> >> ZeroSmallVector<T>? Not sure.
>> >> >
>> >> >
>> >> > ZeroSmallVector<T> does not really address your "more vector names
>> to pick from" concerns,
>> >>
>> >> Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
>> >> have one fewer name and less convention to base the Vec<T> on (since
>> >> it won't have the SVec<T> sibling), so picking a name closer to the
>> >> existing SmallVector might be more viable/helpful.
>> >>
>> >> > and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the
>> "default case" to be the easiest to reach / most intuitive to pick?
>> `llvm::Vec` looks like "just a vector".
>> >>
>> >> Somewhat - but Vec is an abbreviation that isn't really used otherwise
>> >> (if we consider SVec as well, I'd still push back on the introduction
>> >> of the abbreviation for both cases)
>> >
>> > One aspect of the naming is to avoid one not being a prefix of the
>> other (IDE completion, etc.) or them being too close to differentiate.
>> >
>> >> and llvm::Vector would be only a
>> >> case separation away form a standard name (when considering the
>> >> unqualified name - which is likely to come up a fair bit, as we'll see
>> >> "Vector" used unqualified a lot).
>> >
>> > Note: this is why we converged with llvm::Vec and not llvm::Vector in
>> the revision
>>
>> Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
>> with standard names, etc). Perhaps a bit of a discussion of what other
>> libraries do around this (no doubt there are a bunch of C++ libraries
>> that have "here's the default vector you shuold use instead of
>> std::vector for these reasons" situations).
>>
>
> I had looked into this when the revision was sent, but I couldn't find a
> library with a "vector with inlined storage" and a default N value.
> Both Abseil and Folly have a SmallVector equivalent, Abseil does not have
> a default:
> https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71
> Folly has a default hard-coded to 1:
> https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430
> Folly has also another class, FBVector, which is really an alternative to
> std::vector with a different memory management strategy (growth factor,
> etc.). I don't think Abseil has anything else.
>
> Do you have other ideas of related work to look for?
>

Actually, the reason I originally pursued this was that I used a
SmallDenseMap and happily found that it had a default number of inline
elements:
https://github.com/llvm/llvm-project/blob/69cd776e1ee79e72ccbdad30749eac04579715ee/llvm/include/llvm/ADT/DenseMap.h#L877
I thought "why not SmallVector, since I use that a lot more often"? But
based on historical misuse of SmallVector, using a hardcoded number like
"4" (like SmallDenseMap does) as the default doesn't make much sense, so I
pursued a bounded-sizeof approach.


>
>
>
>>
>> >> I wouldn't entirely object to SmallVector<T> getting a smart default
>> >> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
>> >> don't feel /super/ great about it, but yeah, if we're going to push
>> >> the Programmer's Manual encouragement further in terms of
>> >> reducing/removing use of std::vector, then maybe llvm::Vector isn't
>> >> the worst way to do that.
>> >>
>> >> (it might be that this would benefit from being two separate
>> >> discussions, though - one on how to provide smart defaults for
>> >> SmallVector<T> and one on if/how to push std::vector deprecation in
>> >> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
>> >> spelled) further along)
>> >
>> >
>> > For historical purpose: the review was actually originally only adding
>> a default for SmallVector and nothing else :)
>> > We only converged to the current proposal after a few rounds of reviews
>> trying to tradeoff amongst folks in the review.
>>
>> Perhaps you could summarize some of those discussions/decisions in
>> more detail here? As the RFC stands, these are my comments - I know
>> it's always a tradeoff of which audiences are brought in at what
>> stages (though often folks send ADT/Support changes my way during
>> review - I should probably just set up a Herald rule for these
>> things).
>>
>> Sounds like maybe I am in agreement with the original version of the
>> review, then.
>>
>
> Likely: the review evolved because I opposed to changing the status quo in
> this direction I guess.
>

It sounds like there is pretty good consensus here to just let
`SmallVector<T>` "just work" and choose a default N. The one exception
seems to be that Mehdi has a strong objection having the default N possibly
be 0 (to ensure the bounded-sizeof property). So let's shift the discussion
with a laser focus to specifically that point.

Responding to the arguments I've seen presented so far:
1. I'm not convinced that SmallVector<T> with defaulted N==0 is really
error prone. Mehdi, I like your definition of "I am willing to pay for some
limite inline storage if possible but I don't have a N in mind" and I think
we can apply it to SmallVector<T>; I don't think adding SVec<T> is needed
for that.
2. I don't understand the concern about "forgetting the N". In the "new
world" of `SmallVector<T>`, any `SmallVector<T, N>` in new code would be
treated as "I profiled this code and this explicit N is the right choice".
That is, adding an explicit `N` is going to be rare and deliberate
(possibly with reviewers insisting on a comment!) -- this is not something
that one can "forget".

For point 2., I think we will have some transition period where reviewers
will organically start to propagate the new advice (and patch authors
enjoying not having to write N), but I think the direction is effectively
what I stated.

-- Sean Silva


>
> Best,
>
> --
> Mehdi
>
>
>
>
>>
>> - Dave
>>
>>
>> >> >
>> >> > Cheers,
>> >> >
>> >> > --
>> >> > Mehdi
>> >> >
>> >> >
>> >> >>
>> >> >>
>> >> >> On Fri, Nov 13, 2020 at 2:06 PM Sean Silva via llvm-dev
>> >> >> <llvm-dev at lists.llvm.org> wrote:
>> >> >> >
>> >> >> > We've pretty happy now with a patch that adds two wrappers around
>> SmallVector that make it 1) more convenient to use and 2) will tend to
>> mitigate misuse of SmallVector. We think it's ready for wider discussion:
>> https://reviews.llvm.org/D90884
>> >> >> >
>> >> >> > SVec<T> is a convenience alias for SmallVector<T, N> with N
>> chosen automatically to keep its size under 64 Bytes (that heuristic is
>> easy to change though). The recommendation in the patch is to use this "on
>> the stack, where a "small" number of elements are expected".
>> >> >> >
>> >> >> > Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us
>> get the (little-known?) benefits of SmallVector even when it has no inline
>> elements (see
>> https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h).
>> The recommendation in the patch is to use this when the SmallVector is on
>> the heap.
>> >> >> >
>> >> >> > A lot of this is boiled out from the discussion in
>> https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1
>> >> >> >
>> >> >> > The goals here are twofold:
>> >> >> >
>> >> >> > 1. convenience: not having to read/write "N", or do an extra
>> edit/recompile cycle if you forgot it
>> >> >> >
>> >> >> > 2. avoiding pathological cases: The choice of N is usually
>> semi-arbitrary in our experience, and if one isn't careful, can result in
>> sizeof(SmallVector) becoming huge, especially in the case of nested
>> SmallVectors. This patch avoids pathological cases in two ways:
>> >> >> >   A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which
>> prevents pathological size amplifications like in `SmallVector<struct
>> {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply
>> together. Of course, users can always write SmallVector<T, N> explicitly to
>> bypass this, but the need for that seems rare.
>> >> >> >   B. SmallVector<T, 0> feels "weird to write" for most folks,
>> even though it is frequently the right choice. Vec<T> mitigates that by
>> "looking natural".
>> >> >> >
>> >> >> > I'm surfacing this as an RFC to get feedback on a couple
>> higher-level points:
>> >> >> > - does everybody agree that SVec<T> and Vec<T> are useful to have?
>> >> >> > - get wider consensus around suggesting these as "defaults" (see
>> my updates to ProgrammersManual.rst in the patch)
>> >> >> > - how much we want to bulk migrate code vs let it organically
>> grow. Replacing SmallVector<T, 0> with Vec<T> should be completely
>> mechanical. Replacing SmallVector<T, N> for general N would be a lot more
>> work.
>> >> >> > - of course: naming. SVector/Vector were floated in the patch as
>> well and seem ok. SmallVec was rejected as it was a prefix of SmallVector
>> (messes up autocomplete).
>> >> >> >
>> >> >> > Looking forward to a world with fewer guessed SmallVector sizes,
>> >> >> >
>> >> >> > -- Sean Silva
>> >> >> > _______________________________________________
>> >> >> > LLVM Developers mailing list
>> >> >> > llvm-dev at lists.llvm.org
>> >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201116/8ae25c3b/attachment-0001.html>


More information about the llvm-dev mailing list