[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

Mehdi AMINI via llvm-dev llvm-dev at lists.llvm.org
Mon Nov 16 16:47:48 PST 2020


On Mon, Nov 16, 2020 at 4:10 PM David Blaikie <dblaikie at gmail.com> wrote:

> On Mon, Nov 16, 2020 at 2:44 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
> >
> >
> >
> > On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dblaikie at gmail.com>
> wrote:
> >>
> >> On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joker.eph at gmail.com>
> wrote:
> >> > On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dblaikie at gmail.com>
> wrote:
> >> >>
> >> >> I will say I'm not a huge fan of adding even more names for things in
> >> >> this fairly core/common use case (now we'll have even more vector
> >> >> names to pick from) - can we use default template arguments so we can
> >> >> write SmallVector<T> instead of SmallVector<T, N> and would that
> >> >> address some of the use cases proposed here?
> >> >
> >> >
> >> > I won't claim it is perfect, but the added names are a compromise
> over rounds of reviews with the folks in the revision. In particular I'm
> quite concerned that a default value for N on the SmallVector does not
> carry the intent the same way, and is too easy to miss in review (or while
> reading code).
> >>
> >> I'm not quite following here - what sort of problems do you anticipate
> >> by readers missing the default value for N?
> >
> >
> > Reading code `SmallVector<SomeType>` does not express that it is
> *intentional* to leave of the value for N. It can easily be just forgotten,
> and it could easily be implicitly `0`, and as a reviewer (or code reader
> later) I don't know if this is was intentional or not. This is why I am
> quite opposed to "loosen" the current requirement on SmallVector: a
> different name for a different intent is better fitting to me.
>
> I don't know why it being implicitly zero is noteworthy - anymore than
> it being implicitly 1 or 3, etc.
>
> As for whether it's intentional: I think if we're moving in this
> direction that's proposed, the idea is that by default we don't want
> to be explicit about the N - so in general we won't be. And sometimes
> someone will want to be explicit and add an N, but otherwise the
> reasonable default is not to have an N. I think the intent is clear
> enough - consider other default template parameters like customized
> deleters on a unique_ptr: I don't wonder if someone intended to have a
> customized deleter on every unique_ptr, one assumes that wasn't
> required/intended unless it's added. It seems like the idea is for
> non-explicit N to be a safe/common default, and explicit N to be
> noteworthy/require some justification or scrutiny.
>

I get your point, but I don't share the conclusion here: I don't believe
that universally not specifying the N on non-trivial types is a good
default. It only makes sense to me for small types, hence it is error prone.



>
> >> >  To me the drawbacks are outweighing the benefits too much.
> >> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly,
> without an easy way to figure it out that there is no actual inline storage
> while reading the code.
> >>
> >> Why would it be necessary to figure that out? If we generally feel the
> >> right default inline storage happens to be zero in that case and that
> >> SmallVector will pick a good default (including possibly 0) that seems
> >> like a good thing to me. (why would we call out zero specially, if we
> >> want to avoid calling out other values explicitly)
> >
> >
> > I think this is an API that is "easy to misuse", I don't see any
> advantage to it while it has drawbacks: why would we do that?
>
> What kind of misuse do you have in mind?


It does not convey the intent, it is too easy to misuse: i.e. you expect a
small storage but you don't have one. This is enough to me to outweigh the
benefits of the direction and prefer the status quo.



> To me it seems like it would
> be consistent with the idea that we have built rules about what good
> default inline sizes are - and there's no need for us to think about
> it on every use, we just let the default do what it's meant to do.
>
> >> > An alternative was to reserve the default to only "small object" so
> that N isn't zero, but there isn't a platform independent way of doing that
> and keep the code portable I believe. So `SVec<T>` is really saying: "I am
> willing to pay for some limite inline storage if possible but I don't have
> a N in mind".
> >>
> >> That quoted statement still sounds like it could encapsulate the
> >> possibility of 0 too, though. "limited inline storage if possible"
> >> could be "and the answer to that constraint is zero in this case - but
> >> if you happen to make the T smaller, it could become non-zero
> >> organically/without the need to touch this code" (as the N could vary
> >> organically between non-zero values as struct sizes change too)
> >
> > Yes the quoted statement says exactly that!
>
> Sorry, it seems we're jumbling up the two issues: Whether the type
> name should be different when asking for implicit default inline
> storage length and, separately, whether explicit zero length storage
> should use a different name. The discussion above, I thought, was
> about the latter, not the former - but your response seeems to be
> about the former rather than the latter.


You're answering my quote about SVec, which is the type that is about
implicit N, Vec (no S!) is about the N=0 case. The quote is also about the
inline storage implicit N, it isn't clear to me how you link this to the 0
case (which does not have inline storage)?



>


> Two separate discussions/threads/patches may help keep the
> communication more clear.
>
> >  This is why I don't like having this behavior on SmallVector: this is a
> different semantics / intent that the programmers have and this is
> something that I like being able to read differently.
> >  SVec does not accept a `N`: if I read SVec<Type> there can't be a
> possible accidental omission of N here. It has to be a choice between
> `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not
> `SmallVector<SomeType>`.
>
> To discuss this issue of whether accepting a default size versus
> having an explicit size is a different semantic - as above, I think
> it's worth considering/comparing this to other templates and their
> default template arguments. Things like std::vector's customizable
> allocators (you. don't have to use a different name for the container
> when you specify a custom allocation scheme for std::vector - but we
> don't wonder every time we see std::vector<T> whether the user meant
> to customize the allocator - we accept the default is usualy intended
> (as the default buffer size would be usually intended) unless it's
> specified - if during review we think a custom allocator (or
> non-default buffer size) might be merited, we could ask about it - we
> probably would even if the user had written SVec<T> the same way we
> ask about other data structures today "did you mean SmallVector<T,
> 3>"?), unique_ptr's custom deleters, etc.
>

I don't necessarily connect with the custom allocator / custom delete
analogy right now, they seem too different to me.
I'd look instead at SmallDenseMap and DenseMap maybe, or `std::map` and
`std::unordered_map` which have different class names and aren't just
differentiated with a trait passed as template argument.


>
> >> > Finally the simple `llvm::Vector` case to replace `SmallVector<T, 0>`
> is because it is frequently preferable to `std::vector` but still isn't
> readable or immediately intuitive and so is rarely used in practice (see
> https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h
> for the documented points on N=0).
> >>
> >> I'm not sure llvm::Vector/Vec would necessarily make it more often
> >> used in practice. Maybe a bunch of cleanup and more specific wording
> >> in the style guide that would ban std::vector might help more there.
> >>
> >> Though I agree that it may be suitable to have a clear name for
> >> SmallVector<T, 0> since the "Small" is slightly misleading (though not
> >> entirely - it's important for the "SmallVectorImpl" benefits of
> >> SmallVector - so renaming it to Vector and still using it via
> >> SmallVectorImpl<T>& might also be confusing).
> >>
> >> >> Got any sense of the total value here? Major savings to be had?
> >> >>
> >> >> (if SmallVector<T> can do what your proposed SVec<T> does, that
> leaves
> >> >> the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
> >> >> over std::vector<T>? I guess the SmallVectorImpl generic algorithm
> >> >> opportunities? Though that's rarely needed compared to ArrayRef.)
> >> >> If SmallVector<T> would suffice then maybe Vec<T> could be
> >> >> ZeroSmallVector<T>? Not sure.
> >> >
> >> >
> >> > ZeroSmallVector<T> does not really address your "more vector names to
> pick from" concerns,
> >>
> >> Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
> >> have one fewer name and less convention to base the Vec<T> on (since
> >> it won't have the SVec<T> sibling), so picking a name closer to the
> >> existing SmallVector might be more viable/helpful.
> >>
> >> > and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the
> "default case" to be the easiest to reach / most intuitive to pick?
> `llvm::Vec` looks like "just a vector".
> >>
> >> Somewhat - but Vec is an abbreviation that isn't really used otherwise
> >> (if we consider SVec as well, I'd still push back on the introduction
> >> of the abbreviation for both cases)
> >
> > One aspect of the naming is to avoid one not being a prefix of the other
> (IDE completion, etc.) or them being too close to differentiate.
> >
> >> and llvm::Vector would be only a
> >> case separation away form a standard name (when considering the
> >> unqualified name - which is likely to come up a fair bit, as we'll see
> >> "Vector" used unqualified a lot).
> >
> > Note: this is why we converged with llvm::Vec and not llvm::Vector in
> the revision
>
> Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
> with standard names, etc). Perhaps a bit of a discussion of what other
> libraries do around this (no doubt there are a bunch of C++ libraries
> that have "here's the default vector you shuold use instead of
> std::vector for these reasons" situations).
>

I had looked into this when the revision was sent, but I couldn't find a
library with a "vector with inlined storage" and a default N value.
Both Abseil and Folly have a SmallVector equivalent, Abseil does not have a
default:
https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71
Folly has a default hard-coded to 1:
https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430
Folly has also another class, FBVector, which is really an alternative to
std::vector with a different memory management strategy (growth factor,
etc.). I don't think Abseil has anything else.

Do you have other ideas of related work to look for?



>
> >> I wouldn't entirely object to SmallVector<T> getting a smart default
> >> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
> >> don't feel /super/ great about it, but yeah, if we're going to push
> >> the Programmer's Manual encouragement further in terms of
> >> reducing/removing use of std::vector, then maybe llvm::Vector isn't
> >> the worst way to do that.
> >>
> >> (it might be that this would benefit from being two separate
> >> discussions, though - one on how to provide smart defaults for
> >> SmallVector<T> and one on if/how to push std::vector deprecation in
> >> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
> >> spelled) further along)
> >
> >
> > For historical purpose: the review was actually originally only adding a
> default for SmallVector and nothing else :)
> > We only converged to the current proposal after a few rounds of reviews
> trying to tradeoff amongst folks in the review.
>
> Perhaps you could summarize some of those discussions/decisions in
> more detail here? As the RFC stands, these are my comments - I know
> it's always a tradeoff of which audiences are brought in at what
> stages (though often folks send ADT/Support changes my way during
> review - I should probably just set up a Herald rule for these
> things).
>
> Sounds like maybe I am in agreement with the original version of the
> review, then.
>

Likely: the review evolved because I opposed to changing the status quo in
this direction I guess.

Best,

-- 
Mehdi




>
> - Dave
>
>
> >> >
> >> > Cheers,
> >> >
> >> > --
> >> > Mehdi
> >> >
> >> >
> >> >>
> >> >>
> >> >> On Fri, Nov 13, 2020 at 2:06 PM Sean Silva via llvm-dev
> >> >> <llvm-dev at lists.llvm.org> wrote:
> >> >> >
> >> >> > We've pretty happy now with a patch that adds two wrappers around
> SmallVector that make it 1) more convenient to use and 2) will tend to
> mitigate misuse of SmallVector. We think it's ready for wider discussion:
> https://reviews.llvm.org/D90884
> >> >> >
> >> >> > SVec<T> is a convenience alias for SmallVector<T, N> with N chosen
> automatically to keep its size under 64 Bytes (that heuristic is easy to
> change though). The recommendation in the patch is to use this "on the
> stack, where a "small" number of elements are expected".
> >> >> >
> >> >> > Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us
> get the (little-known?) benefits of SmallVector even when it has no inline
> elements (see
> https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The
> recommendation in the patch is to use this when the SmallVector is on the
> heap.
> >> >> >
> >> >> > A lot of this is boiled out from the discussion in
> https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1
> >> >> >
> >> >> > The goals here are twofold:
> >> >> >
> >> >> > 1. convenience: not having to read/write "N", or do an extra
> edit/recompile cycle if you forgot it
> >> >> >
> >> >> > 2. avoiding pathological cases: The choice of N is usually
> semi-arbitrary in our experience, and if one isn't careful, can result in
> sizeof(SmallVector) becoming huge, especially in the case of nested
> SmallVectors. This patch avoids pathological cases in two ways:
> >> >> >   A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which
> prevents pathological size amplifications like in `SmallVector<struct
> {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply
> together. Of course, users can always write SmallVector<T, N> explicitly to
> bypass this, but the need for that seems rare.
> >> >> >   B. SmallVector<T, 0> feels "weird to write" for most folks, even
> though it is frequently the right choice. Vec<T> mitigates that by "looking
> natural".
> >> >> >
> >> >> > I'm surfacing this as an RFC to get feedback on a couple
> higher-level points:
> >> >> > - does everybody agree that SVec<T> and Vec<T> are useful to have?
> >> >> > - get wider consensus around suggesting these as "defaults" (see
> my updates to ProgrammersManual.rst in the patch)
> >> >> > - how much we want to bulk migrate code vs let it organically
> grow. Replacing SmallVector<T, 0> with Vec<T> should be completely
> mechanical. Replacing SmallVector<T, N> for general N would be a lot more
> work.
> >> >> > - of course: naming. SVector/Vector were floated in the patch as
> well and seem ok. SmallVec was rejected as it was a prefix of SmallVector
> (messes up autocomplete).
> >> >> >
> >> >> > Looking forward to a world with fewer guessed SmallVector sizes,
> >> >> >
> >> >> > -- Sean Silva
> >> >> > _______________________________________________
> >> >> > LLVM Developers mailing list
> >> >> > llvm-dev at lists.llvm.org
> >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201116/119b711e/attachment.html>


More information about the llvm-dev mailing list