[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

Mon Nov 16 14:43:35 PST 2020

On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dblaikie at gmail.com> wrote:

> On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
> > On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dblaikie at gmail.com>
> wrote:
> >>
> >> I will say I'm not a huge fan of adding even more names for things in
> >> this fairly core/common use case (now we'll have even more vector
> >> names to pick from) - can we use default template arguments so we can
> >> write SmallVector<T> instead of SmallVector<T, N> and would that
> >> address some of the use cases proposed here?
> >
> >
> > I won't claim it is perfect, but the added names are a compromise over
> rounds of reviews with the folks in the revision. In particular I'm quite
> concerned that a default value for N on the SmallVector does not carry the
> intent the same way, and is too easy to miss in review (or while reading
> code).
>
> I'm not quite following here - what sort of problems do you anticipate
> by readers missing the default value for N?
>

Reading code `SmallVector<SomeType>` does not express that it is
*intentional* to leave of the value for N. It can easily be just forgotten,
and it could easily be implicitly `0`, and as a reviewer (or code reader
later) I don't know if this is was intentional or not. This is why I am
quite opposed to "loosen" the current requirement on SmallVector: a
different name for a different intent is better fitting to me.

>
> >  To me the drawbacks are outweighing the benefits too much.
> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly,
> without an easy way to figure it out that there is no actual inline storage
> while reading the code.
>
> Why would it be necessary to figure that out? If we generally feel the
> right default inline storage happens to be zero in that case and that
> SmallVector will pick a good default (including possibly 0) that seems
> like a good thing to me. (why would we call out zero specially, if we
> want to avoid calling out other values explicitly)
>

I think this is an API that is "easy to misuse", I don't see any advantage
to it while it has drawbacks: why would we do that?

>
> > An alternative was to reserve the default to only "small object" so that
> N isn't zero, but there isn't a platform independent way of doing that and
> keep the code portable I believe. So `SVec<T>` is really saying: "I am
> willing to pay for some limite inline storage if possible but I don't have
> a N in mind".
>
> That quoted statement still sounds like it could encapsulate the
> possibility of 0 too, though. "limited inline storage if possible"
> could be "and the answer to that constraint is zero in this case - but
> if you happen to make the T smaller, it could become non-zero
> organically/without the need to touch this code" (as the N could vary
> organically between non-zero values as struct sizes change too)
>

Yes the quoted statement says exactly that! This is why I don't like having
this behavior on SmallVector: this is a different semantics / intent
that the programmers have and this is something that I like being able to
read differently.
 SVec does not accept a `N`: if I read SVec<Type> there can't be a possible
accidental omission of N here. It has to be a choice between
`SmallVector<SomeType, 4>` and `SVec<SomeType>` but not
`SmallVector<SomeType>`.

> Finally the simple `llvm::Vector` case to replace `SmallVector<T, 0>` is
> because it is frequently preferable to `std::vector` but still isn't
> readable or immediately intuitive and so is rarely used in practice (see
> https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h
> for the documented points on N=0).
>
> I'm not sure llvm::Vector/Vec would necessarily make it more often
> used in practice. Maybe a bunch of cleanup and more specific wording
> in the style guide that would ban std::vector might help more there.
>
> Though I agree that it may be suitable to have a clear name for
> SmallVector<T, 0> since the "Small" is slightly misleading (though not
> entirely - it's important for the "SmallVectorImpl" benefits of
> SmallVector - so renaming it to Vector and still using it via
> SmallVectorImpl<T>& might also be confusing).
>
> >> Got any sense of the total value here? Major savings to be had?
> >>
> >> (if SmallVector<T> can do what your proposed SVec<T> does, that leaves
> >> the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
> >> over std::vector<T>? I guess the SmallVectorImpl generic algorithm
> >> opportunities? Though that's rarely needed compared to ArrayRef.)
> >> If SmallVector<T> would suffice then maybe Vec<T> could be
> >> ZeroSmallVector<T>? Not sure.
> >
> >
> > ZeroSmallVector<T> does not really address your "more vector names to
> pick from" concerns,
>
> Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
> have one fewer name and less convention to base the Vec<T> on (since
> it won't have the SVec<T> sibling), so picking a name closer to the
> existing SmallVector might be more viable/helpful.
>
> > and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the
> "default case" to be the easiest to reach / most intuitive to pick?
> `llvm::Vec` looks like "just a vector".
>
> Somewhat - but Vec is an abbreviation that isn't really used otherwise
> (if we consider SVec as well, I'd still push back on the introduction
> of the abbreviation for both cases)

One aspect of the naming is to avoid one not being a prefix of the other
(IDE completion, etc.) or them being too close to differentiate.

> and llvm::Vector would be only a
> case separation away form a standard name (when considering the
> unqualified name - which is likely to come up a fair bit, as we'll see
> "Vector" used unqualified a lot).
>

Note: this is why we converged with llvm::Vec and not llvm::Vector in the
revision

I wouldn't entirely object to SmallVector<T> getting a smart default
> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
> don't feel /super/ great about it, but yeah, if we're going to push
> the Programmer's Manual encouragement further in terms of
> reducing/removing use of std::vector, then maybe llvm::Vector isn't
> the worst way to do that.
>
> (it might be that this would benefit from being two separate
> discussions, though - one on how to provide smart defaults for
> SmallVector<T> and one on if/how to push std::vector deprecation in
> favor of SmallVector<T, 0> (semantically speaking - no matter how it's
> spelled) further along)
>

For historical purpose: the review was actually originally only adding a
default for SmallVector and nothing else :)
We only converged to the current proposal after a few rounds of reviews
trying to tradeoff amongst folks in the review.

>
> >
> > Cheers,
> >
> > --
> > Mehdi
> >
> >
> >>
> >>
> >> On Fri, Nov 13, 2020 at 2:06 PM Sean Silva via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> >
> >> > We've pretty happy now with a patch that adds two wrappers around
> SmallVector that make it 1) more convenient to use and 2) will tend to
> mitigate misuse of SmallVector. We think it's ready for wider discussion:
> https://reviews.llvm.org/D90884
> >> >
> >> > SVec<T> is a convenience alias for SmallVector<T, N> with N chosen
> automatically to keep its size under 64 Bytes (that heuristic is easy to
> change though). The recommendation in the patch is to use this "on the
> stack, where a "small" number of elements are expected".
> >> >
> >> > Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us get
> the (little-known?) benefits of SmallVector even when it has no inline
> elements (see
> https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The
> recommendation in the patch is to use this when the SmallVector is on the
> heap.
> >> >
> >> > A lot of this is boiled out from the discussion in
> https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1
> >> >
> >> > The goals here are twofold:
> >> >
> >> > 1. convenience: not having to read/write "N", or do an extra
> edit/recompile cycle if you forgot it
> >> >
> >> > 2. avoiding pathological cases: The choice of N is usually
> semi-arbitrary in our experience, and if one isn't careful, can result in
> sizeof(SmallVector) becoming huge, especially in the case of nested
> SmallVectors. This patch avoids pathological cases in two ways:
> >> >   A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which
> prevents pathological size amplifications like in `SmallVector<struct
> {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply
> together. Of course, users can always write SmallVector<T, N> explicitly to
> bypass this, but the need for that seems rare.
> >> >   B. SmallVector<T, 0> feels "weird to write" for most folks, even
> though it is frequently the right choice. Vec<T> mitigates that by "looking
> natural".
> >> >
> >> > I'm surfacing this as an RFC to get feedback on a couple higher-level
> points:
> >> > - does everybody agree that SVec<T> and Vec<T> are useful to have?
> >> > - get wider consensus around suggesting these as "defaults" (see my
> updates to ProgrammersManual.rst in the patch)
> >> > - how much we want to bulk migrate code vs let it organically grow.
> Replacing SmallVector<T, 0> with Vec<T> should be completely mechanical.
> Replacing SmallVector<T, N> for general N would be a lot more work.
> >> > - of course: naming. SVector/Vector were floated in the patch as well
> and seem ok. SmallVec was rejected as it was a prefix of SmallVector
> (messes up autocomplete).
> >> >
> >> > Looking forward to a world with fewer guessed SmallVector sizes,
> >> >
> >> > -- Sean Silva
> >> > _______________________________________________
> >> > LLVM Developers mailing list
> >> > llvm-dev at lists.llvm.org
> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201116/37c6bee4/attachment-0001.html>