[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

Mon Nov 16 14:12:30 PST 2020

On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
> On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dblaikie at gmail.com> wrote:
>>
>> I will say I'm not a huge fan of adding even more names for things in
>> this fairly core/common use case (now we'll have even more vector
>> names to pick from) - can we use default template arguments so we can
>> write SmallVector<T> instead of SmallVector<T, N> and would that
>> address some of the use cases proposed here?
>
>
> I won't claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I'm quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I'm not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

>  To me the drawbacks are outweighing the benefits too much.
> Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

> An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. "limited inline storage if possible"
could be "and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code" (as the N could vary
organically between non-zero values as struct sizes change too)

> Finally the simple `llvm::Vector` case to replace `SmallVector<T, 0>` is because it is frequently preferable to `std::vector` but still isn't readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).

I'm not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.

Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the "Small" is slightly misleading (though not
entirely - it's important for the "SmallVectorImpl" benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl<T>& might also be confusing).

>> Got any sense of the total value here? Major savings to be had?
>>
>> (if SmallVector<T> can do what your proposed SVec<T> does, that leaves
>> the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
>> over std::vector<T>? I guess the SmallVectorImpl generic algorithm
>> opportunities? Though that's rarely needed compared to ArrayRef.)
>> If SmallVector<T> would suffice then maybe Vec<T> could be
>> ZeroSmallVector<T>? Not sure.
>
>
> ZeroSmallVector<T> does not really address your "more vector names to pick from" concerns,

Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
have one fewer name and less convention to base the Vec<T> on (since
it won't have the SVec<T> sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.

> and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the "default case" to be the easiest to reach / most intuitive to pick? `llvm::Vec` looks like "just a vector".

Somewhat - but Vec is an abbreviation that isn't really used otherwise
(if we consider SVec as well, I'd still push back on the introduction
of the abbreviation for both cases) and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we'll see
"Vector" used unqualified a lot).

I wouldn't entirely object to SmallVector<T> getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don't feel /super/ great about it, but yeah, if we're going to push
the Programmer's Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn't
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector<T> and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it's
spelled) further along)

>
> Cheers,
>
> --
> Mehdi
>
>
>>
>>
>> On Fri, Nov 13, 2020 at 2:06 PM Sean Silva via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> >
>> > We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: https://reviews.llvm.org/D90884
>> >
>> > SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".
>> >
>> > Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us get the (little-known?) benefits of SmallVector even when it has no inline elements (see https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The recommendation in the patch is to use this when the SmallVector is on the heap.
>> >
>> > A lot of this is boiled out from the discussion in https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1
>> >
>> > The goals here are twofold:
>> >
>> > 1. convenience: not having to read/write "N", or do an extra edit/recompile cycle if you forgot it
>> >
>> > 2. avoiding pathological cases: The choice of N is usually semi-arbitrary in our experience, and if one isn't careful, can result in sizeof(SmallVector) becoming huge, especially in the case of nested SmallVectors. This patch avoids pathological cases in two ways:
>> >   A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which prevents pathological size amplifications like in `SmallVector<struct {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply together. Of course, users can always write SmallVector<T, N> explicitly to bypass this, but the need for that seems rare.
>> >   B. SmallVector<T, 0> feels "weird to write" for most folks, even though it is frequently the right choice. Vec<T> mitigates that by "looking natural".
>> >
>> > I'm surfacing this as an RFC to get feedback on a couple higher-level points:
>> > - does everybody agree that SVec<T> and Vec<T> are useful to have?
>> > - get wider consensus around suggesting these as "defaults" (see my updates to ProgrammersManual.rst in the patch)
>> > - how much we want to bulk migrate code vs let it organically grow. Replacing SmallVector<T, 0> with Vec<T> should be completely mechanical. Replacing SmallVector<T, N> for general N would be a lot more work.
>> > - of course: naming. SVector/Vector were floated in the patch as well and seem ok. SmallVec was rejected as it was a prefix of SmallVector (messes up autocomplete).
>> >
>> > Looking forward to a world with fewer guessed SmallVector sizes,
>> >
>> > -- Sean Silva
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > llvm-dev at lists.llvm.org
>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev