[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

Mehdi AMINI via llvm-dev llvm-dev at lists.llvm.org
Mon Nov 16 18:52:31 PST 2020


On Mon, Nov 16, 2020 at 6:44 PM Mehdi AMINI <joker.eph at gmail.com> wrote:

>
>
> On Mon, Nov 16, 2020 at 6:14 PM Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Mon, Nov 16, 2020 at 4:48 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>>>
>>>
>>> On Mon, Nov 16, 2020 at 4:10 PM David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>> On Mon, Nov 16, 2020 at 2:44 PM Mehdi AMINI <joker.eph at gmail.com>
>>>> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Mon, Nov 16, 2020 at 2:12 PM David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>> >>
>>>> >> On Mon, Nov 16, 2020 at 1:55 PM Mehdi AMINI <joker.eph at gmail.com>
>>>> wrote:
>>>> >> > On Mon, Nov 16, 2020 at 12:55 PM David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>> >> >>
>>>> >> >> I will say I'm not a huge fan of adding even more names for
>>>> things in
>>>> >> >> this fairly core/common use case (now we'll have even more vector
>>>> >> >> names to pick from) - can we use default template arguments so we
>>>> can
>>>> >> >> write SmallVector<T> instead of SmallVector<T, N> and would that
>>>> >> >> address some of the use cases proposed here?
>>>> >> >
>>>> >> >
>>>> >> > I won't claim it is perfect, but the added names are a compromise
>>>> over rounds of reviews with the folks in the revision. In particular I'm
>>>> quite concerned that a default value for N on the SmallVector does not
>>>> carry the intent the same way, and is too easy to miss in review (or while
>>>> reading code).
>>>> >>
>>>> >> I'm not quite following here - what sort of problems do you
>>>> anticipate
>>>> >> by readers missing the default value for N?
>>>> >
>>>> >
>>>> > Reading code `SmallVector<SomeType>` does not express that it is
>>>> *intentional* to leave of the value for N. It can easily be just forgotten,
>>>> and it could easily be implicitly `0`, and as a reviewer (or code reader
>>>> later) I don't know if this is was intentional or not. This is why I am
>>>> quite opposed to "loosen" the current requirement on SmallVector: a
>>>> different name for a different intent is better fitting to me.
>>>>
>>>> I don't know why it being implicitly zero is noteworthy - anymore than
>>>> it being implicitly 1 or 3, etc.
>>>>
>>>> As for whether it's intentional: I think if we're moving in this
>>>> direction that's proposed, the idea is that by default we don't want
>>>> to be explicit about the N - so in general we won't be. And sometimes
>>>> someone will want to be explicit and add an N, but otherwise the
>>>> reasonable default is not to have an N. I think the intent is clear
>>>> enough - consider other default template parameters like customized
>>>> deleters on a unique_ptr: I don't wonder if someone intended to have a
>>>> customized deleter on every unique_ptr, one assumes that wasn't
>>>> required/intended unless it's added. It seems like the idea is for
>>>> non-explicit N to be a safe/common default, and explicit N to be
>>>> noteworthy/require some justification or scrutiny.
>>>>
>>>
>>> I get your point, but I don't share the conclusion here: I don't believe
>>> that universally not specifying the N on non-trivial types is a good
>>> default. It only makes sense to me for small types, hence it is error prone.
>>>
>>>
>>>
>>>>
>>>> >> >  To me the drawbacks are outweighing the benefits too much.
>>>> >> > Also, `SmallVector<LargeType>` would end-up with N==0 implicitly,
>>>> without an easy way to figure it out that there is no actual inline storage
>>>> while reading the code.
>>>> >>
>>>> >> Why would it be necessary to figure that out? If we generally feel
>>>> the
>>>> >> right default inline storage happens to be zero in that case and that
>>>> >> SmallVector will pick a good default (including possibly 0) that
>>>> seems
>>>> >> like a good thing to me. (why would we call out zero specially, if we
>>>> >> want to avoid calling out other values explicitly)
>>>> >
>>>> >
>>>> > I think this is an API that is "easy to misuse", I don't see any
>>>> advantage to it while it has drawbacks: why would we do that?
>>>>
>>>> What kind of misuse do you have in mind?
>>>
>>>
>>> It does not convey the intent, it is too easy to misuse: i.e. you expect
>>> a small storage but you don't have one. This is enough to me to outweigh
>>> the benefits of the direction and prefer the status quo.
>>>
>>>
>>>
>>>> To me it seems like it would
>>>> be consistent with the idea that we have built rules about what good
>>>> default inline sizes are - and there's no need for us to think about
>>>> it on every use, we just let the default do what it's meant to do.
>>>>
>>>> >> > An alternative was to reserve the default to only "small object"
>>>> so that N isn't zero, but there isn't a platform independent way of doing
>>>> that and keep the code portable I believe. So `SVec<T>` is really saying:
>>>> "I am willing to pay for some limite inline storage if possible but I don't
>>>> have a N in mind".
>>>> >>
>>>> >> That quoted statement still sounds like it could encapsulate the
>>>> >> possibility of 0 too, though. "limited inline storage if possible"
>>>> >> could be "and the answer to that constraint is zero in this case -
>>>> but
>>>> >> if you happen to make the T smaller, it could become non-zero
>>>> >> organically/without the need to touch this code" (as the N could vary
>>>> >> organically between non-zero values as struct sizes change too)
>>>> >
>>>> > Yes the quoted statement says exactly that!
>>>>
>>>> Sorry, it seems we're jumbling up the two issues: Whether the type
>>>> name should be different when asking for implicit default inline
>>>> storage length and, separately, whether explicit zero length storage
>>>> should use a different name. The discussion above, I thought, was
>>>> about the latter, not the former - but your response seeems to be
>>>> about the former rather than the latter.
>>>
>>>
>>> You're answering my quote about SVec, which is the type that is about
>>> implicit N, Vec (no S!) is about the N=0 case. The quote is also about the
>>> inline storage implicit N, it isn't clear to me how you link this to the 0
>>> case (which does not have inline storage)?
>>>
>>>
>>>
>>>>
>>>
>>>
>>>> Two separate discussions/threads/patches may help keep the
>>>> communication more clear.
>>>>
>>>> >  This is why I don't like having this behavior on SmallVector: this
>>>> is a different semantics / intent that the programmers have and this is
>>>> something that I like being able to read differently.
>>>> >  SVec does not accept a `N`: if I read SVec<Type> there can't be a
>>>> possible accidental omission of N here. It has to be a choice between
>>>> `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not
>>>> `SmallVector<SomeType>`.
>>>>
>>>> To discuss this issue of whether accepting a default size versus
>>>> having an explicit size is a different semantic - as above, I think
>>>> it's worth considering/comparing this to other templates and their
>>>> default template arguments. Things like std::vector's customizable
>>>> allocators (you. don't have to use a different name for the container
>>>> when you specify a custom allocation scheme for std::vector - but we
>>>> don't wonder every time we see std::vector<T> whether the user meant
>>>> to customize the allocator - we accept the default is usualy intended
>>>> (as the default buffer size would be usually intended) unless it's
>>>> specified - if during review we think a custom allocator (or
>>>> non-default buffer size) might be merited, we could ask about it - we
>>>> probably would even if the user had written SVec<T> the same way we
>>>> ask about other data structures today "did you mean SmallVector<T,
>>>> 3>"?), unique_ptr's custom deleters, etc.
>>>>
>>>
>>> I don't necessarily connect with the custom allocator / custom delete
>>> analogy right now, they seem too different to me.
>>> I'd look instead at SmallDenseMap and DenseMap maybe, or `std::map` and
>>> `std::unordered_map` which have different class names and aren't just
>>> differentiated with a trait passed as template argument.
>>>
>>>
>>>>
>>>> >> > Finally the simple `llvm::Vector` case to replace `SmallVector<T,
>>>> 0>` is because it is frequently preferable to `std::vector` but still isn't
>>>> readable or immediately intuitive and so is rarely used in practice (see
>>>> https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h
>>>> for the documented points on N=0).
>>>> >>
>>>> >> I'm not sure llvm::Vector/Vec would necessarily make it more often
>>>> >> used in practice. Maybe a bunch of cleanup and more specific wording
>>>> >> in the style guide that would ban std::vector might help more there.
>>>> >>
>>>> >> Though I agree that it may be suitable to have a clear name for
>>>> >> SmallVector<T, 0> since the "Small" is slightly misleading (though
>>>> not
>>>> >> entirely - it's important for the "SmallVectorImpl" benefits of
>>>> >> SmallVector - so renaming it to Vector and still using it via
>>>> >> SmallVectorImpl<T>& might also be confusing).
>>>> >>
>>>> >> >> Got any sense of the total value here? Major savings to be had?
>>>> >> >>
>>>> >> >> (if SmallVector<T> can do what your proposed SVec<T> does, that
>>>> leaves
>>>> >> >> the Vec<T> - could you expound on the benefits of SmallVector<T,
>>>> 0>
>>>> >> >> over std::vector<T>? I guess the SmallVectorImpl generic algorithm
>>>> >> >> opportunities? Though that's rarely needed compared to ArrayRef.)
>>>> >> >> If SmallVector<T> would suffice then maybe Vec<T> could be
>>>> >> >> ZeroSmallVector<T>? Not sure.
>>>> >> >
>>>> >> >
>>>> >> > ZeroSmallVector<T> does not really address your "more vector names
>>>> to pick from" concerns,
>>>> >>
>>>> >> Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
>>>> >> have one fewer name and less convention to base the Vec<T> on (since
>>>> >> it won't have the SVec<T> sibling), so picking a name closer to the
>>>> >> existing SmallVector might be more viable/helpful.
>>>> >>
>>>> >> > and it is longer than `SmallVector<T, 0>`: shouldn't we aim for
>>>> the "default case" to be the easiest to reach / most intuitive to pick?
>>>> `llvm::Vec` looks like "just a vector".
>>>> >>
>>>> >> Somewhat - but Vec is an abbreviation that isn't really used
>>>> otherwise
>>>> >> (if we consider SVec as well, I'd still push back on the introduction
>>>> >> of the abbreviation for both cases)
>>>> >
>>>> > One aspect of the naming is to avoid one not being a prefix of the
>>>> other (IDE completion, etc.) or them being too close to differentiate.
>>>> >
>>>> >> and llvm::Vector would be only a
>>>> >> case separation away form a standard name (when considering the
>>>> >> unqualified name - which is likely to come up a fair bit, as we'll
>>>> see
>>>> >> "Vector" used unqualified a lot).
>>>> >
>>>> > Note: this is why we converged with llvm::Vec and not llvm::Vector in
>>>> the revision
>>>>
>>>> Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
>>>> with standard names, etc). Perhaps a bit of a discussion of what other
>>>> libraries do around this (no doubt there are a bunch of C++ libraries
>>>> that have "here's the default vector you shuold use instead of
>>>> std::vector for these reasons" situations).
>>>>
>>>
>>> I had looked into this when the revision was sent, but I couldn't find a
>>> library with a "vector with inlined storage" and a default N value.
>>> Both Abseil and Folly have a SmallVector equivalent, Abseil does not
>>> have a default:
>>> https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71
>>> Folly has a default hard-coded to 1:
>>> https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430
>>> Folly has also another class, FBVector, which is really an alternative
>>> to std::vector with a different memory management strategy (growth factor,
>>> etc.). I don't think Abseil has anything else.
>>>
>>> Do you have other ideas of related work to look for?
>>>
>>
>> Actually, the reason I originally pursued this was that I used a
>> SmallDenseMap and happily found that it had a default number of inline
>> elements:
>> https://github.com/llvm/llvm-project/blob/69cd776e1ee79e72ccbdad30749eac04579715ee/llvm/include/llvm/ADT/DenseMap.h#L877
>> I thought "why not SmallVector, since I use that a lot more often"? But
>> based on historical misuse of SmallVector, using a hardcoded number like
>> "4" (like SmallDenseMap does) as the default doesn't make much sense, so I
>> pursued a bounded-sizeof approach.
>>
>>
>>>
>>>
>>>
>>>>
>>>> >> I wouldn't entirely object to SmallVector<T> getting a smart default
>>>> >> buffer size, and llvm::Vector being an alias for SmallVector<T, 0> -
>>>> I
>>>> >> don't feel /super/ great about it, but yeah, if we're going to push
>>>> >> the Programmer's Manual encouragement further in terms of
>>>> >> reducing/removing use of std::vector, then maybe llvm::Vector isn't
>>>> >> the worst way to do that.
>>>> >>
>>>> >> (it might be that this would benefit from being two separate
>>>> >> discussions, though - one on how to provide smart defaults for
>>>> >> SmallVector<T> and one on if/how to push std::vector deprecation in
>>>> >> favor of SmallVector<T, 0> (semantically speaking - no matter how
>>>> it's
>>>> >> spelled) further along)
>>>> >
>>>> >
>>>> > For historical purpose: the review was actually originally only
>>>> adding a default for SmallVector and nothing else :)
>>>> > We only converged to the current proposal after a few rounds of
>>>> reviews trying to tradeoff amongst folks in the review.
>>>>
>>>> Perhaps you could summarize some of those discussions/decisions in
>>>> more detail here? As the RFC stands, these are my comments - I know
>>>> it's always a tradeoff of which audiences are brought in at what
>>>> stages (though often folks send ADT/Support changes my way during
>>>> review - I should probably just set up a Herald rule for these
>>>> things).
>>>>
>>>> Sounds like maybe I am in agreement with the original version of the
>>>> review, then.
>>>>
>>>
>>> Likely: the review evolved because I opposed to changing the status quo
>>> in this direction I guess.
>>>
>>
>> It sounds like there is pretty good consensus here to just let
>> `SmallVector<T>` "just work" and choose a default N. The one exception
>> seems to be that Mehdi has a strong objection having the default N possibly
>> be 0 (to ensure the bounded-sizeof property). So let's shift the discussion
>> with a laser focus to specifically that point.
>>
>> Responding to the arguments I've seen presented so far:
>> 1. I'm not convinced that SmallVector<T> with defaulted N==0 is really
>> error prone. Mehdi, I like your definition of "I am willing to pay for some
>> limite inline storage if possible but I don't have a N in mind" and I think
>> we can apply it to SmallVector<T>; I don't think adding SVec<T> is needed
>> for that.
>>
>
> I understand your point, I still strongly disagree for the reason
> previously discussed: in particular I don't understand why applying it to
> SmallVector would be better than what we converged on with SVec.
>

I'd add that we have SmallVector the way it is and it has been serving us
pretty well so far: I'm not unhappy with the status quo for example.
Changing the status quo deserves a stronger case than what I see here, in
particular in light of the downside I see and also the alternative
(SVec<T>).

>
>
>
>> 2. I don't understand the concern about "forgetting the N". In the "new
>> world" of `SmallVector<T>`, any `SmallVector<T, N>` in new code would be
>> treated as "I profiled this code and this explicit N is the right choice".
>> That is, adding an explicit `N` is going to be rare and deliberate
>> (possibly with reviewers insisting on a comment!) -- this is not something
>> that one can "forget".
>>
>
That may be another aspect of our disconnect: you seem to propose a
different approach to SmallVector where N is exceptional. I am seeing your
addition as a convenience for a default case, that does not preclude from
using local knowledge to be more specific. You may be right that on the
long run we'd end up with a majority of unspecified N (I think I proposed
in the revision llvm::Vector for this instead by the way), I just see
SmallVector as something that is more of a conscientious choice at the
moment.


>
>> For point 2., I think we will have some transition period where reviewers
>> will organically start to propagate the new advice (and patch authors
>> enjoying not having to write N), but I think the direction is effectively
>> what I stated.
>>
>> -- Sean Silva
>>
>>
>>>
>>> Best,
>>>
>>> --
>>> Mehdi
>>>
>>>
>>>
>>>
>>>>
>>>> - Dave
>>>>
>>>>
>>>> >> >
>>>> >> > Cheers,
>>>> >> >
>>>> >> > --
>>>> >> > Mehdi
>>>> >> >
>>>> >> >
>>>> >> >>
>>>> >> >>
>>>> >> >> On Fri, Nov 13, 2020 at 2:06 PM Sean Silva via llvm-dev
>>>> >> >> <llvm-dev at lists.llvm.org> wrote:
>>>> >> >> >
>>>> >> >> > We've pretty happy now with a patch that adds two wrappers
>>>> around SmallVector that make it 1) more convenient to use and 2) will tend
>>>> to mitigate misuse of SmallVector. We think it's ready for wider
>>>> discussion: https://reviews.llvm.org/D90884
>>>> >> >> >
>>>> >> >> > SVec<T> is a convenience alias for SmallVector<T, N> with N
>>>> chosen automatically to keep its size under 64 Bytes (that heuristic is
>>>> easy to change though). The recommendation in the patch is to use this "on
>>>> the stack, where a "small" number of elements are expected".
>>>> >> >> >
>>>> >> >> > Vec<T> is a convenience alias for SmallVector<T, 0>. It lets us
>>>> get the (little-known?) benefits of SmallVector even when it has no inline
>>>> elements (see
>>>> https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h).
>>>> The recommendation in the patch is to use this when the SmallVector is on
>>>> the heap.
>>>> >> >> >
>>>> >> >> > A lot of this is boiled out from the discussion in
>>>> https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1
>>>> >> >> >
>>>> >> >> > The goals here are twofold:
>>>> >> >> >
>>>> >> >> > 1. convenience: not having to read/write "N", or do an extra
>>>> edit/recompile cycle if you forgot it
>>>> >> >> >
>>>> >> >> > 2. avoiding pathological cases: The choice of N is usually
>>>> semi-arbitrary in our experience, and if one isn't careful, can result in
>>>> sizeof(SmallVector) becoming huge, especially in the case of nested
>>>> SmallVectors. This patch avoids pathological cases in two ways:
>>>> >> >> >   A. SVec<T>'s heuristic keeps sizeof(SVec<T>) bounded, which
>>>> prevents pathological size amplifications like in `SmallVector<struct
>>>> {SmallVector<T, 4> a, b; }, 4>`, where the small sizes effectively multiply
>>>> together. Of course, users can always write SmallVector<T, N> explicitly to
>>>> bypass this, but the need for that seems rare.
>>>> >> >> >   B. SmallVector<T, 0> feels "weird to write" for most folks,
>>>> even though it is frequently the right choice. Vec<T> mitigates that by
>>>> "looking natural".
>>>> >> >> >
>>>> >> >> > I'm surfacing this as an RFC to get feedback on a couple
>>>> higher-level points:
>>>> >> >> > - does everybody agree that SVec<T> and Vec<T> are useful to
>>>> have?
>>>> >> >> > - get wider consensus around suggesting these as "defaults"
>>>> (see my updates to ProgrammersManual.rst in the patch)
>>>> >> >> > - how much we want to bulk migrate code vs let it organically
>>>> grow. Replacing SmallVector<T, 0> with Vec<T> should be completely
>>>> mechanical. Replacing SmallVector<T, N> for general N would be a lot more
>>>> work.
>>>> >> >> > - of course: naming. SVector/Vector were floated in the patch
>>>> as well and seem ok. SmallVec was rejected as it was a prefix of
>>>> SmallVector (messes up autocomplete).
>>>> >> >> >
>>>> >> >> > Looking forward to a world with fewer guessed SmallVector sizes,
>>>> >> >> >
>>>> >> >> > -- Sean Silva
>>>> >> >> > _______________________________________________
>>>> >> >> > LLVM Developers mailing list
>>>> >> >> > llvm-dev at lists.llvm.org
>>>> >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201116/42d0683b/attachment-0001.html>


More information about the llvm-dev mailing list