[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

Sat Dec 5 12:24:28 PST 2020

On Sat, Dec 5, 2020 at 12:00 PM Chris Lattner <clattner at nondot.org> wrote:

> On Dec 5, 2020, at 10:46 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>> I think I see what you’re going for here, and I agree that std::string is
>> a slightly different situation than std::vector given that some
>> implementations have a small string optimizations already.
>>
>> I feel like you’re prioritizing the iterator invalidation behavior as the
>> core difference between the two types, whereas I’m prioritizing the
>> performance/allocation-behavior aspect.  I feel like this is a major
>> difference in practice that API users should think about, and they should
>> be prepared to deal with the iterator invalidation issues as necessary.
>>
>> Is the "Reduced sizeof(t) and guaranteed iterator validity over swap/move
>> (ie: must not use an inline buffer)” use case important enough to have a
>> “string-y” type?
>>
>
> I think things got confused again - reduced sizeof/guaranteed iterator
> validity isn't a string-y type (because of the iterator validity guarantee
> such a type can't use an inline buffer - unlike std::string), it's what
> we're talking about/potentially calling llvm::Vector.
>
>
> Ok, I thought you were talking about bringing that idea to string-y types
> because std::string doesn’t provide it..
>

Oh, right, sorry, yeah - not proposing any new string abstractions. Instead
trying to draw analogy between how std::string is specified in the standard
with how (name notwithstanding) SmallVector could be specified. Mostly I
just don't have a good name for the entity I'm trying to describe.
"Something like std::vector, but with the iterator invalidation semantics
of std::string such that this thing can use a small inline buffer if it's
beneficial" without it /having/ to use such an inline buffer, if, say, the
sizeof(T) is too large to be reasonable. And that I think this would be a
good thing so users could describe their willingness to accept those
semantics "I guarantee I don't care about iterator validity over move" and
then the implementation would "do the right thing" given that constraint -
if T happens to be large, it'd potentially do the same thing as
llvm::Vector, but the user wouldn't have to look at the sizeof(T) and bake
that decision in, instead letting the cointainer make that choice and
letting that choice remain correct/up-to-date as the shape/size of T
changes over time. I think this would make the code more maintainable - the
likely-right implementation tradeoffs happen as code changes, having to
revisit the choice of data structure less often.

>
>
>  As you say, we don’t have a solution for this in the current code other
>> than std::vector.  I haven’t heard of this being a significant enough
>> problem to be worth “fixing”, and I don’t think we can drop the inline vs
>> out-of-line distinction.
>>
>
> Hmm - I think I'm still miscommunicating/not making sense.
>
> I think there's value in having a type that has a guarantee of
> no-inline-storage - when you want to move and maintain iterator validity.
>
>
> Right, in the vector domain, this is the llvm::Vector type(def).
>
> But I'm wondering if the type with possibly using inline storage could be
> thought of more like std::string (ie: define the contract: iterators
> invalidated on move) rather than like SmallVector (ie: "must have an inline
> buffer"). I don't know what we would call this thing - but my point is to
> try to move away from the name dictating the implementation detail of
> having an inline buffer (we could still have SmallVector<T, N> for when you
> specifically want an inline buffer) but OtherThingVector<T> would not
> require an inline buffer it would be "a type without iterator validity on
> move that has, or doesn't have, an inline buffer as it chooses to based on
> performance tradeoffs".
>
> I think, to me, this would be better than "SmallVector<T> must have at
> least 1 inline element because of the word "Small" in the name" if we moved
> away from having Small in the name, and towards some sense of the contract
> & leaving the implementation to make the choice of how to best implement
> that contract.
>
> But I think I'm going around in circles/not necessarily making sense here.
> Perhaps a chat on discord or something would be more effective?
>
>
> I think I understand what you’re saying: you’re focusing on making
> iterator invalidation semantics primary instead of memory layout.
>

In the sense that the difference in iterator invalidation semantics are the
element that allows the implementation to make different choices in memory
layout.

>  This is a reasonable thing to do, but I don’t think we should go with a
> design that ignores the “in place ness”.
>

I'm suggesting somewhat ignoring the "in place nesss" in terms of the API
design, but not implementation. std::string's spec doesn't speak about "in
place ness", but it does talk about iterator invalidation & specifies it in
a way that allows "in place ness" & leaves the implementation scope to
choose what that looks like - how large the in place buffer is, how it's
implemented (do you have a pointer that points back into the object itself
for the in-place buffer, or some bit squirreled away to determine big/small
and allow use of the pointer bits for inline storage for more
compact/longer effective inline storage, etc).

That doesn't mean the implementation choices are totally invisible, and the
spec doesn't tend to talk about the sizeof of C++ standard library types -
a std::string could have sizeof 512 and be conforming, but would be
gratuitously suboptimal, but we're generally accepting that it maybe sizeof
> 3 words but not hugely moreso.

I don't think we should move to a point where we don't have the ability to
specify the inline buffer size when it's worth doing so for optimization
purposes. But for when the size isn't being specified, I think it may be
more maintainable if it's thought of more about the contract and let the
implementation choose - including possibly choosing 0.

>  Maybe there is a “best of both worlds” design proposal that could make
> sense here?
>
> The difference I meant was that it forwards transparently to
>> array_pod_sort / qsort which is a pretty big behavioral difference.
>>
>
> Ah, more llvm::sort(iter, iter) - yeah, hadn't thought about that one.
> Different performance characteristics, but the same contract, right? So
> this is working around sub-optimal standard library implementations, but
> not providing a different contract/requirements, is it?
>
>
> It’s really a “similar but different” API.  I works the same in the easy
> cases, but doesn’t have the same set of contracts, that was my only
> analogy.
>

Ah, perhaps I'm missing the detail: What part of the contract is different
between llvm::sort(iter, iter) and std::sort(iter, iter)?

> There are lots of other APIs that are similar-but-different in innumerable
> ways that provide tradeoffs e.g. DenseMap vs std::map, etc.  The
> programmer’s manual has a long list of options :-)
>

For sure

- Dave
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201205/21718a57/attachment.html>