[llvm-dev] RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Sat Dec 5 10:46:55 PST 2020


On Sat, Dec 5, 2020 at 8:56 AM Chris Lattner <clattner at nondot.org> wrote:

> On Dec 3, 2020, at 10:26 AM, David Blaikie <dblaikie at gmail.com> wrote:
> I was meaning to highlight that I think a better conceptual abstraction
> than "small" (1 or more inlined objects) and "not small" (0 inlined
> objects) it'd be better to have an abstraction more like std::string -
> where the smallness isn't part of the API, as such.
>
>
> But we have SmallString as a dual to std::string precisely because we need
> that distinction.  Speaking of, the whole default argument thing should
> probably be applied to SmallString, SmallDenseMap and other types as well.
>
> I'm suggesting that we should have two types:
>
> Reduced sizeof(t) and guaranteed iterator validity over swap/move (ie:
> must not use an inline buffer)
> The rest: Like std::string. It might have an inline buffer, it might have
> zero, depending on size, etc. But importantly it's not guaranteed to
> maintain iterator/pointer validity on move/swap, and sizeof is optimized
> for stack usage.
>
> Neither of these really make sense being called "SmallVector" I suppose.
> And I'd dislike calling (2) "Vector" even though it's likely the more
> popular one - because it'd have subtly different semantics from std::vector
> regarding iterator invalidation.
>
>
> I think I see what you’re going for here, and I agree that std::string is
> a slightly different situation than std::vector given that some
> implementations have a small string optimizations already.
>
> I feel like you’re prioritizing the iterator invalidation behavior as the
> core difference between the two types, whereas I’m prioritizing the
> performance/allocation-behavior aspect.  I feel like this is a major
> difference in practice that API users should think about, and they should
> be prepared to deal with the iterator invalidation issues as necessary.
>
> Is the "Reduced sizeof(t) and guaranteed iterator validity over swap/move
> (ie: must not use an inline buffer)” use case important enough to have a
> “string-y” type?
>

I think things got confused again - reduced sizeof/guaranteed iterator
validity isn't a string-y type (because of the iterator validity guarantee
such a type can't use an inline buffer - unlike std::string), it's what
we're talking about/potentially calling llvm::Vector.


>  As you say, we don’t have a solution for this in the current code other
> than std::vector.  I haven’t heard of this being a significant enough
> problem to be worth “fixing”, and I don’t think we can drop the inline vs
> out-of-line distinction.
>

Hmm - I think I'm still miscommunicating/not making sense.

I think there's value in having a type that has a guarantee of
no-inline-storage - when you want to move and maintain iterator validity.
But I'm wondering if the type with possibly using inline storage could be
thought of more like std::string (ie: define the contract: iterators
invalidated on move) rather than like SmallVector (ie: "must have an inline
buffer"). I don't know what we would call this thing - but my point is to
try to move away from the name dictating the implementation detail of
having an inline buffer (we could still have SmallVector<T, N> for when you
specifically want an inline buffer) but OtherThingVector<T> would not
require an inline buffer it would be "a type without iterator validity on
move that has, or doesn't have, an inline buffer as it chooses to based on
performance tradeoffs".

I think, to me, this would be better than "SmallVector<T> must have at
least 1 inline element because of the word "Small" in the name" if we moved
away from having Small in the name, and towards some sense of the contract
& leaving the implementation to make the choice of how to best implement
that contract.

But I think I'm going around in circles/not necessarily making sense here.
Perhaps a chat on discord or something would be more effective?

> So I think (2) has some legs - but the problem for me is that if we name
>> the type llvm::Vector, which already gets a bit close to std::vector
>> (unqualified "Vector" seems like fairly subtle code to me - but I'm leaning
>> towards being OK with that aspect if other peopel are) and it has different
>> iterator invalidation/move semantics than std::vector, I think that could
>> be problematic.
>>
>>
>> Not sure if I want to defend this, but we have lots of precedent for
>> this, including llvm::sort etc.
>>
>
> For sure - I'm not trying to advocate for avoiding having a std::vector
> replacement in LLVM (excuse the several negatives there). And some
> differences between the C++ standard APIs and the LLVM ones is expected -
> llvm::sort's and many of the other alternatives are fairly "obvious" I'd
> say, it sorts a range instead of a pair of iterators - I don't think anyone
> would find that surprising. Something with subtly different
> iterator/pointer invalidation semantics under a near exactly matching name
> wouldn't be such a good fit, though.
>
>
> The difference I meant was that it forwards transparently to
> array_pod_sort / qsort which is a pretty big behavioral difference.
>

Ah, more llvm::sort(iter, iter) - yeah, hadn't thought about that one.
Different performance characteristics, but the same contract, right? So
this is working around sub-optimal standard library implementations, but
not providing a different contract/requirements, is it?

I think the distinction I'm trying to draw here - is that we're not talking
about "a better std::vector" - it's something with a different contract.
The thing we're talking about as llvm::Vector is pretty close to "a better
std::vector" within the "a world without exceptions" alternate reality of
C++ that the C++ standard doesn't acknowledge.

But SmallVector isn't like that, it has a difference in the contract
(iterator invalidation being the major part of that, in terms of observable
contract - similarly with DenseMap having such iterator invalidation (& the
need for a tombstone and empty value)) - and I'm trying to suggest that
maybe it's worth thinking about in a way that's different from the way it
seems like we've historically thought about it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201205/d0c70212/attachment.html>


More information about the llvm-dev mailing list