[PATCH] D18506: Add a copy constructor to StringMap

Tue Mar 29 22:37:27 PDT 2016

On Tue, Mar 29, 2016 at 10:34 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Tue, Mar 29, 2016 at 5:53 PM, Hal Finkel via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> hfinkel added inline comments.
>>
>> ================
>> Comment at: include/llvm/ADT/StringMap.h:261
>> @@ +260,3 @@
>> +    NumItems = RHS.NumItems;
>> +    NumTombstones = RHS.NumTombstones;
>> +    for (unsigned I = 0, E = NumBuckets; I != E; ++I) {
>> ----------------
>> joker.eph wrote:
>> > hfinkel wrote:
>> > > joker.eph wrote:
>> > > > I wonder if we really want to keep the tombstones or if it wouldn't
>> be better to reinsert the elements?
>> > > I don't have a good feeling for this; it seems like this could mean
>> one of two things:
>> > >
>> > >  1. We start with the same number of buckets as the RHS; in this case
>> keeping the tombstones vs. not just affects whether we write a 0 or -1 into
>> the pointer array for the empty slot (although insertion will prefer that
>> slot over further probing).
>> > >  2. Start with a minimal number of buckets. Potentially more memory
>> efficient, but essentially leads to "rehashing" (which in this case means
>> re-probing for all inputs).
>> > >
>> > > What do you prefer?
>> > It seems to me that when looking for an entry in the map, the search
>> stops when it find an empty slot, but continue when there is a tombstone.
>> If I understand your 1), it is just replacing tombstones with empty
>> buckets, but that would break the search for existing entries that are
>> after the tombstones.
>> >
>> > The 2) is basically what I was thinking of, eventually allocating the
>> same number of buckets, but indeed reinserting each entry (reusing the
>> existing known hash). Not that I have a strong preference, I was rather
>> wondering if you considered this and what is the rational for one or the
>> other (and if there is a reason, I'd rather have it documented in the code
>> for future reference).
>> > I hope David could chime in with an opinion.
>> > It seems to me that when looking for an entry in the map, the search
>> stops when it find an empty slot, but continue when there is a tombstone.
>> If I understand your 1), it is just replacing tombstones with empty
>> buckets, but that would break the search for existing entries that are
>> after the tombstones.
>>
>> I believe you're correct; scratch that option.
>>
>> > The 2) is basically what I was thinking of, eventually allocating the
>> same number of buckets, but indeed reinserting each entry (reusing the
>> existing known hash). Not that I have a strong preference, I was rather
>> wondering if you considered this and what is the rational for one or the
>> other (and if there is a reason, I'd rather have it documented in the code
>> for future reference).
>>
>> My thought was the following: Most StringMaps don't have many tombstones,
>> because many use cases don't ever (or only rarely) delete things from their
>> string maps. As a result, just copying the structure as-is (any tombstones
>> included) will be faster than re-probing for all keys with little downside.
>>
>
> Data might be nice, if you happen to have a way to gather some easily at
> hand.
>

In lieu of data, at least a comment describing the guesswork that's been
applied & other options that might be relevant/tested by someone who's
interested/motivated at some point.

>
>
>>
>> > I hope David could chime in with an opinion.
>>
>>
>>
>> http://reviews.llvm.org/D18506
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160329/ace65902/attachment.html>