[PATCH] D18506: Add a copy constructor to StringMap

Tue Mar 29 17:59:45 PDT 2016

joker.eph added inline comments.

================
Comment at: include/llvm/ADT/StringMap.h:261
@@ +260,3 @@
+    NumItems = RHS.NumItems;
+    NumTombstones = RHS.NumTombstones;
+    for (unsigned I = 0, E = NumBuckets; I != E; ++I) {
----------------
hfinkel wrote:
> joker.eph wrote:
> > hfinkel wrote:
> > > joker.eph wrote:
> > > > I wonder if we really want to keep the tombstones or if it wouldn't be better to reinsert the elements?
> > > I don't have a good feeling for this; it seems like this could mean one of two things:
> > > 
> > >  1. We start with the same number of buckets as the RHS; in this case keeping the tombstones vs. not just affects whether we write a 0 or -1 into the pointer array for the empty slot (although insertion will prefer that slot over further probing).
> > >  2. Start with a minimal number of buckets. Potentially more memory efficient, but essentially leads to "rehashing" (which in this case means re-probing for all inputs).
> > > 
> > > What do you prefer?
> > It seems to me that when looking for an entry in the map, the search stops when it find an empty slot, but continue when there is a tombstone. If I understand your 1), it is just replacing tombstones with empty buckets, but that would break the search for existing entries that are after the tombstones.
> > 
> > The 2) is basically what I was thinking of, eventually allocating the same number of buckets, but indeed reinserting each entry (reusing the existing known hash). Not that I have a strong preference, I was rather wondering if you considered this and what is the rational for one or the other (and if there is a reason, I'd rather have it documented in the code for future reference).
> > I hope David could chime in with an opinion.
> > It seems to me that when looking for an entry in the map, the search stops when it find an empty slot, but continue when there is a tombstone. If I understand your 1), it is just replacing tombstones with empty buckets, but that would break the search for existing entries that are after the tombstones.
> 
> I believe you're correct; scratch that option.
> 
> > The 2) is basically what I was thinking of, eventually allocating the same number of buckets, but indeed reinserting each entry (reusing the existing known hash). Not that I have a strong preference, I was rather wondering if you considered this and what is the rational for one or the other (and if there is a reason, I'd rather have it documented in the code for future reference).
> 
> My thought was the following: Most StringMaps don't have many tombstones, because many use cases don't ever (or only rarely) delete things from their string maps. As a result, just copying the structure as-is (any tombstones included) will be faster than re-probing for all keys with little downside.
> 
> > I hope David could chime in with an opinion.
> 
> Most StringMaps don't have many tombstones, because many use cases don't ever (or only rarely) delete things from their string maps.

That sounds a good argument to me!

http://reviews.llvm.org/D18506