[llvm-commits] PATCH: Implement a SmallDenseMap container

Chandler Carruth chandlerc at gmail.com
Sat Jun 16 20:16:46 PDT 2012


On Sat, Jun 16, 2012 at 11:53 AM, Benjamin Kramer
<benny.kra at googlemail.com>wrote:

>
> On 16.06.2012, at 16:05, Chandler Carruth wrote:
>
> > And here we are!
> >
> > Functional highlights:
> >
> > - Inline buffer will have proper alignment and size constraints, with no
> aliasing issues AFAICT.
> > - Inline buffer re-uses the space typically occupied by the buffer
> pointer and buffer size, allowing for example SmallDenseMap<unsigned,
> unsigned, 2> to be the same size as DenseMap<unsigned, unsigned>.
> > - Supports any types DenseMap supports, including those with non-trivial
> constructors, destructors, copy constructors, etc.
> > - Significant attempts to minimize copying, but there is more to do here
> (see below).
> > - Tests are automatically extended to cover the new implementation.
> >
> > All that said, this patch is still quite rough I'm afraid. There is
> quite a bit of duplicated and/or fiddly code that I would like to pull out.
> It also needs lots of commenting.
> >
> > It is surprisingly hard to implement the semantics of a hash-based map
> with the small-buffer optimization. This works, but it is far from clean. I
> hope it's still acceptable, as I suspect it is worth it in many situations.
> >
> > I can either try to hack on this patch until it's a bit cleaner and then
> submit, or I can submit, and then clean it up incrementally as I go. I only
> have a mild preference for the latter approach.
> >
> > I *think* I have all of the construction / destruction semantics
> correct, and I have most of the move semantics correct but not all of them.
> Hardening this and adding some tests that exercise at least correct
> invocation of constructors and destructors will be important follow-ups to
> this.
> >
> > Anyways, comments welcome!
>
> This looks very good already, though I'm not entirely sure the downsides
> of this design are worth it.
>
> Is merging the pointer for the big case and the data for the small case
> really worth adding a branch to each lookup of the bucket number/bucket
> table? Since you are using the same lookup functions for both the big and
> the small case you could just have a pointer in the base class pointing to
> the small array or a malloc'd array.


This is a fundamental design question w/ any small-inline-buffer optimized
container, and I don't know if there is a definite answer...

One interesting note is that I suspect the code complexity on, say,
operator[] matters nearly as much in this case as it would with
SmallVector. With SmallVector, the cost of getting an element is loading
from an offset pointer. Done.

With dense map, in the best case, we have to do a lot more even in the
hottest, best case scenario:

1) Check that its not empty.
2) Hash the key.
3) Get the bucket pointer (in small implementation i have)
4) Offset based on the hash
5) Compare the found keys

There are going to be quite a few branches here, one more seems unlikely to
make or break anything. So my intuition is that here, we might as well go
crazy tuning for space.


> That would simplify the code a lot, getBuckets() could go away.


Honestly, that's not the worst of the code IMO. I don't think this would
make a big enough difference. The tricky parts involve swapping and copying
in mixed small-ness scenarios.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120616/e8b80bf4/attachment.html>


More information about the llvm-commits mailing list