[llvm-dev] RFC: DenseMap grow() slowness

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 15 16:21:08 PDT 2016


On Tue, Mar 15, 2016 at 3:07 PM, via llvm-dev <llvm-dev at lists.llvm.org>
wrote:

> There’s a few passes in LLVM that make heavy use of a big DenseMap, one
> that potentially gets filled with up to 1 entry for each instruction in the
> function. EarlyCSE is the best example, but Reassociate and MachineCSE have
> this to some degree as well (there might be others?). To put it simply: at
> least in my profile, EarlyCSE spends ~1/5 of its time growing DenseMaps.
> This is kind of… bad.
>
> grow() is inherently slow because it needs to rehash and reinsert
> everything. This means growing a DenseMap costs much, much more than
> growing, for example, a vector. I talked about this with a few people and
> here are some possibilities we’ve come up with to improve this (some of
> which probably aren’t what we want):
>
> 1. Use a map that doesn’t require rehashing and reinsertion to grow.
> Chaining lets you do this, but std::unordered_map is probably so much
> slower than DenseMap we’d lose more than we gain.
> 2. Include the hash code in the map so that we don’t have to rehash. 32
> bits more per entry (or whatever), and it might not help that much, since
> we still have to do the whole reinsertion routine.
>


Another choice is to implement DenseMap using segmented array (of buckets)
-- the segment size is fixed so finding a bucket using bucket number can be
pretty cheap : segment number + segment offset.

With this, there is no need for resizing.

David



> 3. Pre-calculate an estimate as to the map size we need. For example, in
> EarlyCSE, this is possibly gross overestimate of size needed:
>
>   unsigned InstCount = 0;
>   unsigned LoadCount = 0;
>   unsigned CallCount = 0;
>   for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE;
> ++FI) {
>     if (FI->mayReadOrWriteMemory())
>       ++LoadCount;
>     else if (isa<CallInst>(*FI))
>       ++CallCount;
>     else
>       ++InstCount;
>   }
>   AvailableValues.resize(InstCount);
>   AvailableLoads.resize(LoadCount);
>   AvailableCalls.resize(CallCount);
>
> But it does the job, and saves ~20% of time in EarlyCSE on my profiles.
> Yes, iterating over the entire function is way cheaper than grow().
> Downsides are that while it’s still bounded by function size, it could end
> up allocating a good bit more depending on — in EarlyCSE’s case — the
> control flow/dominator structure.
>
> Any thoughts on this, or other less ugly alternatives? I estimate that, at
> least in our pass pipeline, we’re losing at least ~1% of total time to
> avoidable DenseMap::grow() operations, which feels a little bit…
> unnecessary.
>
> —escha
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160315/b30bc501/attachment.html>


More information about the llvm-dev mailing list