[llvm-dev] RFC: DenseMap grow() slowness
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 15 16:56:01 PDT 2016
On 03/15/2016 04:22 PM, escha at apple.com wrote:
>
>> On Mar 15, 2016, at 4:09 PM, Philip Reames <listmail at philipreames.com
>> <mailto:listmail at philipreames.com>> wrote:
>>
>>
>>
>> On 03/15/2016 03:07 PM, via llvm-dev wrote:
>>> There are a few passes in LLVM that make heavy use of a big DenseMap,
>>> one that potentially gets filled with up to one entry for each
>>> instruction in the function. EarlyCSE is the best example, but
>>> Reassociate and MachineCSE have this to some degree as well (there
>>> might be others?). To put it simply: at least in my profile,
>>> EarlyCSE spends ~1/5 of its time growing DenseMaps. This is kind of…
>>> bad.
>>>
>>> grow() is inherently slow because it needs to rehash and reinsert
>>> everything. This means growing a DenseMap costs much, much more than
>>> growing, for example, a vector. I talked about this with a few
>>> people and here are some possibilities we’ve come up with to improve
>>> this (some of which probably aren’t what we want):
>>>
>>> 1. Use a map that doesn’t require rehashing and reinsertion to grow.
>>> Chaining lets you do this, but std::unordered_map is probably so
>>> much slower than DenseMap we’d lose more than we gain.
>>> 2. Include the hash code in the map so that we don’t have to rehash.
>>> 32 bits more per entry (or whatever), and it might not help that
>>> much, since we still have to do the whole reinsertion routine.
>>> 3. Pre-calculate an estimate of the map size we need. For
>>> example, in EarlyCSE, this is a possibly gross overestimate of the
>>> size needed:
>>>
>>> unsigned InstCount = 0;
>>> unsigned LoadCount = 0;
>>> unsigned CallCount = 0;
>>> for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE;
>>>      ++FI) {
>>>   if (FI->mayReadOrWriteMemory())
>>>     ++LoadCount;
>>>   else if (isa<CallInst>(*FI))
>>>     ++CallCount;
>>>   else
>>>     ++InstCount;
>>> }
>>> AvailableValues.resize(InstCount);
>>> AvailableLoads.resize(LoadCount);
>>> AvailableCalls.resize(CallCount);
>>>
>>> But it does the job, and saves ~20% of time in EarlyCSE on my
>>> profiles. Yes, iterating over the entire function is way cheaper
>>> than grow(). The downside is that, while the estimate is still
>>> bounded by function size, it could end up allocating a good bit
>>> more than needed depending on (in EarlyCSE's case) the control
>>> flow/dominator structure.
>>>
>>> Any thoughts on this, or other less ugly alternatives? I estimate
>>> that, at least in our pass pipeline, we’re losing at least ~1% of
>>> total time to avoidable DenseMap::grow() operations, which feels a
>>> little bit… unnecessary.
>>
>> Slightly OT, but it looks like the LoadValue (value type of the
>> AvailableLoads structure) is relatively memory inefficient. At
>> minimum, we could get rid of the IsAtomic space by using a tagged
>> pointer. That would at least bring us down to 128 bits (a nice
>> power of two). That might help speed up some of the copying.
>
> Just to note, it looks like I was testing on a somewhat older LLVM
> version that didn’t have LoadValue at all, so my guess is this means
> it’s even -worse- in trunk.
Er, LoadValue's been around for a while (6 months). How far back are
you testing? I'd strongly suggest switching to something more recent.
Philip