[llvm-dev] RFC: DenseMap grow() slowness
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 15 16:56:01 PDT 2016
On 03/15/2016 04:22 PM, escha at apple.com wrote:
>
>> On Mar 15, 2016, at 4:09 PM, Philip Reames <listmail at philipreames.com
>> <mailto:listmail at philipreames.com>> wrote:
>>
>>
>>
>> On 03/15/2016 03:07 PM, via llvm-dev wrote:
>>> There are a few passes in LLVM that make heavy use of a big DenseMap,
>>> one that potentially gets filled with up to one entry for each
>>> instruction in the function. EarlyCSE is the best example, but
>>> Reassociate and MachineCSE have this to some degree as well (there
>>> might be others?). To put it simply: at least in my profile,
>>> EarlyCSE spends ~1/5 of its time growing DenseMaps. This is kind of…
>>> bad.
>>>
>>> grow() is inherently slow because it needs to rehash and reinsert
>>> everything. This means growing a DenseMap costs much, much more than
>>> growing, for example, a vector. I talked about this with a few
>>> people and here are some possibilities we’ve come up with to improve
>>> this (some of which probably aren’t what we want):
>>>
>>> 1. Use a map that doesn’t require rehashing and reinsertion to grow.
>>> Chaining lets you do this, but std::unordered_map is probably so
>>> much slower than DenseMap we’d lose more than we gain.
>>> 2. Include the hash code in the map so that we don’t have to rehash.
>>> 32 bits more per entry (or whatever), and it might not help that
>>> much, since we still have to do the whole reinsertion routine.
>>> 3. Pre-calculate an estimate of the map size we need. For
>>> example, in EarlyCSE, this is a possibly gross overestimate of the
>>> size needed:
>>>
>>> unsigned InstCount = 0;
>>> unsigned LoadCount = 0;
>>> unsigned CallCount = 0;
>>> for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE;
>>>      ++FI) {
>>>   if (FI->mayReadOrWriteMemory())
>>>     ++LoadCount;
>>>   else if (isa<CallInst>(*FI))
>>>     ++CallCount;
>>>   else
>>>     ++InstCount;
>>> }
>>> AvailableValues.resize(InstCount);
>>> AvailableLoads.resize(LoadCount);
>>> AvailableCalls.resize(CallCount);
>>>
>>> But it does the job, and saves ~20% of time in EarlyCSE on my
>>> profiles. Yes, iterating over the entire function is way cheaper
>>> than grow(). The downside is that, while the estimate is still
>>> bounded by function size, it could end up allocating a good bit
>>> more than needed depending on (in EarlyCSE's case) the control
>>> flow/dominator structure.
>>>
>>> Any thoughts on this, or other less ugly alternatives? I estimate
>>> that, at least in our pass pipeline, we’re losing at least ~1% of
>>> total time to avoidable DenseMap::grow() operations, which feels a
>>> little bit… unnecessary.
>>
>> Slightly OT, but it looks like the LoadValue (value type of the
>> AvailableLoads structure) is relatively memory inefficient. At
>> minimum, we could get rid of the IsAtomic space by using a tagged
>> pointer. That would at least bring us down to 128 bits (a nice
>> power of two). That might help speed up some of the copying.
>
> Just to note, it looks like I was testing on a somewhat older LLVM
> version that didn’t have LoadValue at all, so my guess is this means
> it’s even -worse- in trunk.
Er, LoadValue's been around for a while (6 months). How far back are
you testing? I'd strongly suggest switching to something more recent.
Philip