[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

escha via llvm-dev llvm-dev at lists.llvm.org
Mon Sep 14 12:23:52 PDT 2015


I should note that when I was writing my simplifyInstructionsInBlock replacement, I got a similarly large speedup (~1000ms -> ~700ms) from the trick of only adding instructions to the worklist if they needed to be /revisited/, as opposed to initializing the worklist with every instruction up front. This is likely the same kind of gain as the one here, because it's the difference between "put everything in the function in a set" and "put only a few things from the function in a set"; the gain from Not Putting Tons Of Stuff In A Set seems to be pretty consistent across widely varying use cases.

—escha

> On Sep 14, 2015, at 12:03 PM, Philip Reames <listmail at philipreames.com> wrote:
> 
> Given the use case here, I somewhat agree.  In the common case, you're going to need an extra O(I) memory to represent the alive set.  However, there are definitely ways to do better without needing transient data stored in the instruction itself.
> 
> Have you considered a bloom filter or something equivalent for testing aliveness?  We don't actually need a precise result here, right?  If we're willing to tolerate a slightly imprecise result (in theory), a bloom filter could give a noticeable memory reduction.
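For reference, the Bloom filter idea could be sketched roughly like this; the bit count and hash functions are illustrative, not tuned for ADCE:

```cpp
#include <bitset>
#include <cstdint>

// Toy Bloom filter over pointer keys. mightContain() can return false
// positives but never false negatives, so used as an "alive" set a dead
// instruction might occasionally be kept -- conservative, not incorrect.
class PtrBloomFilter {
  static constexpr std::size_t kBits = 4096; // fixed memory, ~512 bytes
  std::bitset<kBits> Bits;

  static std::size_t hash1(std::uintptr_t K) {
    return (K * 0x9e3779b97f4a7c15ULL) % kBits;
  }
  static std::size_t hash2(std::uintptr_t K) {
    return ((K ^ (K >> 17)) * 0xff51afd7ed558ccdULL) % kBits;
  }

public:
  void insert(const void *P) {
    std::uintptr_t K = reinterpret_cast<std::uintptr_t>(P);
    Bits.set(hash1(K));
    Bits.set(hash2(K));
  }
  bool mightContain(const void *P) const {
    std::uintptr_t K = reinterpret_cast<std::uintptr_t>(P);
    return Bits.test(hash1(K)) && Bits.test(hash2(K));
  }
};
```

The conservative direction matters: a false positive only means a dead instruction is treated as alive and left in place, which is a missed optimization rather than a miscompile.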
> 
> p.s. I disagree with Daniel on sizing.  While linear search over a 128 element array is somewhat slow, memory allocation/churn can be far far worse.  I haven't looked at this particular use case, but in general, larger small sizes for frequently called functions can make a lot of sense.
> 
> Philip
> 
> On 09/14/2015 11:41 AM, escha via llvm-dev wrote:
>> I tried changing it to 16 and only saw a very small speedup. I’m pretty sure that no matter how you slice it, adding every single instruction in a function to a set is going to be expensive.
>> 
>> —escha
>> 
>>> On Sep 14, 2015, at 11:34 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>> 
>>> I did something similar for dominators, for GVN, etc.
>>> All see significant speedups.
>>> 
>>> However, the answer I got back when I mentioned this was "things like
>>> ptrset and densemap should only have a small performance difference
>>> from side data when used and sized right", and I've found this to
>>> mostly be true after looking harder.
>>> 
>>> In the case you are looking at, i see:
>>> 
>>> -  SmallPtrSet<Instruction*, 128> Alive;
>>> 
>>> This seems ... wrong.
>>> In fact, it seems optimally bad.
>>> 
>>> 
>>> This says the small size of the ptr set is 128.
>>> That is, the smallsize is 128.
>>> 
>>> SmallPtrSet will linearly search its inline array while its size is
>>> <= the smallsize, and otherwise fall back to building a non-small
>>> representation and using better algorithms.
>>> 
>>> Linearly searching a 128-element array == slow.
>>> 
>>> I bet if you change the 128 to 8, you will see significant speedups.
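The behaviour being described can be sketched with a stand-in for SmallPtrSet; this is not the real llvm/ADT implementation, just the shape of its strategy:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Shape of SmallPtrSet's strategy: linear-scan an inline buffer while
// the set holds <= SmallSize elements, then switch to a real hash set.
// With SmallSize == 128, every lookup before the switch is an O(128)
// scan; with SmallSize == 8 the scans stay cheap and the hash table
// takes over early.
template <typename T, std::size_t SmallSize> class ToySmallSet {
  std::vector<T> Small;      // stands in for the inline buffer
  std::unordered_set<T> Big; // the "non-small" fallback
  bool IsSmall = true;

public:
  bool insert(T V) {
    if (IsSmall) {
      for (T E : Small)      // the linear search being pointed at
        if (E == V)
          return false;
      if (Small.size() < SmallSize) {
        Small.push_back(V);
        return true;
      }
      // Grow: migrate everything into the hash-based representation.
      Big.insert(Small.begin(), Small.end());
      IsSmall = false;
    }
    return Big.insert(V).second;
  }
  bool contains(T V) const {
    if (IsSmall) {
      for (T E : Small)
        if (E == V)
          return true;
      return false;
    }
    return Big.count(V) != 0;
  }
  std::size_t size() const { return IsSmall ? Small.size() : Big.size(); }
};
```

The trade-off the thread is circling: a big small-size avoids reallocation but pays for it on every lookup; a small one bounds the scan and lets the asymptotically better structure do the work.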
>>> 
>>> 
>>> 
>>> On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> Are there any other passes that could benefit from having a single bit (or similarly small amount) of per-Instruction metadata local to the pass, i.e. to avoid having to keep a side-Set of instructions...
>>>> FWIW, I have a target specific pass that needs exactly this sort of
>>>> tracking.  Once you offer 1 bit, it's a slippery slope to giving a
>>>> whole pointer.  I like your idea regardless.
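The per-pass bit being proposed can be sketched generically like this; the toy instruction layout and the ADCE-style walk are illustrative, not LLVM's real data structures:

```cpp
#include <vector>

// Sketch of the RFC's idea: one scratch bit stored in the instruction
// itself, so a pass like ADCE needs no side set at all.
struct ToyInst {
  bool HasSideEffects;
  int Operand;           // index of the single (toy) operand, or -1
  unsigned AliveBit : 1; // the per-pass scratch bit under discussion
};

// Toy ADCE: mark side-effecting instructions and everything they
// transitively use as alive; everything else counts as dead.
int countDead(std::vector<ToyInst> &Func) {
  for (ToyInst &I : Func)
    I.AliveBit = 0; // pass-local: reset on entry
  std::vector<int> Worklist;
  for (int I = 0; I != (int)Func.size(); ++I)
    if (Func[I].HasSideEffects) {
      Func[I].AliveBit = 1;
      Worklist.push_back(I);
    }
  while (!Worklist.empty()) {
    int I = Worklist.back();
    Worklist.pop_back();
    int Op = Func[I].Operand;
    if (Op >= 0 && !Func[Op].AliveBit) {
      Func[Op].AliveBit = 1; // replaces "Alive.insert(Op)"
      Worklist.push_back(Op);
    }
  }
  int Dead = 0;
  for (const ToyInst &I : Func)
    Dead += !I.AliveBit;
  return Dead;
}
```

Here membership testing is a single bit read at a fixed offset, with no hashing, no probing, and no allocation, which is exactly where the set-based version spends its time.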
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 


