[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

Wed Sep 16 07:16:21 PDT 2015

On Tue, Sep 15, 2015 at 11:56 PM, escha <escha at apple.com> wrote:
>
>> On Sep 15, 2015, at 8:47 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> On Tue, Sep 15, 2015 at 7:16 PM, escha <escha at apple.com> wrote:
>>> You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out of tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseinstruction), which in our case turns out to be 1/3 the former and 2/3 the latter (roughly).
>>
>> I asked because I can't really reproduce the results of your massive
>> speedups on even the larger testcases i have around, certainly not on
>> the order you reported.
>>
>> I can get a speedup of about 10%  by changing the smallptrsetsize to
>> be 16 from 128.
>> I can get a reduction of about 15% by using a visited bit (IE half the
>> speedup you report).
>> I can get roughly the same speedup by using denseset, etc.
>
> Hmm, interesting! I guess it could be a variation in either:
>
> a) the number of instructions per function on average
> or
> b) the amount of stuff that needs to be DCE’d on average (less stuff -> more time spent in set maintenance, comparatively)
>

I suspect there are other differences as well, related to memory
allocation/cache behavior depending on the platform.

Note that the fastest variant i've gotten now is actually:

std::unordered_set sized to some estimate of the function size (IE
F.size() * 25).

I expect, however, that this depends even more heavily on the c++
library impl used/platform behavior, so i wouldn't recommend it.

I do think we should have some ideal way to  manage these kinds of
"set of possibly all instructions".  The GVN rewrite also used to
spend 20+% of its time managing a touched/visited set for instructions
(and the total time is about 4-5x what ADCE is).

It now uses a  unique id scheme/bitmaps to manage that set, costing about 0.5%

See https://github.com/dberlin/llvm-gvn-rewrite/blob/newgvn/lib/Transforms/Scalar/NewGVN.cpp
and search for TouchedInstructions.

I don't necessarily recommend this for ADCE mind you, just pointing
out that i agree there is an underlying problem we should be solving
here.

--Dan