[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
Daniel Berlin via llvm-dev
llvm-dev at lists.llvm.org
Wed Sep 16 07:16:21 PDT 2015
On Tue, Sep 15, 2015 at 11:56 PM, escha <escha at apple.com> wrote:
>> On Sep 15, 2015, at 8:47 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>> On Tue, Sep 15, 2015 at 7:16 PM, escha <escha at apple.com> wrote:
>>> You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out of tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseinstruction), which in our case turns out to be 1/3 the former and 2/3 the latter (roughly).
>> I asked because I can't really reproduce the results of your massive
>> speedups on even the larger testcases i have around, certainly not on
>> the order you reported.
>> I can get a speedup of about 10% by changing the smallptrsetsize to
>> be 16 from 128.
>> I can get a reduction of about 15% by using a visited bit (IE half the
>> speedup you report).
>> I can get roughly the same speedup by using denseset, etc.
> Hmm, interesting! I guess it could be a variation in either:
> a) the number of instructions per function on average
> b) the amount of stuff that needs to be DCE’d on average (less stuff -> more time spent in set maintenance, comparatively)
I suspect there are other differences as well, related to memory
allocation/cache behavior depending on the platform.
Note that the fastest variant i've gotten now is actually:
std::unordered_set sized to some estimate of the function size (IE
F.size() * 25).
I expect, however, that this depends even more heavily on the c++
library impl used/platform behavior, so i wouldn't recommend it.
I do think we should have some ideal way to manage these kinds of
"set of possibly all instructions". The GVN rewrite also used to
spend 20+% of its time managing a touched/visited set for instructions
(and the total time is about 4-5x what ADCE is).
It now uses a unique id scheme/bitmaps to manage that set, costing about 0.5%
and search for TouchedInstructions.
I don't necessarily recommend this for ADCE mind you, just pointing
out that i agree there is an underlying problem we should be solving
More information about the llvm-dev