[LLVMdev] mem2reg optimization
nicolas at capens.net
Mon Oct 6 22:58:33 PDT 2008
Sounds like an excellent generic approach. I'd love to lend a hand to
implement it, but I'm afraid it involves a little more in-depth LLVM
knowledge than what I currently master, so I'll leave it to Dave and you for now.
Talking about performance, for me this optimization was the difference
between mem2reg taking 63% of execution time, and having it totally
'vanish'. It also marked the transition from preferring a custom JIT
compiler to getting really excited about using LLVM instead. :) Previously I
had just assumed LLVM was inherently slow, but now I realize there's still
potential for optimization.
Do you consider it useful if I go ahead and further identify bottlenecks and
report them or would you rather concentrate on correctness bugs for now? Is
there any list of known optimization opportunities?
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Chris Lattner
Sent: Sunday, 05 October, 2008 00:05
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] mem2reg optimization
On Oct 4, 2008, at 2:51 PM, Chris Lattner wrote:
>>> I like your approach of using the use lists, but I'm not sure the
>>> ordering is guaranteed. If it is, your approach is superior.
>> I got my patch updated to work with TOT. Here it is. Comments?
> Hi Dave,
> Great. I'd like to get this in, but would really like to do a profile
> first. Could you please include a testcase? Even if it is something
> contrived with llvm-gcc, I would really like to understand what is
> going on.
Looking more closely at the approach, I infer that the (logical in
hindsight) issue is the cases where mem2reg scans a BB to see the
relative ordering of two instructions. For example, the single store
case. I could imagine that if you saw lots of single store allocas in
the middle of a large BB, mem2reg would be very, very slow.
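The slow path being described can be sketched in standalone C++ (a toy model for illustration, not mem2reg's actual code; the `Inst` type and function name are assumptions): deciding which of two instructions comes first by walking the block from the top costs O(block size) per query, and doing this once per alloca in the same large block makes the total work quadratic.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an instruction in a single basic block.
struct Inst { int Id; };

// Naive ordering check: walk the block until we hit A or B.
// Each query is O(n) in the block size; repeated for many
// single-store allocas in one large block, the work is O(n^2).
bool comesBeforeLinear(const std::vector<Inst *> &Block,
                       const Inst *A, const Inst *B) {
    for (const Inst *I : Block) {
        if (I == A) return true;  // A seen first: A comes before B
        if (I == B) return false; // B seen first: A does not
    }
    return false; // neither found; treat as "not before"
}
```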
As far as the approach goes, here's a counterproposal that is a bit
less complex. I suggest adding a single DenseMap<Instruction*,
unsigned> to Mem2reg. If there is an entry in this map for a load/
store instruction, it would store the instruction's index in its basic
block. This means that to check dominance of two instrs in a block,
you just need to compare their two indices (which is fast). Really
small blocks (< 10 instructions?) would never be entered into the map,
to keep the size of the map low. The first time a query is made within a
large block, it would compute the indices of all loads and stores
within the block.
The nice thing about using a single map is that it makes updating the
map really trivial. Just use TheMap.erase(I) when deleting load and
store instructions. The numbering can be purely local to a block, and
the relative ordering of memops doesn't get invalidated when a load or
store is nuked.
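The scheme above might look roughly like this as standalone C++ (a sketch under stated assumptions, not LLVM's actual mem2reg code: the class name, the `comesBefore` query, and the small-block threshold are all illustrative, and a plain `unordered_map` stands in for `DenseMap`):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for an LLVM instruction in a single basic block.
struct Inst {
    std::string Name;
    bool IsMemOp; // true for loads/stores
};

// Lazily assigns each memory op its index within the block, so
// "does A come before B?" becomes an O(1) index comparison instead
// of a linear scan of the block.
class MemOpOrdering {
    std::unordered_map<const Inst *, unsigned> Indices;
    bool Numbered = false;
    static constexpr size_t SmallBlock = 10; // assumed threshold

public:
    // Returns true if memory op A appears before memory op B in Block.
    bool comesBefore(const std::vector<Inst *> &Block,
                     const Inst *A, const Inst *B) {
        if (Block.size() < SmallBlock) {
            // Small block: a direct scan is cheap; keep the map empty.
            for (const Inst *I : Block) {
                if (I == A) return true;
                if (I == B) return false;
            }
            return false;
        }
        if (!Numbered) {
            // First query in a large block: number all loads/stores once.
            unsigned Idx = 0;
            for (const Inst *I : Block)
                if (I->IsMemOp)
                    Indices[I] = Idx++;
            Numbered = true;
        }
        return Indices.at(A) < Indices.at(B);
    }

    // When a load/store is deleted, just drop its entry; the remaining
    // indices keep their relative order, so no renumbering is needed.
    void eraseInst(const Inst *I) { Indices.erase(I); }
};
```

Note how deletion really is the trivial case: erasing one entry leaves gaps in the numbering, but gaps don't affect `<` comparisons between the surviving entries.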
Does this seem like a reasonable approach? If so, please split the
various parts of mem2reg out into helper methods as a prepatch. This
will make review of the subsequent changes much easier.
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu