[LLVMdev] [GSoC] Optimizing for size

Tue Apr 5 23:43:02 PDT 2011

On Tue, Apr 5, 2011 at 8:29 PM, Sean Bartell <wingedtachikoma at gmail.com> wrote:
> Hi all,

Hi Sean,

Last year, I've written master's thesis on MergeBlocks optimization on
LLVM IR level (in St.-Petersburg State Polytechnical University),
supervised by Anton Korobeynikov.

I'll comment on MergeBlocks part of your proposal.

> After working on existing passes, I would add a new MergeBlocks pass. This is
> an IPO pass that would combine equivalent basic blocks, extracting them into
> new functions. Research has shown that this can decrease code size by 5-6%. The
> new pass will be based on CodeExtractor and MergeFunctions; it will create a
> hash table of every basic block, based on the number of instructions of
> different types, and then perform a detailed comparison on blocks with the same
> hash. I would first write an inefficient preliminary version that only finds
> obvious cases. This can be used to get an impression of the technique's
> effectiveness. I would then work on making it more efficient and able to merge
> blocks in more difficult cases.

My implementation of MergeBasicBlocks pass (mergebb for short) managed
to reduce code size by ~3%, on some samples from the LLVM test suite,
on both x86 and ARM.

Detecting mergeable basic blocks is pretty easy and straightforward:
you can introduce an order on them, and detect all mergeable groups in
N log N time (very fast in practice, event without any hashing).

The hard part is deciding, if a particular group is worth extracting
into a function. I ended up using simple heurisitc (like block size *
A — n_inputs * B — n_outputs * C), and hand-picked magic A, B, and C
coefficients for every platform. This may not be the best way to do
it; I believe that one should use the code generator to actually
measure the real difference that extracting every particular basic
blocks group will make. This, however, may be really slow.

To conclude, I'd say that, in my mind:
— LLVM IR is not well suited for mergebb pass (at least if not using
codegen to decide which blocks are worth merging). While it is very
easy to detect equivalent blocks, and extract them into functions, you
have almost no control on the code generation, which is crucial for
this kind of optimization. Once again, if not using codegen for each
basic blocks group, you have no good way to tell if a particular group
is worth extracting at all.
— Platform-specific low-level code generation tweaks and traditional
-Os (-O3 without things that blow the size up) have an order of
magnitude bigger impact on the size of generated code, and are «the
way to go».

                Gregory