[PATCH] D22051: MergeSimilarFunctions: a code size pass to merge functions with small differences

Thu Jul 7 15:49:43 PDT 2016

jfb added a subscriber: jrkoenig.
jfb added a comment.

In http://reviews.llvm.org/D22051#477450, @tobiasvk wrote:

> In http://reviews.llvm.org/D22051#476003, @jfb wrote:
>
> > Could you detail how this is different from MergeFunc, and what it would take to add the new capabilities to MergeFunc instead of duplicating it?
>
>
> The main difference is that the in-tree MergeFunctions can only merge identical functions (modulo pointer types) whereas MergeSimilarFunctions is also capable of merging sets of functions that are merely similar, i.e. have some differences in their instructions. This expands the scope of the optimization significantly.

I'm not sure that's true anymore, or at least I'm pretty sure it's now much easier to add that capability to MergeFunc than it used to be. @jrkoenig fixed a bunch of issues in MergeFunc last summer, and experimented with merging similar functions by adding support for fuzzy equivalence to MergeFunc.

> As to whether this could be integrated into the in-tree MergeFunctions... Well, this started off as a patch to MergeFunctions back in 2013. However, the in-tree MergeFunctions has undergone significant architectural changes since then; it now uses a total ordering of functions to speed up merging. This is great if you only want to merge identical functions, but it doesn't work for merging of similar functions.

I think that's incorrect. The comparison functions in MergeFunc can be tuned to achieve what you want.

> So it really depends on what your optimization goal is. If you want to eliminate duplicates quickly, the in-tree MergeFunctions is great. In our experience, however, the main benefit comes from being able to deal with those small differences here and there that arise e.g. from template instantiations or 'copy-paste-and-hack'.

I entirely agree with that goal, but I'd much rather see *less* code duplication. MergeFunc is a finicky piece of code, and duplicating it sounds like a pretty bad approach compared to fixing it. This fork from 2013 won't have some of the more recent fixes from MergeFunc, and we'd just end up with yet another pass to maintain.

http://reviews.llvm.org/D22051