[PATCH] D22051: MergeSimilarFunctions: a code size pass to merge functions with small differences
Tobias Edler von Koch via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 11 16:01:45 PDT 2016
On 07/07/2016 07:10 PM, Daniel Berlin wrote:
> Two thoughts after staring at it:
>
> 1. If you were to form SESE/etc regions and process them in topo
> order, you could actually do even better than you do now. You would
> be able to validly say "the first X regions of this function are the
> same". You currently do this in some cases, but you could do it in more :)
Yep, that's a good idea. You could then also outline identical regions
from functions... :)
An easy improvement to the current traversal algorithm would be to
enqueue successor blocks in the order of their instruction count as
opposed to the order they appear in the terminator instruction. That
way, you could handle easy cases where the successor blocks are swapped
but otherwise equivalent or similar.
> 2. You could integrate most of what you are doing with mergefuncs with
> a bit of work.
Depends on how you define "a bit of work" ;) See below.
> As to whether this could be integrated into the in-tree
> MergeFunctions... Well, this started off as a patch to
> MergeFunctions back in 2013. However, the in-tree MergeFunctions
> has undergone significant architectural changes since then; it now
> uses a total ordering of functions to speed up merging. This is
> great if you only want to merge identical functions, but it
> doesn't work for merging of similar functions.
>
>
> This is not correct.
> You could use simhash, for example, or any thing that will give you
> actual similarity metrics in addition to total ordering.
What I meant to say is that it doesn't make sense to use the *current*
hash function and the *current* total ordering criterion in the in-tree
MergeFunctions if you want to merge similar functions.
Why?
1. The hash function includes the instruction opcodes. Similar, but
non-identical, functions will end up in different buckets. We'd have to
replace the hash function by something less precise for similar merging.
Hashing the CFG structure is a neat idea though, I should be doing that too.
2. The total ordering criterion (used inside the buckets) in
MergeFunctions is based on equality. A very large chunk of the
MergeFunctions code deals with those comparisons (see all the cmpXYZ
functions). Similar functions won't be near each other if you use that
specific ordering criterion, so it's not much use. We'd need to replace
the criterion by something else - and yes, similarity hashing is a great
option for that (although not yet implemented so unclear how well it'd
work for IR).
So essentially, to make this work in MergeFunctions, we'd have to rip
out two of its key advantages (which are great if you only want to merge
identical functions)... and you'd essentially get this pass :)
Am I missing something?
Since the two MergeFunctions share a common ancestor, there is some
shared code; basically a shared skeleton. So it should be possible to
factor that out into a "MergeFunctionsUtils" sort of module. Perhaps
that's a way to go if people are happy with that?
Thanks,
Tobias
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160711/9424d22a/attachment.html>
More information about the llvm-commits
mailing list