[PATCH] D22317: Codegen: Tail Merge: Be less aggressive with special cases.

Fri Jul 15 16:38:14 PDT 2016

On Fri, Jul 15, 2016 at 3:55 PM, Kyle Butt <kyle+llvm at iteratee.net> wrote:

> iteratee added a comment.
>
> I've looked over the results with and without this patch, and I've come to
> the conclusion that It's correct to make this less aggressive generally.
>
> For an unconditional branch, ignoring the threshold because of fallthrough
> is correct, because there is no runtime cost if the jump happens earlier.
> It was going to happen anyway.
>

agree -- the case specialized by your patch should probably be split out as
a real no-runtime cost case (but can improve size).

> For a conditional branch this isn't true. We trade a conditional branch
> and a jump (which may not execute), for a jump (which always executes) and
> a conditional branch. Given this, I think it's reasonable to enforce the
> threshold in this case.
>

We can easily prove that tail merging will almost always introduce
additional runtime cost with conditional branch.

Assuming MBB1 and MBB2 have two successors Succ1 and Succ2.   After tail
merging, assuming Succ1 is the fallthough block from MBB1 before merging,
and the fall through block from the merge block after the transformation,
then after merging, the branch cost will be

   Freq(MBB2)  + (Freq(MBB1) * Prob(MBB1->Succ2) + Freq(MBB2) *
Prob(MBB2->Succ2))          (1)

Before tail merging, there are two scenarios:

Scenario A):   When Succ2 is not a fall through of MBB2, such that there
are JCC + JMP instruction in MBB2 . The branch cost in this scenario is

    Freq(MBB1) * Prob(MBB1->Succ2) + Freq(MBB2)          (2)

(1) is greater than (2).

Scenario B): Succ2 is fallthrough of MBB2, then the branch cost before tail
merging is

     Freq(MBB1) * Prob(MBB1->Succ2) + Freq(MBB2) * Prob(MBB2->Succ1)    (3)

(2) > (3), so   (1) > (3).   In other words, (1) is always worse.

However as I mentioned in a previous reply to Chandler,  TailMerging is
never about runtime performance, but about size reduction.  The pre-layout
TailMerging can be dummy (blindly reduce code size), and let later tailDup
to undo the previous tail Merging decision depending on the optimization
mode (e.g, not undo if Os or Oz is specified).

Since it is unclear how much size reduction can be gained by allowing
conditional branch case , I am fine with your patch, but  you need to add
more comments for your patch.   I don't want to see later someone find some
size reduction opportunity and remove the check of succ_size() == 1 -- so
all of sudden things (about taildup/tailmerge non overlapping assumption
are broken.

It will also be better if you can measure if we lose any size reduction
benefit (by building SPEC or some large programs with Os option).

David

>
>
> Repository:
>   rL LLVM
>
> https://reviews.llvm.org/D22317
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160715/c9901841/attachment.html>