[PATCH] D18226: Codegen: Tail-duplicate during placement.

Mon Jul 11 21:44:52 PDT 2016

On Mon, Jul 11, 2016 at 7:19 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> On Mon, Jul 11, 2016 at 7:12 PM Xinliang David Li <xinliangli at gmail.com>
> wrote:
>
>> On Mon, Jul 11, 2016 at 6:55 PM, Chandler Carruth via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> chandlerc added inline comments.
>>>
>>> ================
>>> Comment at: lib/CodeGen/BranchFolding.cpp:656
>>> @@ +655,3 @@
>>> +  // duplicated.
>>> +  if (!AfterPlacement)
>>> +    if (SuccBB && MBB1 != PredBB && MBB2 != PredBB &&
>>> ----------------
>>> davidxl wrote:
>>> > iteratee wrote:
>>> > > Discussed this with chandlerc. He thinks that passing around the
>>> list of blocks that were tail-duplicated too tightly couples the passes,
>>> and that we should strive to have non-overlap. He suggested that a list of
>>> blocks that were duplicated might be a good debug measure to make sure they
>>> aren't being re-merged.
>>> > >
>>> > > Overall I agree with him, and I made a change so there shouldn't be
>>> any overlap.
>>> > > Do you want to see the set as a debugging measure?
>>> > I don't agree. Without making it explicit, the the pass coupling does
>>> not disappear -- it is still there but in a more implicit but hard to
>>> maintain way.
>>> >
>>> > For instance, can we guarantee this still work when TailMerger and
>>> TailDuplicator uses non-default size thresholds?
>>> My thought was specifically that we should enforce an invariant that
>>> these are defined in a non-overlapping way, *and* check it in asserts
>>> builds precisely so that things like non-default size thresholds work in a
>>> sane way.
>>>
>>>
>> The problem is that the implementation maintains the non-overlapping by
>> making assumption about the size limit (the default setting), so when
>> non-default size is used, the assertion will likely to fire off.
>>
>
> Having two thresholds like this that are coupled is definitely
> problematic, but it seems like the solution should be to have a single
> threshold rather than two.
>
> Put differently, whether we are merging or duplicating tails, we should
> have one threshold and maybe a tolerance that provides any desired padding
> between the two operations. That way it will be impossible to configure
> this in an overlapping manner.
>

Having one threshold here is not the right thing to do for a couple of
reasons:

1) TailMerging's main purpose is to reduce code size. Its threshold is a
lower bound -- i.e., it is worthwhile to do tail merging only when the
common sequence length is >= threshold.   TailMerge is also possible to
have negative performance impact (increased taken branches, worse branch
prediction),  so the threshold is set to 3 as it is turned on at O2.  For
Os and Oz a smaller value is possible.

2) TailDup's main purpose, on the other hand, is to improve performance.
The size needs to be <= than the threshold in order for taildup to happen
to avoid excessive size increase. However, under different options with
different size/performance trade-offs, the setting is different. For
instance, its current default is set at 2, while at Os, the threshold is
hard coded to be 1.   In fact, to improve branch prediction, the threshold
is greatly increased to 20 when indirect branch is involved.

Currently, <=2 (tailDup) and >=3 (tailMerge) happen to not overlapping for
majority of cases (Kyle's handling is only for special cases), but I can
easily see the big benefit of increasing tailDup default threshold to be a
larger value, at least for O3, or hot functions with PGO.

>
>>
>>
>>> As long as there is the potential for overlap, we have the problem (IMO)
>>> that the innards of tail dup or tail merge can do things that user programs
>>> cannot. I would much prefer that we have a set of rules that are
>>> sufficiently precise and enforced that one pass doesn't have to have a
>>> back-channel to communicate to the other pass.
>>>
>>>
>> Passing a 'filter' is actually the exact mechanism IMO to hide pass
>> implementation details from one another. Is there a better way? Making any
>> assumption about how tailDup is done in tailMerger is the wrong direction
>> to take.
>>
>
> This assumes that all duplication of tail code sequences is done by a
> particular part of the compiler. The input program may have manually done
> such duplication itself. Same story for merging.
>
> The behavior shouldn't change when you manually apply a transform to the
> source program rather than having LLVM apply it for you.
>
> For that to work, we can't rely on communicating between the two systems,
> we need to actually define the operations in non-overlapping ways. We do
> the exact same thing for many other code motion transforms where we will
> transform in both directions to try and converge in the middle.
>
>
> But all of this is not to say that the problem of having two thresholds
> with a subtle coupling isn't a real problem! It is. We should define the
> interfaces in a way that clearly surfaces that there is some point above
> which we will merge and below which we will duplicate and that it is one
> point (not two) with maybe some delta around the point where we will make
> no change at all.
>

There are also likely cases where TailMerging purely reduces size without
affecting performance. One way is to teach tailMerger to be 'smart' about
it (instead of just doing simple size benefit as it is done today). However
I think doing that is not the right way to go.

To put it another way, TailMerging can be considered a 'normalization' pass
that simply merges all possible common tail sequences (with benefit, so it
should have a low threshold).  Later, tailDup will come in to undo some of
the decisions made earlier for performance -- so the threshold needs to be
opt level/mode dependent. Post layout tailMerge simply needs to honor
decisions made by TailDup as it has no way of precise cost analysis.

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160711/b605d73a/attachment-0001.html>