[PATCH] D18226: Codegen: Tail-duplicate during placement.

Wed Jul 13 10:33:43 PDT 2016

Any more concerns? Not being able to mix tailDup and tailMerge properly
with non-default thresholds is the only issue remaining that blocks the
checkin. I'd like Kyle to be able to move on. I think the right approach is
to make it work as proposed. If there are better ways, we can revisit it
later.

David

On Mon, Jul 11, 2016 at 9:44 PM, Xinliang David Li <davidxl at google.com>
wrote:

>
>
> On Mon, Jul 11, 2016 at 7:19 PM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
>
>> On Mon, Jul 11, 2016 at 7:12 PM Xinliang David Li <xinliangli at gmail.com>
>> wrote:
>>
>>> On Mon, Jul 11, 2016 at 6:55 PM, Chandler Carruth via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>> chandlerc added inline comments.
>>>>
>>>> ================
>>>> Comment at: lib/CodeGen/BranchFolding.cpp:656
>>>> @@ +655,3 @@
>>>> +  // duplicated.
>>>> +  if (!AfterPlacement)
>>>> +    if (SuccBB && MBB1 != PredBB && MBB2 != PredBB &&
>>>> ----------------
>>>> davidxl wrote:
>>>> > iteratee wrote:
>>>> > > Discussed this with chandlerc. He thinks that passing around the
>>>> list of blocks that were tail-duplicated too tightly couples the passes,
>>>> and that we should strive to have non-overlap. He suggested that a list of
>>>> blocks that were duplicated might be a good debug measure to make sure they
>>>> aren't being re-merged.
>>>> > >
>>>> > > Overall I agree with him, and I made a change so there shouldn't be
>>>> any overlap.
>>>> > > Do you want to see the set as a debugging measure?
>>>> > I don't agree. Without making it explicit, the the pass coupling does
>>>> not disappear -- it is still there but in a more implicit but hard to
>>>> maintain way.
>>>> >
>>>> > For instance, can we guarantee this still work when TailMerger and
>>>> TailDuplicator uses non-default size thresholds?
>>>> My thought was specifically that we should enforce an invariant that
>>>> these are defined in a non-overlapping way, *and* check it in asserts
>>>> builds precisely so that things like non-default size thresholds work in a
>>>> sane way.
>>>>
>>>>
>>> The problem is that the implementation maintains the non-overlapping by
>>> making assumption about the size limit (the default setting), so when
>>> non-default size is used, the assertion will likely to fire off.
>>>
>>
>> Having two thresholds like this that are coupled is definitely
>> problematic, but it seems like the solution should be to have a single
>> threshold rather than two.
>>
>> Put differently, whether we are merging or duplicating tails, we should
>> have one threshold and maybe a tolerance that provides any desired padding
>> between the two operations. That way it will be impossible to configure
>> this in an overlapping manner.
>>
>
> Having one threshold here is not the right thing to do for a couple of
> reasons:
>
> 1) TailMerging's main purpose is to reduce code size. Its threshold is a
> lower bound -- i.e., it is worthwhile to do tail merging only when the
> common sequence length is >= threshold.   TailMerge is also possible to
> have negative performance impact (increased taken branches, worse branch
> prediction),  so the threshold is set to 3 as it is turned on at O2.  For
> Os and Oz a smaller value is possible.
>
> 2) TailDup's main purpose, on the other hand, is to improve performance.
> The size needs to be <= than the threshold in order for taildup to happen
> to avoid excessive size increase. However, under different options with
> different size/performance trade-offs, the setting is different. For
> instance, its current default is set at 2, while at Os, the threshold is
> hard coded to be 1.   In fact, to improve branch prediction, the threshold
> is greatly increased to 20 when indirect branch is involved.
>
> Currently, <=2 (tailDup) and >=3 (tailMerge) happen to not overlapping for
> majority of cases (Kyle's handling is only for special cases), but I can
> easily see the big benefit of increasing tailDup default threshold to be a
> larger value, at least for O3, or hot functions with PGO.
>
>
>
>>
>>>
>>>
>>>> As long as there is the potential for overlap, we have the problem
>>>> (IMO) that the innards of tail dup or tail merge can do things that user
>>>> programs cannot. I would much prefer that we have a set of rules that are
>>>> sufficiently precise and enforced that one pass doesn't have to have a
>>>> back-channel to communicate to the other pass.
>>>>
>>>>
>>> Passing a 'filter' is actually the exact mechanism IMO to hide pass
>>> implementation details from one another. Is there a better way? Making any
>>> assumption about how tailDup is done in tailMerger is the wrong direction
>>> to take.
>>>
>>
>> This assumes that all duplication of tail code sequences is done by a
>> particular part of the compiler. The input program may have manually done
>> such duplication itself. Same story for merging.
>>
>> The behavior shouldn't change when you manually apply a transform to the
>> source program rather than having LLVM apply it for you.
>>
>> For that to work, we can't rely on communicating between the two systems,
>> we need to actually define the operations in non-overlapping ways. We do
>> the exact same thing for many other code motion transforms where we will
>> transform in both directions to try and converge in the middle.
>>
>>
>> But all of this is not to say that the problem of having two thresholds
>> with a subtle coupling isn't a real problem! It is. We should define the
>> interfaces in a way that clearly surfaces that there is some point above
>> which we will merge and below which we will duplicate and that it is one
>> point (not two) with maybe some delta around the point where we will make
>> no change at all.
>>
>
>
> There are also likely cases where TailMerging purely reduces size without
> affecting performance. One way is to teach tailMerger to be 'smart' about
> it (instead of just doing simple size benefit as it is done today). However
> I think doing that is not the right way to go.
>
> To put it another way, TailMerging can be considered a 'normalization'
> pass that simply merges all possible common tail sequences (with benefit,
> so it should have a low threshold).  Later, tailDup will come in to undo
> some of the decisions made earlier for performance -- so the threshold
> needs to be opt level/mode dependent. Post layout tailMerge simply needs to
> honor decisions made by TailDup as it has no way of precise cost analysis.
>
> David
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160713/8ba3ebd2/attachment.html>