[PATCH] Add getUnrollingPreferences to TTI

Tue Sep 10 09:21:34 PDT 2013

On Sep 6, 2013, at 1:43 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> Chandler, Andy,
> 
> I've attached an updated patch.
> 
> - I've not changed the interface to return the structure by value (because, as noted previously, I think it is better this way).
> 
> - I've not changed the formatting (given the formatting of the surrounding code, I think that the indentation chosen makes the most sense). 'clang-formatting'ing the whole file looks like it will provide an overall improvement, but it will be a large change, and we should do that in a separate commit. 
> 
> - I've added more explanatory text to the interface definition.
> 
> - I've switched around the conditionals as requested by Andy.
> 
> Please review.

LGTM. Thanks. 
 -Andy

> Thanks again,
> Hal
> 
> ----- Original Message -----
>> ----- Original Message -----
>>> 
>>> On Thu, Aug 29, 2013 at 5:54 AM, Hal Finkel < hfinkel at anl.gov >
>>> wrote:
>>> 
>>> 
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>>> ----- Original Message -----
>>>>> On Wed, Aug 28, 2013 at 03:07:31PM -0500, Hal Finkel wrote:
>>>>>> Nadav, et al.,
>>>>>> 
>>>>>> The attached patch adds the following interface to TTI:
>>> 
>>> I've attached an updated patch. In this version, the caller always
>>> initializes the UP structure to the current target-independent
>>> defaults before calling getUnrollingPreferences. This simplifies
>>> the
>>> implementation, and also stabilizes the interface (target code will
>>> not need to be updated just because new fields are added).
>>> 
>>> 
>>> 
>>> So, I'd like Andy to chime in on whether he's comfortable with this
>>> use of TTI. I think I'm OK with it, but it's a borderline case.
>>> 
>>> 
>>> Second, I think essentially only the LoopVectorizer should use this
>>> to drive its partial unrolling. This is both because it has added
>>> information about when the cost is low (ILP) and when the cost is
>>> unusually high. Finally, it should happen at the same phase of the
>>> optimizer IMO (IE, it shouldn't drastically influence the behavior
>>> of the inliner). While the LoopVectorizer is currently inside the
>>> CGSCC pass pipeline, Nadav is working on moving it to live after
>>> that completes.
>> 
>> The loop unroller's partial unrolling is different from the loop
>> vectorizer's partial unrolling. The loop vectorizer specifically
>> partially unrolls vectorizable loops for ILP (with loop iteration
>> bodies maximally intermixed), while the loop unroller uses loop-body
>> concatenation. Both are important:
>> 
>> - On cores with high loop-branch overhead, partial unrolling is
>> important for all loops with small bodies, regardless of ILP
>> considerations in order to hide the backedge cost.
>> 
>> - Because the loop vectorizer does not partially unroll
>> non-vectorizable loops, having only that would miss a lot of
>> important cases where loop-body concatenation can nevertheless help
>> hide instruction latency and expose ILP near the end and beginning
>> of the loop body. Especially for cores with deep pipelines (high
>> instruction latency) this is actually quite important.
>> 
>> In short, the ILP unrolling that the loop vectorizer does is almost
>> always preferable (so long as its register pressure estimates are
>> good), but it is not always possible, and the concatenation-based
>> unrolling is also important.
>> 
>>> 
>>> 
>>> Some comments on the patch:
>>> 
>>> TargetTransformInfo.h
>>> index 06810a7..8a51266 100644
>>> --- a/include/llvm/Analysis/TargetTransformInfo.h
>>> +++ b/include/llvm/Analysis/TargetTransformInfo.h
>>> @@ -29,6 +29,7 @@
>>> namespace llvm {
>>> 
>>> class GlobalValue;
>>> +class Loop;
>>> class Type;
>>> class User;
>>> class Value;
>>> @@ -191,6 +192,23 @@ public:
>>> /// incurs significant execution cost.
>>> virtual bool isLoweredToCall(const Function *F) const;
>>> 
>>> + /// Parameters that control the generic loop unrolling
>>> transformation.
>>> + struct UnrollingPreferences {
>>> + unsigned Threshold; ///< The cost threshold for the unrolled loop
>>> + ///< (set to UINT_MAX to disable).
>>> + unsigned OptSizeThreshold; ///< The cost threshold for the
>>> unrolled
>>> loop
>>> + ///< when optimizing for size
>>> + ///< (set to UINT_MAX to disable).
>>> + unsigned Count; ///< Force this unrolling count (set to 0 to
>>> disable).
>>> 
>>> 
>>> I don't understand the Count at all. Can you explain this more? I
>>> suspect using normal doxygen comments rather than trailing comments
>>> will let you use the prose necessary to document this.
>> 
>> This provides a way to override the unrolling factor for the loop. As
>> Tom suggested, the Loop object is provided to the callback, and
>> based on some target specific analysis, a specific unrolling factor
>> could be determined. I'll add more commentary.
>> 
>>> 
>>> 
>>> + bool Partial; ///< Allow partial loop unrolling.
>>> + bool Runtime; ///< Perform runtime unrolling.
>>> + };
>>> +
>>> + /// \brief Get target-customized preferences for the generic loop
>>> unrolling
>>> + /// transformation. The caller will initialize UP with the
>>> current
>>> + /// target-independent defaults.
>>> + virtual void getUnrollingPreferences(Loop *L,
>>> UnrollingPreferences
>>> &UP) const;
>>> 
>>> 
>>> Just return the struct by value. It's a much simpler API that way.
>> 
>> I disagree. In the current patch, the struct members are initialized
>> to the current defaults by the unrolling pass. This way the target
>> only needs to change those elements it cares about, and we don't
>> need to update all targets that use the interface if we happen to
>> add another structure member.
>> 
>>> 
>>> 
>>> +
>>> /// @}
>>> 
>>> /// \name Scalar Target Information
>>> diff --git a/lib/Analysis/TargetTransformInfo.cpp
>>> b/lib/Analysis/TargetTransformInfo.cpp
>>> index 8c6005b..b23223d 100644
>>> --- a/lib/Analysis/TargetTransformInfo.cpp
>>> +++ b/lib/Analysis/TargetTransformInfo.cpp
>>> @@ -96,6 +96,11 @@ bool TargetTransformInfo::isLoweredToCall(const
>>> Function *F) const {
>>> return PrevTTI->isLoweredToCall(F);
>>> }
>>> 
>>> +void TargetTransformInfo::getUnrollingPreferences(Loop *L,
>>> + UnrollingPreferences &UP) const {
>>> 
>>> 
>>> Can you run the patch through clang-format to fix up weird
>>> indentation like this?
>> 
>> Hrmm... I did not think it was weird, actually it was quite
>> intentional. The second line is indented 2 spaces from the function
>> name on the previous line. Given all of the alternatives, that
>> seemed to make the most sense to me. I'll see what clang-format
>> does, maybe I'll change my mind and like it better ;)
>> 
>>> 
>>> 
>>> + PrevTTI->getUnrollingPreferences(L, UP);
>>> +}
>>> +
>>> bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const {
>>> return PrevTTI->isLegalAddImmediate(Imm);
>>> }
>>> @@ -469,6 +474,8 @@ struct NoTTI : ImmutablePass,
>>> TargetTransformInfo
>>> {
>>> return true;
>>> }
>>> 
>>> + void getUnrollingPreferences(Loop *, UnrollingPreferences &)
>>> const
>>> { }
>>> 
>>> 
>>> This doesn't look like it initializes anything?
>> 
>> Because the default implementation does not change anything (because,
>> as documented in the header file, the structure is pre-initialized
>> with the target-independent defaults).
>> 
>> Thanks for the review!
>> 
>> -Hal
>> 
>>> 
>>> 
>> 
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory<unroll_tti-v3.patch>