[PATCH] Add getUnrollingPreferences to TTI

Hal Finkel hfinkel at anl.gov
Thu Aug 29 16:44:44 PDT 2013


----- Original Message -----
> 
> On Thu, Aug 29, 2013 at 5:54 AM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> ----- Original Message -----
> > ----- Original Message -----
> > > On Wed, Aug 28, 2013 at 03:07:31PM -0500, Hal Finkel wrote:
> > > > Nadav, et al.,
> > > > 
> > > > The attached patch adds the following interface to TTI:
> 
> I've attached an updated patch. In this version, the caller always
> initializes the UP structure to the current target-independent
> defaults before calling getUnrollingPreferences. This simplifies the
> implementation, and also stabilizes the interface (target code will
> not need to be updated just because new fields are added).
> 
> So, I'd like Andy to chime in on whether he's comfortable with this
> use of TTI. I think I'm OK with it, but it's a borderline case.
> 
> 
> Second, I think essentially only the LoopVectorizer should use this
> to drive its partial unrolling. This is both because it has added
> information about when the cost is low (ILP) and when the cost is
> unusually high. Finally, it should happen at the same phase of the
> optimizer IMO (IE, it shouldn't drastically influence the behavior
> of the inliner). While the LoopVectorizer is currently inside the
> CGSCC pass pipeline, Nadav is working on moving it to live after
> that completes.

The loop unroller's partial unrolling is different from the loop vectorizer's partial unrolling. The loop vectorizer partially unrolls only vectorizable loops, and does so for ILP (with the bodies of the different iterations maximally intermixed), while the loop unroller concatenates copies of the loop body. Both are important:

 - On cores with high loop-branch overhead, partial unrolling is important for all loops with small bodies, regardless of ILP considerations, because it hides the backedge cost.

 - Because the loop vectorizer does not partially unroll non-vectorizable loops, relying on it alone would miss many important cases where loop-body concatenation can nevertheless help hide instruction latency and expose ILP near the beginning and end of the loop body. This is especially important on cores with deep pipelines (high instruction latencies).

In short, the ILP-oriented unrolling done by the loop vectorizer is almost always preferable (so long as its register-pressure estimates are good), but it is not always possible, and concatenation-based unrolling remains important.
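To make the distinction concrete, here is a standalone C++ sketch (not LLVM code; the loop and names are invented for illustration) of the two unrolled forms of the same loop, each with an unroll factor of 2; remainder iterations are omitted for brevity:

```cpp
#include <cstddef>

// Original loop: a[i] = b[i] * c[i] + 1 for each i.

// Concatenation-based unrolling (the loop unroller): each copy of the
// loop body runs to completion before the next copy begins.
void concatUnrolled(int *a, const int *b, const int *c, size_t n) {
  for (size_t i = 0; i + 1 < n; i += 2) {
    int t0 = b[i] * c[i];
    a[i] = t0 + 1;
    int t1 = b[i + 1] * c[i + 1];
    a[i + 1] = t1 + 1;
  }
}

// ILP-oriented unrolling (what the loop vectorizer does): the bodies of
// the two iterations are intermixed, so the independent multiplies can
// issue in parallel on a wide core.
void interleavedUnrolled(int *a, const int *b, const int *c, size_t n) {
  for (size_t i = 0; i + 1 < n; i += 2) {
    int t0 = b[i] * c[i];
    int t1 = b[i + 1] * c[i + 1];
    a[i] = t0 + 1;
    a[i + 1] = t1 + 1;
  }
}
```

Both forms compute identical results; only the instruction schedule differs, which is why the interleaved form depends on good register-pressure estimates while the concatenated form helps even when iterations cannot be intermixed.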

> 
> 
> Some comments on the patch:
> 
> TargetTransformInfo.h
> index 06810a7..8a51266 100644
> --- a/include/llvm/Analysis/TargetTransformInfo.h
> +++ b/include/llvm/Analysis/TargetTransformInfo.h
> @@ -29,6 +29,7 @@
> namespace llvm {
> 
> class GlobalValue;
> +class Loop;
> class Type;
> class User;
> class Value;
> @@ -191,6 +192,23 @@ public:
> /// incurs significant execution cost.
> virtual bool isLoweredToCall(const Function *F) const;
> 
> + /// Parameters that control the generic loop unrolling transformation.
> + struct UnrollingPreferences {
> + unsigned Threshold; ///< The cost threshold for the unrolled loop
> + ///< (set to UINT_MAX to disable).
> + unsigned OptSizeThreshold; ///< The cost threshold for the unrolled loop
> + ///< when optimizing for size
> + ///< (set to UINT_MAX to disable).
> + unsigned Count; ///< Force this unrolling count (set to 0 to disable).
> 
> 
> I don't understand the Count at all. Can you explain this more? I
> suspect using normal doxygen comments rather than trailing comments
> will let you use the prose necessary to document this.

This provides a way for the target to override the unrolling factor for a particular loop. As Tom suggested, the Loop object is passed to the callback, so a target-specific analysis can determine a specific unrolling factor. I'll add more commentary.
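As a rough standalone sketch of how a target might use Count (these are not the actual LLVM declarations, and the trip-count-based heuristic is invented purely for illustration):

```cpp
#include <climits>

// Simplified stand-in for the struct under discussion. The caller is
// expected to fill in the target-independent defaults before calling.
struct UnrollingPreferences {
  unsigned Threshold;        // cost threshold (UINT_MAX disables)
  unsigned OptSizeThreshold; // cost threshold under -Os (UINT_MAX disables)
  unsigned Count;            // forced unroll count; 0 means "unroller decides"
  bool Partial;              // allow partial unrolling
  bool Runtime;              // allow runtime unrolling
};

// Hypothetical target hook: a made-up analysis (here, just a known trip
// count standing in for whatever the target computes from the Loop)
// decides that a specific factor is best and forces it via Count.
void getUnrollingPreferences(unsigned knownTripCount,
                             UnrollingPreferences &UP) {
  if (knownTripCount != 0 && knownTripCount % 4 == 0)
    UP.Count = 4; // force 4x unrolling; every other field keeps its default
}
```

When Count is left at 0, the unroller falls back to choosing a factor itself from the cost thresholds.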

> 
> 
> + bool Partial; ///< Allow partial loop unrolling.
> + bool Runtime; ///< Perform runtime unrolling.
> + };
> +
> + /// \brief Get target-customized preferences for the generic loop unrolling
> + /// transformation. The caller will initialize UP with the current
> + /// target-independent defaults.
> + virtual void getUnrollingPreferences(Loop *L, UnrollingPreferences &UP) const;
> 
> 
> Just return the struct by value. It's a much simpler API that way.

I disagree. In the current patch, the struct members are initialized to the current defaults by the unrolling pass, so the target only needs to change the fields it cares about, and we don't need to update every target that implements the interface whenever we add a new structure member.
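A minimal sketch of that calling convention (the names and default values are illustrative, not the actual LLVM code):

```cpp
// The caller (the unrolling pass) seeds the struct with the
// target-independent defaults; the target mutates only what it cares
// about. If a new field is added later, targets that don't touch it
// need no changes, since it simply keeps its default.
struct UnrollingPreferences {
  unsigned Threshold = 150; // hypothetical target-independent default
  unsigned Count = 0;
  bool Partial = false;
  bool Runtime = false;
};

struct Target {
  // A target that only cares about enabling partial unrolling.
  virtual void getUnrollingPreferences(UnrollingPreferences &UP) const {
    UP.Partial = true;
  }
};

UnrollingPreferences queryTarget(const Target &T) {
  UnrollingPreferences UP;       // caller seeds the defaults here...
  T.getUnrollingPreferences(UP); // ...target overrides selectively
  return UP; // Threshold, Count, and Runtime still hold their defaults
}
```

With a return-by-value API, by contrast, every target would have to construct the full struct itself, so adding a field would mean auditing every implementation.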

> 
> 
> +
> /// @}
> 
> /// \name Scalar Target Information
> diff --git a/lib/Analysis/TargetTransformInfo.cpp b/lib/Analysis/TargetTransformInfo.cpp
> index 8c6005b..b23223d 100644
> --- a/lib/Analysis/TargetTransformInfo.cpp
> +++ b/lib/Analysis/TargetTransformInfo.cpp
> @@ -96,6 +96,11 @@ bool TargetTransformInfo::isLoweredToCall(const Function *F) const {
> return PrevTTI->isLoweredToCall(F);
> }
> 
> +void TargetTransformInfo::getUnrollingPreferences(Loop *L,
> + UnrollingPreferences &UP) const {
> 
> 
> Can you run the patch through clang-format to fix up weird
> indentation like this?

Hrmm... I did not think it was weird; it was actually quite intentional. The second line is indented two spaces relative to the function name on the previous line. Given all of the alternatives, that seemed to make the most sense to me. I'll see what clang-format does; maybe I'll change my mind and like its output better ;)

> 
> 
> + PrevTTI->getUnrollingPreferences(L, UP);
> +}
> +
> bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const {
> return PrevTTI->isLegalAddImmediate(Imm);
> }
> @@ -469,6 +474,8 @@ struct NoTTI : ImmutablePass, TargetTransformInfo {
> return true;
> }
> 
> + void getUnrollingPreferences(Loop *, UnrollingPreferences &) const { }
> 
> 
> This doesn't look like it initializes anything?

It doesn't need to: the default implementation intentionally changes nothing because, as documented in the header file, the caller pre-initializes the structure with the target-independent defaults.

Thanks for the review!

 -Hal

> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


