[PATCH] D32451: Improve profile-guided heuristics to use estimated trip count.
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 11 13:25:51 PDT 2017
Ayal added a comment.
I'm inclined to treat TinyTripCount loops, and their associated reports, the same way for static and profile-based trip counts. That said, I'm ok with setting OptForSize when the profile-based trip count < TinyTripCount, while aborting and reporting when the static trip count < TinyTripCount, as suggested. The outcome is practically the same (see below). @mssimpso, @mkuper - do you have a preference here?
More comments:
In https://reviews.llvm.org/D32451#748358, @twoh wrote:
> Thanks @Ayal for your comments! If the profile-based trip count check is moved above the line
>
>   if (MaxTC > 0u && MaxTC < TinyTripCountVectorThreshold)
>
> it wouldn't be possible to distinguish the case where static analysis fails to compute MaxTC from the case where the profile-based trip count is actually 0. Also, as profile-based numbers are not as definitive as the numbers from static analysis, I think it might be worth just optimizing for size rather than disabling vectorization. As you mentioned in the comment, OptForSize is effectively the same as disabling vectorization for now, but the algorithm for the OptForSize case might change in the future.
Having EstimatedTC < TinyTripCountVectorThreshold should not imply "IsColdLoop". The loop may be hot.
Yes, getSmallConstantMaxTripCount() should also return an Optional<unsigned> (but not in this commit).
Can alternatively do
  unsigned ExpectedTC = SE->getSmallConstantMaxTripCount(L);
  bool HasExpectedTC = (ExpectedTC > 0);
  if (!HasExpectedTC && LoopVectorizeWithBlockFrequency) {
    auto EstimatedTC = getLoopEstimatedTripCount(L);
    if (EstimatedTC) {
      ExpectedTC = *EstimatedTC;
      HasExpectedTC = true;
    }
  }
  if (HasExpectedTC && ExpectedTC < TinyTripCountVectorThreshold) {
    ...
  }
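To make the fallback order concrete, here is a minimal standalone model of the snippet above. It is only a sketch and is not LLVM code. The names pickExpectedTripCount, isTinyTripCount and TinyThreshold are hypothetical, the threshold value is illustrative, and the two trip-count sources are passed in as plain values instead of being queried from ScalarEvolution or profile data.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for TinyTripCountVectorThreshold; the value is
// illustrative only.
constexpr unsigned TinyThreshold = 16;

// StaticMaxTC == 0 models "static analysis could not compute a max trip
// count". ProfileTC models the result of a profile-based estimate and is
// empty when no estimate is available. UseProfile models the
// LoopVectorizeWithBlockFrequency flag.
std::optional<unsigned> pickExpectedTripCount(unsigned StaticMaxTC,
                                              std::optional<unsigned> ProfileTC,
                                              bool UseProfile) {
  if (StaticMaxTC > 0)
    return StaticMaxTC;        // A static bound takes precedence.
  if (UseProfile && ProfileTC)
    return *ProfileTC;         // Fall back to the profile estimate.
  return std::nullopt;         // Trip count is unknown.
}

// True only when some trip count is known and it is below the threshold.
bool isTinyTripCount(unsigned StaticMaxTC, std::optional<unsigned> ProfileTC,
                     bool UseProfile) {
  std::optional<unsigned> TC =
      pickExpectedTripCount(StaticMaxTC, ProfileTC, UseProfile);
  return TC.has_value() && *TC < TinyThreshold;
}
```

Because the "has a value" state is tracked separately from the value itself, a profile-based estimate of 0 is treated as a known (tiny) trip count and is not confused with the static analysis failing to compute one.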
> OTOH, setting OptForSize to true when the trip count is unknown effectively prevents vectorization, because an epilog is needed.
Just for completeness, if the trip count is unknown but known to be divisible by VF, the loop could potentially be vectorized without an epilog.
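As a small illustration of why divisibility by VF matters, the sketch below splits a trip count into vector iterations and leftover scalar (epilog) iterations. It is a simplified arithmetic model, not the vectorizer's actual logic; LoopSplit and splitTripCount are hypothetical names.

```cpp
#include <cassert>

// Hypothetical helper type: how many iterations run in the vector body and
// how many are left over for a scalar epilog.
struct LoopSplit {
  unsigned VectorIters;
  unsigned EpilogIters;
};

// Split a trip count TC for a vectorization factor VF (must be non-zero).
LoopSplit splitTripCount(unsigned TC, unsigned VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  return {TC / VF, TC % VF};
}
```

When TC is a multiple of VF the epilog count is zero, so no scalar remainder loop is needed in this model. For example, TC = 64 with VF = 8 leaves nothing for the epilog, while TC = 67 with VF = 8 leaves 3 scalar iterations.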
https://reviews.llvm.org/D32451