[PATCH] D28368: Give higher full-unroll boosting when the loop iteration is small.

Thu Jan 5 21:16:17 PST 2017

danielcdh added a comment.

In https://reviews.llvm.org/D28368#637684, @mzolotukhin wrote:

> > The real motivation for this patch is to boost the threshold for fully unroll so that we can materialize the performance benefits in our benchmarks. The initial thoughts were: simply boost unroll-threshold by a minimum of 2X for fully unrolling (which is fine to materialize the performance). But I think this might be harder to be accepted by upstream as it seems too brutal-force. Then I'm thinking of integrating trip_count into the model to limit the "relative code size increase".
>
> I see. Just doubling the threshold indeed will be hard to upstream, and reasonably so. While it improves performance on your benchmarks, it also increases code-size/compile-time on many other tests. IMHO the ideal solution here would be to understand what differentiates loops in your benchmarks from other loops, then estimate how popular such cases are, and if they're popular enough, teach llvm to recognize them (i.e. see benefits of unrolling such loops). This way we'll only increase code-size/compile time in cases where we gain performance, and lose nothing in other cases.

I compared the profile between our internal benchmark and 464.h264ref, the only difference is that the fully unrolled loop showed high up in the profile of our benchmark, while the fully unrolled loop is cold in 464.h264ref, thus it has no performance impact.

We could use profile info to allow more aggressive threshold only for hot loops, so that code size increase can be avoided. But this requires BFI within the loop pass, which will be expensive (compile time overhead).

OTOH, if the heuristic is generally helping performance, I guess it would be worth to tradeoff 3% code size/compile time with potential better performance?

Thanks,
Dehao

> 
> 
>> Anyway, I think the patch needs to be updated to make it accurate. We could either use a constant minimum boosting factor, or still use the trip_count to limit "relative code size increase" but update the comment to make it accurate.
>> 
>> Any suggestions?

https://reviews.llvm.org/D28368