[PATCH] D28368: Give higher full-unroll boosting when the loop iteration is small.

Michael Zolotukhin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 5 22:36:43 PST 2017


mzolotukhin added a comment.

> I compared the profile between our internal benchmark and 464.h264ref, the only difference is that the fully unrolled loop showed high up in the profile of our benchmark, while the fully unrolled loop is cold in 464.h264ref, thus it has no performance impact.

Could you tell us exactly what happens to the loop after unrolling? Does the performance improvement come just from removing branches, or does unrolling enable later optimizations (and if so, which ones)?

> We could use profile info to allow more aggressive threshold only for hot loops, so that code size increase can be avoided. But this requires BFI within the loop pass, which will be expensive (compile time overhead).

Using profile info in loop unrolling is definitely worthwhile. Most of the loops we unroll are actually in cold code, so if we can avoid unrolling them, we save budget for more aggressive unrolling in hot regions (or simply get smaller code and faster compilation).
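
To make the idea concrete, here is a rough sketch of hotness-dependent thresholds. It is not actual LLVM code: the struct, the function name, and both threshold values are hypothetical and chosen only for illustration, and the hot/cold flag is assumed to come from whatever profile data is available.

```cpp
// Hypothetical stand-in for a loop and its profile data; not an LLVM type.
struct LoopSketch {
  unsigned UnrolledSize; // estimated code size after full unrolling
  bool IsHot;            // assumed to be derived from profile info
};

// Assumed values for illustration only; not LLVM's real defaults.
constexpr unsigned ColdThreshold = 50;  // conservative budget for cold loops
constexpr unsigned HotThreshold = 300;  // larger budget for hot loops

// Pick the threshold by hotness, then compare against the unrolled size.
bool shouldFullyUnroll(const LoopSketch &L) {
  const unsigned Threshold = L.IsHot ? HotThreshold : ColdThreshold;
  return L.UnrolledSize <= Threshold;
}
```

With these made-up numbers, a loop with an estimated unrolled size of 200 would be fully unrolled only if it is hot, which is the code-size saving described above.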

> OTOH, if the heuristic is generally helping performance...

Which heuristic are you referring to here?

Michael


https://reviews.llvm.org/D28368




