[PATCH] D28368: Give higher full-unroll boosting when the loop iteration is small.

Tue Jan 10 12:40:57 PST 2017

mzolotukhin added a comment.

> With full unrolling, if the loop can be fully unrolled, it is unlikely to trigger the LSD (loop stream detector), because the trip count is too small. It also will not affect icache misses: a fully unrolled loop is straight-line code with no temporal locality, and even if it is embedded in an outer loop, the outer loop's backedge should be easy to predict. So if we assume all backend optimizations are sane (e.g. SLP performs as well as the loop vectorizer, RA does a good job on large BBs, etc.), larger code size should always lead to better performance for full unrolling. The threshold here therefore only limits the size of the text.

This is not exactly true in practice. If we just bump up the threshold, we'll see both performance improvements and regressions.

> I think the two types of unroller should probably not share the same threshold?

This makes sense. However, I prefer not to bloat our army of thresholds without a guaranteed benefit.

> How about we bump the threshold at O3, so that people who do not have a profiler can still choose to fully unroll more aggressively?

For a change like this, please submit a separate patch and include as much testing data as you can (including but not limited to SPEC, the LLVM test-suite, etc.). Please include runtime performance, compile time, and binary sizes.

Thanks,
Michael

https://reviews.llvm.org/D28368