[PATCH] D105996: [AArch64] Enable Upper bound unrolling universally

JinGu Kang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 20 02:34:42 PDT 2021


jaykang10 added a comment.

In D105996#2887074 <https://reviews.llvm.org/D105996#2887074>, @jaykang10 wrote:

> In D105996#2886560 <https://reviews.llvm.org/D105996#2886560>, @fhahn wrote:
>
>> I'm not sure it is a good idea to enable it universally after just measuring on a single core in a single configuration. From the data you shared, it is not clear which build settings were used (-O3, -Os, LTO, PGO?) and it would also be good to specify what metric you are showing (score I guess, but it is not clear). For making such changes in general, I think it would be good to see data on as many impacted configurations as possible to rule out negative impacts. I am also curious what the impact is on -Os and it would also be good to check Geekbench.
>>
>> Last time I measured this for Apple out-of-order cores, this was not really beneficial on a wider set of benchmarks, but things might have changed. I'd recommend enabling it only for the cores you measured (or cores you are confident are very similar).
>
> Um... I think it is generally better to unroll more loops. At the very least, unrolling reduces the number of branches and creates opportunities for pre/post-indexed loads and stores. It can increase register pressure and introduce spill code, but that needs to be handled by the unroll cost in the pass. In my opinion, it is better to enable upper bound unrolling. If it causes performance degradation, I think the pass needs to check the cost. I do not think it is good to add 'if' statements for combinations of conditions that have many variants.

Sorry @fhahn, I may have pictured the situation with out-of-order cores wrongly. If possible, could you please share a case in which upper bound unrolling makes performance worse on out-of-order cores? My understanding is that out-of-order cores schedule instructions dynamically in hardware, so unrolling could fill the instruction window with instructions from a smaller scope. I thought the other benefits of the fully unrolled loop could bridge that gap... If I missed something, please let me know.
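
For readers following the thread, here is a minimal source-level sketch of the transformation being discussed. It is an illustration only: the function names `sum_rolled` and `sum_unrolled` are hypothetical, the real pass works on LLVM IR rather than C, and the output shown is hand-written, not actual compiler output. The assumption is a loop whose exact trip count is unknown at compile time but whose maximum trip count is small and provable (here at most 3, because of the mask).

```c
/* Rolled form: the trip count (n & 3u) is unknown at compile time,
   but it can never exceed 3. */
int sum_rolled(const int *a, unsigned n) {
    int sum = 0;
    for (unsigned i = 0; i < (n & 3u); ++i)
        sum += a[i];
    return sum;
}

/* Hand-written equivalent of unrolling by the upper bound: three copies
   of the body, each still guarded by an exit test, and no back edge. */
int sum_unrolled(const int *a, unsigned n) {
    unsigned bound = n & 3u;
    int sum = 0;
    if (bound == 0) return sum;
    sum += a[0];
    if (bound == 1) return sum;
    sum += a[1];
    if (bound == 2) return sum;
    sum += a[2];
    return sum;
}
```

The unrolled form removes the backward branch and the induction variable, and the loads now use constant offsets, but the exit tests remain and the code is larger. That is roughly the trade-off in question above.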


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105996/new/

https://reviews.llvm.org/D105996


