[PATCH] D105996: [AArch64] Enable Upper bound unrolling universally

Wed Jul 21 01:45:32 PDT 2021

fhahn added a comment.

In D105996#2889824 <https://reviews.llvm.org/D105996#2889824>, @jaykang10 wrote:

> In D105996#2887074 <https://reviews.llvm.org/D105996#2887074>, @jaykang10 wrote:
>
>>> snip .
>>
>> Um... I think it is normally better to unroll more loops. At least, it reduces the number of branches and it provides opportunities for post/pre index load/store. It could cause more register pressure and spill codes but it needs to be handled by the unroll cost on the pass. From my opinion, it is better to enable upper bound unroll. If it causes performance degradation, I think the pass needs to checks the cost. I do not think it is good to add 'if' statements with the combination of conditions which has lots of variant.
>
> Sorry @fhahn I could imagine something wrong for the situation with out-of-order cores. If possible, can you share the case, in which the upper bound unroll makes performance worse on out-of-order cores, please? I imagined there is dynamic instruction scheduling of hardware level in out-of-order cores. The unroll could cause the instruction window filled with instructions of smaller scope. I thought other benefits from the fully unrolled loop could bridge the gap... If I missed something, please let me know.

I think the impact may be highly specific to the uarch, hence I think it would be good to test the change with different CPUs. There may be very little benefit from unrolling on its own for CPUs that have enough execution units to hide the branch overhead for example. As you mentioned, unrolling could also cause more spilling or cause some other optimisations not to trigger. I agree the latter are issues in LLVM and should be fixed.

But it would be good to iron out potential issues before landing the patch, which is why I think it would be good to also test with other build options as well (unless I missed something I think there is no mention of the configuration you used to produce the numbers)

I can test the patch with some CPUs, but I will only be able to do so next week, as I don't have access to my devices at the moment. Not sure about other vendors, but if you have access to other Arm OOO cores, it might be worth to double check those as well.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105996/new/

https://reviews.llvm.org/D105996