[PATCH] D123865: [LoopPeel] Allow partial unrolling for profile-based peeling

David Li via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 22 11:18:21 PDT 2022


davidxl added a comment.

In D123865#3468077 <https://reviews.llvm.org/D123865#3468077>, @ikudrin wrote:

> Unfortunatelly, I do not have access to SPEC tests. I checked with `7zip-benchmark` and found that `Compressing` speed is slightly improved with this patch while `Decompressing` is slightly degraded. But I have to note that compared to a `Release` build, `PGO` one is worse even without the patch.
>
> As for my own tests, I noticed that the performance gain with unrolled loops results from the generated code that has less instructions. In one sample, there was no need to reserve a register for the loop counters and as in each iteration the value of the counter was known, there was no instructions for increment the counter and the comparison instructions used an immediate value instead of comparing with the register; all that made the code path shorter. In another example, the subsequent iterations used overlapped data from the memory. Both compact and unrolled version loaded the data to a set of registers, but the compact version had to shift values in registers in each iteration, so that they are placed each time in the predefined registers, while unrolled version just used instructions which work this different set of registers in each iteration. Thus, unrolling again helped to find a way to generate a code with less instructions for each peeled iteration.

Thanks for the analysis.   That matches what I expected -- which circles back to my original question -- how do we know the new limit is generally better than the old limit?  It really depends on the cost-benefit analysis. There are workloads (as demonstrated by the Decompressing case) which gets hurt.

On the other hand, the patch itself (the restructuring part) looks good -- it makes the code cleaner.   I suggest you commit the restructuring part first (e.g. by removing the *2 multiplier part).     I  think it is better to keep the default setting unchanged until a better analysis in place.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123865/new/

https://reviews.llvm.org/D123865



More information about the llvm-commits mailing list