[PATCH] D123865: [LoopPeel] Allow partial unrolling for profile-based peeling

Igor Kudrin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 22 10:07:55 PDT 2022


ikudrin added a comment.

Unfortunatelly, I do not have access to SPEC tests. I checked with `7zip-benchmark` and found that `Compressing` speed is slightly improved with this patch while `Decompressing` is slightly degraded. But I have to note that compared to a `Release` build, `PGO` one is worse even without the patch.

As for my own tests, I noticed that the performance gain with unrolled loops results from the generated code that has less instructions. In one sample, there was no need to reserve a register for the loop counters and as in each iteration the value of the counter was known, there was no instructions for increment the counter and the comparison instructions used an immediate value instead of comparing with the register; all that made the code path shorter. In another example, the subsequent iterations used overlapped data from the memory. Both compact and unrolled version loaded the data to a set of registers, but the compact version had to shift values in registers in each iteration, so that they are placed each time in the predefined registers, while unrolled version just used instructions which work this different set of registers in each iteration. Thus, unrolling again helped to find a way to generate a code with less instructions for each peeled iteration.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123865/new/

https://reviews.llvm.org/D123865



More information about the llvm-commits mailing list