[PATCH] D15408: [AArch64/LoopUnrollRuntime] Don't avoid high-cost trip count computation on the AArch64

Adam Nemet via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 10 23:17:51 PST 2015


anemet added a subscriber: anemet.
anemet added a comment.

> After this commit was merged, @hfinkel uploaded a new commit, r237947.

> 

> > On X86 (and similar OOO cores) unrolling is very limited, and even if the runtime unrolling is otherwise profitable, the expense of a division to compute the trip count could greatly outweigh the benefits. On the A2, we unroll a lot, and the benefits of unrolling are more significant (seeing a 5x or 6x speedup is not uncommon), so we're more able to tolerate the expense, on average, of a division to compute the trip count.

> 

> 

> I totally agree with this comment. Most AArch64 cores have a hardware divider, including for floating point, so I think we can have more unrolling opportunities.


Hmm, I don't see how hfinkel's comment supports your case.  I can see how the trade-off is beneficial in his case of an in-order processor, but not for an out-of-order one.  Did you run SPEC?
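
For readers following the trade-off, here is a minimal hand-written sketch of what runtime unrolling by 4 amounts to when the stride is only known at run time. It is an illustration, not the code LLVM generates; the names trip_count and sum_strided are hypothetical. The point is that the trip count itself needs a division before the loop starts, and that cost is paid even when the loop runs only a few iterations.

```cpp
#include <cstdint>

// Hypothetical helper: number of iterations of
//   for (i = start; i < end; i += stride)
// With a runtime stride this requires a division up front.
std::uint64_t trip_count(std::uint64_t start, std::uint64_t end,
                         std::uint64_t stride) {
    if (start >= end || stride == 0)
        return 0;
    return (end - start + stride - 1) / stride;  // the "high-cost" part
}

// Hand-written equivalent of a loop runtime-unrolled by 4 (illustrative only).
std::uint64_t sum_strided(const std::uint64_t *a, std::uint64_t start,
                          std::uint64_t end, std::uint64_t stride) {
    std::uint64_t n = trip_count(start, end, stride);
    std::uint64_t sum = 0;
    std::uint64_t i = start;

    // Remainder loop: runs n % 4 iterations so the main loop sees a multiple of 4.
    for (std::uint64_t r = n % 4; r != 0; --r, i += stride)
        sum += a[i];

    // Main loop: four iterations per trip, one counter check per four loads.
    for (std::uint64_t k = n / 4; k != 0; --k) {
        sum += a[i];
        sum += a[i + stride];
        sum += a[i + 2 * stride];
        sum += a[i + 3 * stride];
        i += 4 * stride;
    }
    return sum;
}
```

Whether the saved loop overhead pays for that division depends on the target and on how many iterations the loop typically runs, which is why benchmark numbers matter here.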


http://reviews.llvm.org/D15408




