[PATCH] D15408: [AArch64/LoopUnrollRuntime] Don't avoid high-cost trip count computation on the AArch64
Adam Nemet via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 10 23:17:51 PST 2015
anemet added a subscriber: anemet.
anemet added a comment.
> After this commit was merged, @hfinkel uploaded a new commit, r237947.
>
> > On X86 (and similar OOO cores) unrolling is very limited, and even if the runtime unrolling is otherwise profitable, the expense of a division to compute the trip count could greatly outweigh the benefits. On the A2, we unroll a lot, and the benefits of unrolling are more significant (seeing a 5x or 6x speedup is not uncommon), so we're more able to tolerate the expense, on average, of a division to compute the trip count.
>
>
> I totally agree with this comment. Most AArch64 cores have a hardware divider, including for floating point, so I think we can have more unrolling opportunities.
Hmm, I don't see how hfinkel's comment supports your case. I can see how the trade-off is beneficial in his case of an in-order processor, but not for an out-of-order one. Did you run SPEC?
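
For readers following the thread, here is a rough C sketch of the trade-off being discussed. It is a hand-written illustration, not the output of LLVM's runtime unrolling. The function names and the unroll factor of 4 are assumptions made for the example, and stride is assumed to be nonzero. Because stride is only known at run time, the trip count needs a real division before the unrolled loop can start, and that division is the cost being weighed against the benefit of unrolling.

```c
#include <stddef.h>

/* Reference loop: the stride is only known at run time. */
long sum_strided(const long *a, size_t n, size_t stride) {
    long s = 0;
    for (size_t i = 0; i < n; i += stride)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4 (hypothetical factor). Assumes stride != 0. */
long sum_strided_unrolled(const long *a, size_t n, size_t stride) {
    long s = 0;

    /* Trip count of the original loop. With a run-time stride this is
       a genuine division, i.e. the "high-cost" trip count computation. */
    size_t trip = (n == 0) ? 0 : (n - 1) / stride + 1;

    /* Remainder modulo the unroll factor; cheap, since 4 is a power of two. */
    size_t rem = trip % 4;

    size_t i = 0;

    /* Run the leftover iterations first. */
    for (size_t k = 0; k < rem; ++k, i += stride)
        s += a[i];

    /* Unrolled body: the remaining iteration count is a multiple of 4. */
    for (size_t k = rem; k < trip; k += 4, i += 4 * stride) {
        s += a[i];
        s += a[i + stride];
        s += a[i + 2 * stride];
        s += a[i + 3 * stride];
    }
    return s;
}
```

If the loop body is this small and the trip count is short, the division can easily cost more than the unrolled loop saves, which is why measurements on the target core matter.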
http://reviews.llvm.org/D15408