[PATCH] D10857: Update documentation for unroll pragmas on loops with runtime trip counts

Thu Jul 2 16:38:35 PDT 2015

hfinkel added inline comments.

================
Comment at: include/clang/Basic/AttrDocs.td:1330
@@ -1328,1 +1329,3 @@
+the loop unroller tries to unroll the loop by a power-of-two factor
+subject to a limit on unrolled code size.
 
----------------
meheff wrote:
> broune wrote:
> > Is #pragma unroll different from #pragma unroll(full) ? If not, why not make the two doc strings the same?
> In AttrDocs.td there is not documentation for "#pragma clang loop unroll".  Instead it just mentions "#pragma clang loop" and points to LanguageExtensions for details.  Regarding LanguageExtensions documentation of "#pragma clang loop unroll" and AttrDocs documentation for "#pragma unroll", I believe they do both say the same thing (or at least aren't conflicting with each other) with the LanguageExtensions documentation providing more context and describing the limits on unrolling.
> 
> Little bit of background here.  There are two forms of the loop unroll pragmas:
> 
> #pragma unroll
> #pragma unroll N
> 
> and:
> 
> #pragma clang loop unroll(full)
> #pragma clang loop unroll_count(N)
> 
> The "#pragma unroll" forms are supported for compatibilty with other compilers (such as nvcc CUDA compiler) where this form is common while "#pragma clang .." is a more cannonical form specific to clang.  "#pragma unroll" and "#pragma clang loop unroll(full)" are equivalent.  What I propose in the related optimization patch (http://reviews.llvm.org/D10854) is that "#pragma unroll" should enable loop unrolling with a runtime trip count of the loop.  So if you have:
> 
> void foo(int n) {
> #pragma unroll
>   for (int i=0; i<n; i++) { ... }
> }
> 
> Then the optimizer will unroll the loop with some reasonable power-of-two factor.  One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will also enable unrolling of a loop with a runtime trip count.  This may be surprising... or maybe not. Full unrolling is clearly not possible in this case so just being more aggressive with unrolling may be the sensible thing to do.
> One potentially funny bit about this approach is that "#pragma clang loop unroll(full)" will also enable unrolling of a loop with a runtime trip count. This may be surprising... or maybe not. Full unrolling is clearly not possible in this case so just being more aggressive with unrolling may be the sensible thing to do.

I really don't like this. If the user is requesting 'full' unrolling, I'd rather we do only that (at least up to some really high limit, as we do currently). Runtime unrolling adds extra expense in the low-trip-count case (because of the various checks involved) and has a different set of trade-offs than simpler unrolling.

I think that we should have 'full' unrolling add "llvm.loop.unroll.runtime.disable" metadata. We should add some other mode, "aggressive" for example, which is like full unrolling but also allows for runtime unrolling. I think the desired end state here is that -Rpass=unroll would provide some specific feedback in the case where full unrolling was requested, but not possible because of a runtime trip count.



rL LLVM

http://reviews.llvm.org/D10857