[LLVMdev] Partial loop unrolling

Tue Jul 15 05:37:04 PDT 2014

On 15 July 2014 13:08, Cristianno Martins <cristiannomartins at gmail.com> wrote:
> * There is, in fact, another interesting function for unrolling: if the
> upper limit of the loop is known during compilation-time and it is a very
> small value, it could be interesting to substitute the whole loop for all
> the necessary calls to do_foo

This is also the case for re-rolling, where the loop was unrolled in
the "wrong way", and re-rolling, then unrolling again, will expose
parallelism.

But there are others:

3. Join all loads and stores into a block and hide the delays in
between the loop cycles.

Reading a large block of data is almost as efficient as reading a
small one, so a fully rolled loop will have read-op-write while a
partially unrolled loop will have read-op-op-op-write, saving two
reads and two writes for every three iterations.
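
A minimal sketch of that shape in C, written by hand rather than taken
from any compiler output. The names rolled, unrolled3 and op are
hypothetical (op stands in for do_foo), and the restrict qualifiers
encode the assumption that source and destination do not overlap, which
is what makes it legal to group the loads ahead of the stores:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element operation, standing in for do_foo. */
static int op(int x) { return 2 * x + 1; }

/* Fully rolled: every iteration is read-op-write. */
void rolled(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = op(src[i]);
}

/* Partially unrolled by 3: one grouped read, three ops, one grouped write. */
void unrolled3(int *restrict dst, const int *restrict src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t i = 0;
    for (; i + 3 <= n; i += 3) {
        /* Grouped loads. */
        int a = src[i], b = src[i + 1], c = src[i + 2];
        /* Three independent ops. */
        a = op(a);
        b = op(b);
        c = op(c);
        /* Grouped stores. */
        dst[i] = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
    }
    /* Remainder loop for trip counts that are not a multiple of 3. */
    for (; i < n; ++i)
        dst[i] = op(src[i]);
}
```

Whether the grouped accesses actually turn into wider or overlapped
memory operations is up to the target and the optimizer; the sketch only
shows the loop structure.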

4. Align vectorized loops

Vectorization will often partially unroll the loop (like your examples
1 and 3) to make the loop aligned to the memory constraints (e.g., if
the pointer or the loop count is unaligned).
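
A rough sketch of what that alignment handling looks like when written
out by hand in C. This is an illustration, not what LLVM's vectorizer
literally emits: the function name scale, the 16-byte vector width and
the float element type are all assumptions, and the inner loop over
lanes stands in for a single vector instruction:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed vector width: 16 bytes, i.e. 4 floats per vector. */
enum { VEC_BYTES = 16, VEC_LANES = VEC_BYTES / sizeof(float) };

/* Hypothetical example: multiply n floats by k in place. */
void scale(float *p, size_t n, float k) {
    size_t i = 0;

    /* Prologue: peel scalar iterations until p + i is 16-byte aligned. */
    while (i < n && ((uintptr_t)(p + i) % VEC_BYTES) != 0) {
        p[i] *= k;
        ++i;
    }

    /* Main body: whole aligned chunks of VEC_LANES elements. */
    for (; i + VEC_LANES <= n; i += VEC_LANES)
        for (size_t j = 0; j < VEC_LANES; ++j)
            p[i + j] *= k;

    /* Epilogue: leftover elements when the count is not a multiple. */
    for (; i < n; ++i)
        p[i] *= k;
}
```

The prologue covers the unaligned pointer case and the epilogue covers
the unaligned loop count case.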

AFAIK, all vectorizing compilers, including LLVM, do all of them. It
depends more on what you want and what your original loop looks like
than on generic goodness.

cheers,
--renato