[llvm-dev] [RFC] Changing LoopUnrollAndJamPass to a function pass.

Fri Jan 3 05:42:04 PST 2020

Hello

Sounds interesting, and it would be great to see Unroll and Jam get some 
love. I'm maybe being unimaginative, but I don't know of many cases 
where Unroll-and-Jamming the outer loops would be more beneficial than 
the code bloat it gave (a lot like normal unrolling). But making this a 
function pass sounds useful anyway, it would avoid the awkward case of 
accidentally unrolling the inner loop before the outer loop has been UnJ'd!

 > 3. Less remaining loops are needed if we consider the whole loop nest 
together

This one is interesting. UnJ was written for things like matrix multiply 
on microcontroller's without vectorization available. It helps quite a 
lot there. In our experience, we often end up unroll-and-jamming the 
middle(j) iteration, then unrolling the inner(k) one, creating a tile of 
sorts. It can create quite large code, but that code is quick on these 
little cpus.

Oh, and IIRC UnJ was written in a way that it would potentially be 
expanded to handle multiple inner loops. Or maybe I just had the idea 
that it could be expanded like that, providing that you can work out all 
the control flow and dependencies correctly. This might help with 
unroll-and-jamming vectorized code too (although if outer loop 
vectorization and interleaving is something that can already be handled 
in the vectorizer, it may be better to leave it to do that job and have 
it included in all it's cost modelling. Reverse-engineering that the 
runtime checks are loop-invariant doesn't sound very reliable. If the 
vectorizer can already do Unroll and Jam, let it do it on its own).

I look forward to seeing patches!
Dave

On 02/01/2020 19:00, Whitney T Tsang via llvm-dev wrote:
> LoopUnrollAndJamPass is currently a loop pass. It is added in a LPM with 
> only itself.
> `OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass(Level)));`
> Notice that loops are traversed in an inner to outer order in a LPM.
> 
> The current implementation of LoopUnrollAndJamPass supports only loop 
> nest with one inner loop (L->getSubLoops().size() == 1).
> Consider the example below:
> Before loop unroll and jam:
> ```
> for i
>    for j
>      for k
>        A[I][j][k] = 0;
> ```
> After loop unroll and jam loop-j with a factor of 2:
> ```
> for i
>    for j += 2
>      for k
>        A[I][j][k] = 0;
>        A[I][j+1][k] = 0;
>    for j’=j
>      for k
>        A[I][j][k] = 0;
> ```
> Notice that LoopUnrollAndJamPass can no longer unroll and jam loop-i at 
> the next invocation of LoopUnrollAndJamPass, since there exists two 
> inner loops in loop-i.
> If LoopUnrollAndJamPass is a function pass, then it can control the 
> order of the loops being considered. By doing the transformation from 
> outer to inner, both loop-i and loop-j can be unroll and jammed.
> 
> In conclusion, I propose to change LoopUnrollAndJamPass from loop to 
> function pass, with the reasons below:
> 1. There is no obvious reason why LoopUnrollAndJamPass need to be a loop 
> pass
> 2. More loops can be transformed by traversing in a outer to inter order
> 3. Less remaining loops are needed if we consider the whole loop nest 
> together
> 4. Better cost model can be created by considering the whole loop nest 
> together
> 
> Regards,
> Whitney Tsang
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>