RFC: LoopEditor, a high-level loop transform toolkit

Wed Jul 29 09:10:31 PDT 2015

I think refactoring the commonalities into a LoopEditor makes sense.

The LoopVectorizer has grown to be a class with a lot of functionality. Moving some of the parts shared with other loop optimization passes makes sense to me. As we want to tackle more and more loop nest optimizations I think this will become imperative to maintainability.

Most of the interesting state in the vectorizer will move to the delegate.

> On Jul 28, 2015, at 12:49 PM, James Molloy <james at jamesmolloy.co.uk> wrote:
> 
> Hi Michael,
> 
> +llvmdev,Hal,Nadav
> 
> For testing, I was currently thinking of a two pronged approach. Lit tests as you suggest with a dummy pass, probably with command line options to define what transform to do, and unit tests to test the delegate behaviour and return values. 
> 
> I'll try and produce a mega patch with at least the loop vectoriser moved over, then split it up again after review.
> 
> Cheers,
> 
> James
> 
> On Tue, 28 Jul 2015 at 20:27, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
> BTW, I think this thread should be on LLVM-dev, and I’d also CC Arnold, Hal, and Nadav - their feedback would be very valuable here.
> 
> Michael
> 
>> On Jul 28, 2015, at 12:21 PM, Mikhail Zolotukhin <mzolotukhin at apple.com> wrote:
>> 
>> Hi James,
>> 
>> Thanks for your comments!
>> 
>>> On Jul 28, 2015, at 1:08 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
>>> 
>>> Hi Michael,
>>> 
>>> Thanks for the feedback.
>>> 
>>> > As a show-case, how does it help to simplify the aforementioned createEmptyBlock?
>>> 
>>> That's a good example. If you look at http://reviews.llvm.org/D11530 line 916 in the .cpp file, there's a function "versionWidenAndInterleaveLoop" that does basically createEmptyBlock(). It also does more stuff - it performs the loop interleave too - but as an example showcase it should do.
>>> 
>>> For the vectorizer, I think the cleanest thing would be to add a delegate hook that's called whenever an instruction is about to be cloned. Then, the vectorizer could override that hook and provide its own implementation (instead of cloning an instruction completely, it'd replace it with a vector variant. So the entire main body of the vectorizer would look something like this:
>>> 
>>> struct Delegate : public LoopEditor::Delegate {
>>>   Value *hookCloneInstruction(Instruction *OldI, IRBuilder<> &IRB) {
>>>     if (isControlFlowInst(OldI)) return IRB.CreateExtractElement; // Don't vectorize branches
>>>     return createVectorVersionOfInstruction(OldI, VF, IRB); // Defined somewhere in LoopVectorize
>>>   }
>>> };
>>> BasicBlock *PredBB;
>>> Delegate D;
>>> LoopEditor LE(L, DT, SE, LI);
>>> LoopEditor VectorLE = LE.versionWidenAndInterleaveLoop(UF, PredBB, &D);
>>> VectorLE.widen(VF); // Widen further, so it's widened by VF*UF. Only affects induction variable steps and trip count.
>>> 
>>> How does that look to you?
>> That looks good, I always wanted to factor this stuff out too:)
>>> 
>>> > Also, I think it’s very important to explicitly define requirements for loops we can handle, and what we do with other loops. E.g. what do we do if a loop has multiple back-edges? What do we do if a loop body has complicated control-flow?
>>> 
>>> Good point, agreed. I think we need to handle loops in LoopSimplify form and LCSSA form at least. Some functions may not be able to handle arbitrary control flow but most should be agnostic to it (for example making sure control flow is correct in a vectorized loop, should be the user's job not LoopEditor's). But defining exactly what loop type each function supports and ensuring it does support that is a good idea.
>>> 
>>> I'm also trying to work out what's the best way of getting this in-tree. It's a lot of code and I don't want to dump it in one go, but getting pre-commit review on every little step would make it take forever. Do you have any suggestions as to methodology? I was thinking getting review on the header (API), then committing that with stub implementations and implementing them bit by bit from there...
>> I agree, and this way seems reasonable to me. However, it's really tempting to see what we end up with before we commit to it - so, if you have some final patch with this API applied to most of its users (even if this patch is huge), it would be cool if you share it.
>> 
>> Also, how will we test it? Should we create a dummy-pass that will invoke these transforms on a given loop (without any cost models, just doing what you ask it to do)?
>> 
>> Michael
>>> 
>>> Cheers,
>>> 
>>> James
>>> 
>>> On Mon, 27 Jul 2015 at 20:23 Michael Zolotukhin <mzolotukhin at apple.com> wrote:
>>> Hi James,
>>> 
>>>> On Jul 27, 2015, at 10:04 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> LLVM currently lacks a single coherent way to edit, modify and transform loops. We have the Loop abstraction that provides some functionality, and we have a cluster of functions and abstractions in LoopUtils, but each is fairly cumbersome to use. There's also VersionedLoop, but that's quite specific and not really extendable/composable.
>>>> 
>>>> I've recently been trying to do some prototyping of a high level loop optimization (unroll-and-jam) and ended up with quite a bit of spaghetti code doing what should be simple transforms (as another example, look at LoopVectorize::createEmptyBlock()!)
>>> Totally agree here!
>>>> 
>>>> So I decided to prototype an API for performing more high level operations on LLVM Loops - cloning, widening, interleaving, stitching the output of one loop into another loop, setting loop trip counts, peeling... and all while keeping loopinfo, domtree and SCEV updated.
>>>> 
>>>> I've uploaded my current prototype to Phab here: http://reviews.llvm.org/D11530
>>>> 
>>>> It's still at the prototype stage so there are lots of things that need fixing - sorting out the testing story being top of my list. But I'm looking for feedback on:
>>>>   * Is this a useful abstraction? Is it something the community would like to see in-tree (eventually)?
>>> I think it could be very useful - depending on how easy it would be to use. As a show-case, how does it help to simplify the aforementioned createEmptyBlock?
>>> 
>>>>   * Does the API design make sense? Are there obvious things I've missed that people need?
>>> From the first glance, it seems reasonable. One probably missing thing: we have peelAfter, but we may also need peelBefore (it would be helpful if we create prologues for alignment for instance).
>>> 
>>> Also, I think it’s very important to explicitly define requirements for loops we can handle, and what we do with other loops. E.g. what do we do if a loop has multiple back-edges? What do we do if a loop body has complicated control-flow?
>>> 
>>> Thanks,
>>> Michael
>>> 
>>>> 
>>>> I've thought about the use cases of the loop vectorizer, loop distribution, loop unrolling, loop versioning, and I think the current design can solve their use cases fairly elegantly. I've used this API to write a fairly sophisticated unroll-and-jam transform in very few lines too.
>>>> 
>>>> Any feedback would be gratefully appreciated!
>>>> 
>>>> Cheers,
>>>> 
>>>> James
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>