RFC: LoopEditor, a high-level loop transform toolkit

Tue Jul 28 12:52:58 PDT 2015

I should mention that I don't currently have such a patch - I'll have to do
a bunch of work to create it and start the test suite in the process.
On Tue, 28 Jul 2015 at 20:48, James Molloy <james at jamesmolloy.co.uk> wrote:

> Hi Michael,
>
> +llvmdev,Hal,Nadav
>
> For testing, I was currently thinking of a two pronged approach. Lit tests
> as you suggest with a dummy pass, probably with command line options to
> define what transform to do, and unit tests to test the delegate behaviour
> and return values.
>
> I'll try and produce a mega patch with at least the loop vectoriser moved
> over, then split it up again after review.
>
> Cheers,
>
> James
>
> On Tue, 28 Jul 2015 at 20:27, Michael Zolotukhin <mzolotukhin at apple.com>
> wrote:
>
>> BTW, I think this thread should be on LLVM-dev, and I’d also CC Arnold,
>> Hal, and Nadav - their feedback would be very valuable here.
>>
>> Michael
>>
>> On Jul 28, 2015, at 12:21 PM, Mikhail Zolotukhin <mzolotukhin at apple.com>
>> wrote:
>>
>> Hi James,
>>
>> Thanks for your comments!
>>
>> On Jul 28, 2015, at 1:08 AM, James Molloy <james at jamesmolloy.co.uk>
>> wrote:
>>
>> Hi Michael,
>>
>> Thanks for the feedback.
>>
>> > As a show-case, how does it help to simplify the aforementioned
>> createEmptyBlock?
>>
>> That's a good example. If you look at http://reviews.llvm.org/D11530 line
>> 916 in the .cpp file, there's a function "versionWidenAndInterleaveLoop"
>> that does basically createEmptyBlock(). It also does more stuff - it
>> performs the loop interleave too - but as an example showcase it should do.
>>
>> For the vectorizer, I think the cleanest thing would be to add a delegate
>> hook that's called whenever an instruction is about to be cloned. Then, the
>> vectorizer could override that hook and provide its own implementation
>> (instead of cloning an instruction completely, it'd replace it with a
>> vector variant. So the entire main body of the vectorizer would look
>> something like this:
>>
>> struct Delegate : public LoopEditor::Delegate {
>>   Value *hookCloneInstruction(Instruction *OldI, IRBuilder<> &IRB) {
>>     if (isControlFlowInst(OldI)) return IRB.CreateExtractElement; //
>> Don't vectorize branches
>>     return createVectorVersionOfInstruction(OldI, VF, IRB); // Defined
>> somewhere in LoopVectorize
>>   }
>> };
>> BasicBlock *PredBB;
>> Delegate D;
>> LoopEditor LE(L, DT, SE, LI);
>> LoopEditor VectorLE = LE.versionWidenAndInterleaveLoop(UF, PredBB, &D);
>> VectorLE.widen(VF); // Widen further, so it's widened by VF*UF. Only
>> affects induction variable steps and trip count.
>>
>> How does that look to you?
>>
>> That looks good, I always wanted to factor this stuff out too:)
>>
>>
>> > Also, I think it’s very important to explicitly define requirements
>> for loops we can handle, and what we do with other loops. E.g. what do we
>> do if a loop has multiple back-edges? What do we do if a loop body has
>> complicated control-flow?
>>
>> Good point, agreed. I think we need to handle loops in LoopSimplify form
>> and LCSSA form at least. Some functions may not be able to handle arbitrary
>> control flow but most should be agnostic to it (for example making sure
>> control flow is correct in a vectorized loop, should be the user's job not
>> LoopEditor's). But defining exactly what loop type each function supports
>> and ensuring it does support that is a good idea.
>>
>> I'm also trying to work out what's the best way of getting this in-tree.
>> It's a lot of code and I don't want to dump it in one go, but getting
>> pre-commit review on every little step would make it take forever. Do you
>> have any suggestions as to methodology? I was thinking getting review on
>> the header (API), then committing that with stub implementations and
>> implementing them bit by bit from there...
>>
>> I agree, and this way seems reasonable to me. However, it's really
>> tempting to see what we end up with before we commit to it - so, if you
>> have some final patch with this API applied to most of its users (even if
>> this patch is huge), it would be cool if you share it.
>>
>> Also, how will we test it? Should we create a dummy-pass that will invoke
>> these transforms on a given loop (without any cost models, just doing what
>> you ask it to do)?
>>
>> Michael
>>
>>
>> Cheers,
>>
>> James
>>
>> On Mon, 27 Jul 2015 at 20:23 Michael Zolotukhin <mzolotukhin at apple.com>
>> wrote:
>>
>>> Hi James,
>>>
>>> On Jul 27, 2015, at 10:04 AM, James Molloy <james at jamesmolloy.co.uk>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> LLVM currently lacks a single coherent way to edit, modify and transform
>>> loops. We have the Loop abstraction that provides some functionality, and
>>> we have a cluster of functions and abstractions in LoopUtils, but each is
>>> fairly cumbersome to use. There's also VersionedLoop, but that's quite
>>> specific and not really extendable/composable.
>>>
>>> I've recently been trying to do some prototyping of a high level loop
>>> optimization (unroll-and-jam) and ended up with quite a bit of spaghetti
>>> code doing what should be simple transforms (as another example, look at
>>> LoopVectorize::createEmptyBlock()!)
>>>
>>> Totally agree here!
>>>
>>>
>>> So I decided to prototype an API for performing more high level
>>> operations on LLVM Loops - cloning, widening, interleaving, stitching the
>>> output of one loop into another loop, setting loop trip counts, peeling...
>>> and all while keeping loopinfo, domtree and SCEV updated.
>>>
>>> I've uploaded my current prototype to Phab here:
>>> http://reviews.llvm.org/D11530
>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D11530&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=3t7gCOgkqnA-65i6cez_eslTon3hxx5XJKhMBw98kv0&s=yCaxJfCogeqwUH-wlsdHFPHktniLwG3-cZnq1yciR04&e=>
>>>
>>>
>>> It's still at the prototype stage so there are lots of things that need
>>> fixing - sorting out the testing story being top of my list. But I'm
>>> looking for feedback on:
>>>   * Is this a useful abstraction? Is it something the community would
>>> like to see in-tree (eventually)?
>>>
>>> I think it could be very useful - depending on how easy it would be to
>>> use. As a show-case, how does it help to simplify the aforementioned
>>> createEmptyBlock?
>>>
>>>   * Does the API design make sense? Are there obvious things I've missed
>>> that people need?
>>>
>>> From the first glance, it seems reasonable. One probably missing thing:
>>> we have peelAfter, but we may also need peelBefore (it would be helpful if
>>> we create prologues for alignment for instance).
>>>
>>> Also, I think it’s very important to explicitly define requirements for
>>> loops we can handle, and what we do with other loops. E.g. what do we do if
>>> a loop has multiple back-edges? What do we do if a loop body has
>>> complicated control-flow?
>>>
>>> Thanks,
>>> Michael
>>>
>>>
>>> I've thought about the use cases of the loop vectorizer, loop
>>> distribution, loop unrolling, loop versioning, and I think the current
>>> design can solve their use cases fairly elegantly. I've used this API to
>>> write a fairly sophisticated unroll-and-jam transform in very few lines too.
>>>
>>> Any feedback would be gratefully appreciated!
>>>
>>> Cheers,
>>>
>>> James
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150728/abadd278/attachment.html>