[llvm-dev] Writing loop transformations on the right representation is more productive

Sat Jan 11 09:43:25 PST 2020

On Sat, 11 Jan 2020 at 00:34, Michael Kruse <llvmdev at meinersbur.de> wrote:
> Yes, as mentioned in the Q&A. Unfortunately, VPlan is neither able to represent
> arbitrary code nor has cheap copies.

Orthogonal, but we should also be looking into implementing the cheap
copies in VPlan if we want to search for composable plans.
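As an aside, "cheap copies" of a tree are usually achieved through structural sharing. Below is a minimal Python sketch, assuming an immutable loop tree; the `LoopNode` class and `replace_at` helper are hypothetical and are not VPlan's API or the proposal's implementation. Deriving a transformed variant copies only the nodes on the path from the root to the changed loop and shares every other subtree with the original, so exploring many candidate plans does not require cloning the whole function each time.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LoopNode:
    """Hypothetical immutable loop-tree node; nodes are never mutated."""
    name: str
    children: Tuple["LoopNode", ...] = ()


def replace_at(root: LoopNode, path: Tuple[int, ...], new_node: LoopNode) -> LoopNode:
    """Return a new tree with the node at `path` replaced by `new_node`.

    Only the nodes on the root-to-target path are re-created; every sibling
    subtree is shared with the original tree, so a "copy" costs O(depth)
    rather than O(size of the tree).
    """
    if not path:
        return new_node
    idx, rest = path[0], path[1:]
    updated_child = replace_at(root.children[idx], rest, new_node)
    new_children = root.children[:idx] + (updated_child,) + root.children[idx + 1:]
    return LoopNode(root.name, new_children)


# Usage: a function with two loop nests; "transform" the inner loop of the first.
inner = LoopNode("j-loop")
nest1 = LoopNode("i-loop", (inner,))
nest2 = LoopNode("k-loop")
original = LoopNode("func", (nest1, nest2))

variant = replace_at(original, (0, 0), LoopNode("j-loop.unrolled"))

print(variant.children[1] is original.children[1])  # True: untouched nest is shared
print(original.children[0].children[0].name)        # j-loop: original is unchanged
```

Because nodes are never mutated, under this assumption the original tree stays valid as a fallback while variants are compared, which is the property a search over composable plans would rely on.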

> This conversion is a possibility and certainly not the main motivation
> for a loop hierarchy.

I know. There are many things that can be done with what you propose,
but we should focus on the main motivation.

From what I can tell, the tree representation is a concrete proposal
for the years-long discussion about a parallel IR.

The short paper doesn't mention that, nor does it discuss other
opportunities to fix pipeline complexity (which is inherent to any
compiler).

I still believe that many of the techniques you propose are meaningful
ways to solve them, but creating another IR will invariably create
some adoption barriers.

Especially when we already have VPlan and MLIR converging now, which
will need to find their own spaces, too.

> I wouldn't have thought that parallelization and offloading were ever
> considered on top of VPlan.

I don't see why not. VPlan is a structure for picking a path through
composable transformations.

While so far it has mainly focused on replacing the monolithic
vectorisation, there are concrete plans to look at composition and
more complex idioms.

> Are you arguing against code versioning? It is already done today by
> multiple passes such as LoopVersioningLICM, LoopDistribute,
> LoopUnrollAndJam and LoopVectorize. The proposal explicitly tries to
> avoid code bloat by having just one fallback copy. Runtime conditions
> can be chosen more or less optimistically, but I don't see how this
> should be an argument against all kinds of versioning.

No. I'm wary of combining heuristic search with versioning,
especially when the conditions are runtime-based. It may be hard
to CSE them later.

The paths found may not be optimal in terms of intermediate states.

> > Don't get me wrong, I like the idea, it's a cool experiment using some
> > cool data structures and algorithms. But previous experiences with the
> > pass manager have, well, not gone smoothly in any shape or form.
>
> What experiments? I don't see a problem if the pass manager has to
> invalidate analyses and re-run canonicalization passes. This happens
> many times in the default pass pipelines. In addition, this
> invalidation is only necessary if the loop optimization pass optimizes
> something, in which case the additional cost should be justified.

My point goes back to doing that in VPlan, then in the tree. The more
back-and-forth IR transformations we add to the pipeline, the more
brittle it will be.

The original email also proposes, for the future, to do all sorts of
analyses and transformations in the tree representation, and that will
likely be incompatible with (or at least not propagated through) the
conversions.

> I don't think the proposal qualifies as including a fully flexible new
> pass manager, at least no more than the current mechanism LoopVectorize
> uses to run passes on VPlan (LoopVectorizationPlanner::plan).

Sorry, that came out stronger than it should have been. I agree it's
not a "whole new pass manager".

> While I still think the goals of VPlan and a loop hierarchy are
> different, I expect VPlan to be production-ready earlier than this
> proposal. I fear that combining them would delay both.

I get it, but I fear taking a completely different approach may make
it harder to get your proposal to show benefits any time soon.

> > https://xkcd.com/927/
>
> While I never fail to find this xkcd funny, the loop hierarchy is
> not intended to be universal.

Sorry, poetic license. :)

I tried to reflect the perils of creating too many, sometimes competing, IRs.

> In a previous RFC [8] I tried NOT to introduce a data structure but to
> re-use LLVM-IR. The only discussion of that RFC was about
> not 'abusing' LLVM-IR.
>
> https://lists.llvm.org/pipermail/llvm-dev/2017-October/118169.html
> https://lists.llvm.org/pipermail/llvm-dev/2017-October/118258.html
>
> I definitely see the merits of using fewer data structures, but it is
> also hard to re-use something existing for a different purpose (in
> this case: VPlan) without making both more complex.

My point about avoiding more structures and IRs was related to VPlan
and MLIR, not LLVM-IR.

I agree there should be an abstraction layer to do parallelisation
analysis, but we already have two, and I'd rather add many of your
good proposals to those than create a third.

Perhaps it's not clear how we could do that now, but we should at
least try to weigh the options.

I'd seriously look at adding a tree-like annotation as an MLIR
dialect, and use it for lean copies.

> For the foreseeable future, Clang will generate LLVM-IR, but our
> motivation is to (also) optimize C/C++ code. That is, I do not see a
> way to not (also) handle LLVM-IR until Clang is changed to generate
> MLIR (which then again will be another data structure in the system).

Even if/when Clang generates MLIR, there's no guarantee the high-level
dialects will be preserved until the vectorisation pass. And other
front-ends may not generate the same quality of annotations.

We may have to re-generate what we need anyway, so there is no point in
waiting for all the front-ends to do what we need, and for all the
previous passes to guarantee they keep it.

cheers,
--renato