[llvm-dev] Writing loop transformations on the right representation is more productive

Sat Jan 25 18:05:33 PST 2020

Am Mi., 22. Jan. 2020 um 02:14 Uhr schrieb Renato Golin <rengolin at gmail.com>:
> Precisely why I don't think adding more passes is an advantage to adoption. :)

The alternative is to have a single pass for each kind of loop
transformations, i.e. many more than a single loop transformation
pass.

> > I don't find the argument "there might be bugs" very convincing.
>
> Sorry, it wasn't an argument, just a jest at the expense of some old
> commercial front-ends.
>
> Pass ordering is complex no matter how you slice it.

Indeed. I am already concerned of additional phase ordering problems
if we implement each loop transformation in its own pass, e.g. between
loop fusion and loop distribution. Do we first fuse into as few loops
as possible and then distribute, or the other way around?

> > This was relevant to the discussion that /all/ front-ends would have
> > to generate good-enough annotations for loop transformations. Only the
> > ones that do might enable loop optimization passes.
>
> This doesn't scale. You end up with a few metadata from a single
> front-end justifying a huge new pass that does a few things in even
> fewer cases.

I'd think that the metadata is not front-end/language specific.

A language where most instructions can access any memory is arguable
harder to optimize than a language where only a selected set of
instructions can do that. But the metadata describing what memory an
instruction can access is not frond-end specific.

> Your proposal is a new pass, that does some new things and therefore
> shouldn't need to be incremental (only when the correct info is
> present).
>
> But that means now we have two loop transformation infrastructures
> that could radically change the IR (and the loop metadata, if any).

I don't think LLVM's current loop optimizations are well developed.
Only LoopVectorize and LoopUnroll are even enabled by default.

> Which one passes first will have the advantage, as I think we agreed,
> pass ordering is not trivial.

Which is one of the things this proposal addresses.

> If you restrict this new pass to only doing transformations that are
> guaranteed to be valid and better (for some definition of both),

This is a strange argument. You want transformation that are invalid
and/or worse?

> then
> you'll end like Polly that does wonders to a very small selection of
> loops and nothing at all to most of them, just wasted time looking at
> all the possibilities.

This is exactly what this proposal is addressing.

I think the small selection mostly stems from Polly requiring
well-formed IR. Very often it could algorithmically optimize a
problem, but cannot represent the IR in its internal representation: a
SCoP which is based on ISL's schedule tree representation. The main
motivation of the proposal is to exactly address this, meaning there
is no external library that restricts what we can represent.

A second reason is that Polly relies on ISL's solver algorithm that
minimizes re-use distance and parallelism while the pass-based
optimizers use hand-written heuristics. I want to make both possible
withing the same framework.

> If you want to be more aggressive, then the IR will change more often
> and the pass ordering problem gets worse, requiring changes in later
> passes to cope with the changes.

My goal is to get a hierarchical optimization order: Once the
higher-level optimizations (that is, loops) have been decided on, only
lower-level optimizations (InstCombine, instruction motion, CFG
simplification, etc) are left to do. If we have to re-do loop
optimizations, something went very wrong.

Michael