[cfe-dev] RFC: User-directed code transformations with #pragma clang transform

Wed Dec 18 09:49:04 PST 2019

On Wed, Dec 18, 2019 at 09:37, David Greene <dag at cray.com> wrote:
>
> Hi Michael, thanks for this RFC!  I just have a few questions and
> comments to start off.
>
> Are these pragmas meant to be advisory or prescriptive (when legal)?
> From your description and motivation I assume the latter but I wanted to
> double-check.

For loop hint style directives, they are advisory. However, I would
expect the compiler to be able to emit a diagnostic (a warning, not an
error) if applying it fails, as #pragma clang loop vectorize(enable)
already does.
Determining the safety of a transformation depends on compiler
capability (which may change between versions of the compiler), hence
these directives cannot be prescriptive.

For OpenMP style directives (and possibly assume_safety), they should
be prescriptive.
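
A minimal sketch of the advisory behavior that exists today, using the
#pragma clang loop vectorize(enable) hint mentioned above. The function
name scale_hinted and its parameters are hypothetical. If the vectorizer
cannot apply the hint, Clang is expected to report a warning rather than
an error, and the loop keeps its original meaning either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: doubles every element of src into dst.
   The hint only requests vectorization; it never changes the result. */
void scale_hinted(int *dst, const int *src, size_t n) {
    assert(dst != NULL && src != NULL);
    /* The guard keeps other compilers from seeing an unknown pragma. */
#ifdef __clang__
#pragma clang loop vectorize(enable)
#endif
    for (size_t i = 0; i < n; i += 1)
        dst[i] = src[i] * 2;
}
```

The result is the same whether or not the hint is honored; only the
generated code and the diagnostics differ.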

> >  * Hints are emitted in form of metadata (MDNode) that can be dropped
> > by mid-end optimizers
>
> Below you state that metadata will be used to encode the transformations
> and order.  Doesn't this suffer from the same problem?

I did not claim to be able to solve every problem ;-)
We should work towards removing the places where the metadata can be
lost (e.g. https://reviews.llvm.org/D53876 and
https://reviews.llvm.org/D66892), but preserving it will always be best
effort by the definition of MDNode.

However, emitting already transformed IR using the OpenMPIRBuilder
would eliminate this source of transformations being forgotten.

> > The selection is currently limited to the passes LLVM currently
> > supports. I am working on more transformations that currently are only
> > picked-up by Polly. The largest difference to loop hints is that it
> > allows to specify in which order the transformations are applied,
> > which is ignored by clang's current LoopHint attribute. That is, the
> > following for reverses the loop, then unrolls it.
> >
> >     #pragma clang transform unroll partial(2)
> >     #pragma clang transform reverse
> >     for (int i = 0; i < 128; i+=1)
> >       body(i);
>
> This seems unintuitive to me.  I would expect this to unroll first and
> then reverse.  I get the "inner-to-outer" ordering you present here, but
> I wonder if it will be too easy for users to get something unexpected.

I find this order more intuitive, and it matches the OpenMP semantics. For instance,

    #pragma omp parallel for
    for (int i = 0; i < 128; i+=1)

has the same semantics as

    #pragma omp parallel
    #pragma omp for
    for (int i = 0; i < 128; i+=1)

which is the same as

    #pragma omp parallel
    {
       #pragma omp for
       for (int i = 0; i < 128; i+=1)
       ..
    }

I think the difference in interpretation comes from seeing the pragmas either as
1. a collection of attributes on the next statement (like
AttributedStmt/LoopHint)

or

2. a statement taking another statement as its argument (like
OMPExecutableDirective)

In trying to unify both implementations, I will have to use the latter.
It also avoids the problems you mentioned below. Moreover, for
transformations that do not apply to loops, this interpretation makes
it clear that it consumes a statement with its own scope:

    // Compared to "#pragma omp target", the compiler has to do a
    // legality analysis.
    #pragma clang transform offload
    {
      do_something();
      do_something_else();
    }
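
To make the ordering question concrete, here is a hand-written sketch of
the two readings of the stacked unroll/reverse pragmas discussed above.
It is not compiler output. The trip count is reduced to 8 for
readability (the thread uses 128), and the function names are
hypothetical. Each function records the order in which body(i) would be
called.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 8 };  /* assumption: small trip count for illustration */

/* Reading used in the proposal: the pragma closest to the loop
   (reverse) is applied first, then the reversed loop is unrolled by 2. */
void reverse_then_unroll(int *out) {
    assert(out != NULL);
    int k = 0;
    for (int i = N - 1; i >= 0; i -= 2) {
        out[k++] = i;      /* body(i)     */
        out[k++] = i - 1;  /* body(i - 1) */
    }
}

/* Alternative reading: unroll by 2 first, then reverse the unrolled
   loop. The two calls inside one unrolled iteration keep their order. */
void unroll_then_reverse(int *out) {
    assert(out != NULL);
    int k = 0;
    for (int i = N - 2; i >= 0; i -= 2) {
        out[k++] = i;      /* body(i)     */
        out[k++] = i + 1;  /* body(i + 1) */
    }
}
```

The first variant visits 7, 6, 5, ..., 0, while the second visits
6, 7, 4, 5, 2, 3, 0, 1, so the two readings are observably different.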

>
> Will it be possible to list multiple transformations with one directive?

No, and that is one of the reasons I decided against reusing the
#pragma clang loop syntax.

> > Furthermore, I intend to implement assigning identifiers to loop to
> > reference them in followup transformations (e.g. tile a loop,
> > parallelize the generated outer loop and vectorize the inner),
>
> Do you have an example of this idea?

The same example as code:

    #pragma clang transform vectorize on(innername) width(4)
    #pragma clang transform parallelize_thread on(outername)

    #pragma clang transform tile sizes(32) floor(id(outername)) \
                                           tile(id(innername))
    for (int i = 0; i < 128; i+=1)

Without ids, by nesting further transformations in the clauses, it
could also be written as

    // applies to the outermost loop of the previous transformation
    #pragma clang transform parallelize_thread
    #pragma clang transform tile sizes(32) tile(vectorize width(4))
    for (int i = 0; i < 128; i+=1)
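
For orientation, a hand-written sketch of roughly what both spellings
above ask for: the loop is tiled by 32, the generated outer (floor) loop
is parallelized and the inner (tile) loop is vectorized. The OpenMP
directives are only rough stand-ins for parallelize_thread and
vectorize width(4), and the function name and loop body are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 128, TILE = 32 };  /* N is a multiple of TILE: no remainder loop */

/* Hypothetical loop body: increments every element of a[0..N). */
void tiled_by_hand(int *a) {
    assert(a != NULL);
    /* Stand-in for parallelize_thread on the floor loop. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int outer = 0; outer < N; outer += TILE) {
        /* Stand-in for vectorize width(4) on the tile loop. */
#ifdef _OPENMP
#pragma omp simd simdlen(4)
#endif
        for (int i = outer; i < outer + TILE; i += 1)
            a[i] += 1;
    }
}
```

Without OpenMP enabled the directives are skipped and the function is
just the tiled loop nest, which still visits every i exactly once.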

Ids become necessary when a follow-up transformation applies to
multiple loops, and they are handy for grouping the transformations
for a specific target together.

    #ifdef OPTIMIZE_FOR_COARSE_GRAIN
      #pragma clang transform interchange on(j,thefloor) permutation(thefloor,j)
    #endif

    #pragma clang transform id(j)
    for (int j = 0; j < n; j+=1) {
      #pragma clang transform tile sizes(32) floor(id(thefloor))
      for (int i = 0; i < 128; i+=1)
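
A hand-written sketch of the two loop nests this example would select
between, assuming permutation(thefloor,j) means "thefloor outermost,
then j". The function names are hypothetical, and each element of seq
records the position at which the iteration (j, i) runs.

```c
#include <assert.h>
#include <stddef.h>

enum { M = 128, TILE = 32 };

/* Without OPTIMIZE_FOR_COARSE_GRAIN: j loop, floor loop, tile loop. */
void nest_default(int n, int *seq) {
    assert(seq != NULL);
    int k = 0;
    for (int j = 0; j < n; j += 1)
        for (int f = 0; f < M; f += TILE)           /* "thefloor" */
            for (int i = f; i < f + TILE; i += 1)   /* tile loop  */
                seq[j * M + i] = k++;
}

/* With OPTIMIZE_FOR_COARSE_GRAIN: floor loop and j loop interchanged. */
void nest_interchanged(int n, int *seq) {
    assert(seq != NULL);
    int k = 0;
    for (int f = 0; f < M; f += TILE)               /* "thefloor" */
        for (int j = 0; j < n; j += 1)
            for (int i = f; i < f + TILE; i += 1)   /* tile loop  */
                seq[j * M + i] = k++;
}
```

Both nests execute the same set of (j, i) iterations; only the order
changes, which is what makes the interchange a legality question for
the compiler.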

Michael