<div dir="ltr"><div>Hi Michael,</div><div><br></div><div>Thanks for the RFC!  While I see that the identifier is needed to be able to apply transform directives to new loop (which is an important capability), I'm wondering if the identifier is meant to generically separate the source location from the optimization directive.  Will I be able to add identifiers to my loops and have the optimization directives live elsewhere?  If that is the case, then the ordering of the transformations could be difficult to understand.</div><div><br></div><div>Thanks again,</div><div>Brian</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Dec 18, 2019 at 11:49 AM Michael Kruse via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Am Mi., 18. Dez. 2019 um 09:37 Uhr schrieb David Greene <<a href="mailto:dag@cray.com" target="_blank">dag@cray.com</a>>:<br>

><br>

> Hi Michael, thanks for this RFC!  I just have a few questions and<br>

> comments to start off.<br>

><br>

> Are these pragmas meant to be advisory or prescriptive (when legal)?<br>

> From your description and motivation I assume the latter but I wanted to<br>

> double-check.<br>

<br>

For loop hint style directives, they are advisory. However, I would<br>

expect the compiler to able to emit a diagnostic (not an error) if<br>

applying it fails, as #pragma clang loop vectorize(enable) already<br>

does.<br>

Determining the safety of a transformations depends on compiler<br>

capability (which may change between versions of the compiler), hence<br>

cannot be prescriptive.<br>

<br>

For OpenMP style directives (and possibly assume_safey), they should<br>

be prescriptive.<br>

<br>

<br>

> >  * Hints are emitted in form of metadata (MDNode) that can be dropped<br>

> > by mid-end optimizers<br>

><br>

> Below you state that metadata will be used to encode the transformations<br>

> and order.  Doesn't this suffer from the same problem?<br>

<br>

I did not claim to be able to solve every problem ;-)<br>

But we should work towards removing sources where the metadata can be<br>

lost (e.g. <a href="https://reviews.llvm.org/D53876" rel="noreferrer" target="_blank">https://reviews.llvm.org/D53876</a> and<br>

<a href="https://reviews.llvm.org/D66892" rel="noreferrer" target="_blank">https://reviews.llvm.org/D66892</a>), but it will always be best effort by<br>

the definition of MDNode.<br>

<br>

However, emitting already transformed IR using the OpenMPIRBuilder<br>

would eliminate this source of transformations being forgotten.<br>

<br>

<br>

> > The selection is currently limited to the passes LLVM currently<br>

> > supports. I am working on more transformations that currently are only<br>

> > picked-up by Polly. The largest difference to loop hints is that it<br>

> > allows to specify in which order the transformations are applied,<br>

> > which is ignored by clang's current LoopHint attribute. That is, the<br>

> > following for reverses the loop, then unrolls it.<br>

> ><br>

> >     #pragma clang transform unroll partial(2)<br>

> >     #pragma clang transform reverse<br>

> >     for (int i = 0; i < 128; i+=1)<br>

> >       body(i);<br>

><br>

> This seems unintuitive to me.  I would expect this to unroll first and<br>

> then reverse.  I get the "inner-to-outer" ordering you present here, but<br>

> I wonder if it will be too easy for users to get something unexpected.<br>

<br>

I find this order more intuitive and matches the OpenMP semantics. For instance,<br>

<br>

    #pragma omp parallel for<br>

    for (int i = 0; i < 128; i+=1)<br>

<br>

has the same semantics as<br>

<br>

    #pragma omp parallel<br>

    #pragma omp for<br>

   for (int i = 0; i < 128; i+=1)<br>

<br>

which is the same as<br>

<br>

    #pragma omp parallel<br>

    {<br>

       #pragma omp for<br>

       for (int i = 0; i < 128; i+=1)<br>

       ..<br>

    }<br>

<br>

I think the difference in interpretation comes from either seeing the pragmas as<br>

1. a collection of attributes to the next statement (like<br>

AttributedStmt/LoopHint)<br>

<br>

or<br>

<br>

2. as an statement taking another statement as argument (like<br>

OMPExecutableDirective)<br>

<br>

In trying unify both implementations, I will have to use the latter.<br>

It also avoid the problems you mentioned below. Moreover, for<br>

transformations that do not apply on loops, this interpretation makes<br>

it clear that it consumes a statement with its own scope:<br>

<br>

   #pragma clang transform offload // Compared to "#pragma omp<br>

target", the compiler has to to a legality analysis<br>

   {<br>

     do_something();<br>

     do_something_else();<br>

   }<br>

<br>

<br>

><br>

> Will it be possible to list multiple transformations with one directive?<br>

<br>

No, and one of the reason I decided against reusing #pragma clang loop syntax.<br>

<br>

<br>

> > Furthermore, I intend to implement assigning identifiers to loop to<br>

> > reference them in followup transformations (e.g. tile a loop,<br>

> > parallelize the generated outer loop and vectorize the inner),<br>

><br>

> Do you have an example of this idea?<br>

<br>

The same example as code:<br>

<br>

    #pragma clang transform vectorize on(innername) width(4)<br>

    #pragma clang transform parallelize_thread on(outername)<br>

<br>

    #pragma clang transform tile sizes(32) floor(id(outername))<br>

tile(id(innername))<br>

    for (int i = 0; i < 128; i+=1)<br>

<br>

Without ids, by writing more transformation in the clauses, it could<br>

also be written as<br>

<br>

    #pragma clang transform parallelize_thread // applies to the<br>

outermost loop of the previous transformation<br>

    #pragma clang transform tile sizes(32) tile(vectorize width(4))<br>

    for (int i = 0; i < 128; i+=1)<br>

<br>

Ids are more required when a follow-up transformation applies to<br>

multiple loops, or handy for writing transformations for a specific<br>

target together.<br>

<br>

    #ifdef OPTIMIZE_FOR_COARSE_GRAIN<br>

      #pragma clang transform interchange on(j,thefloor) permutation(thefloor,j)<br>

    #endif<br>

<br>

    #pragma clang transform id(j)<br>

    for (int j = 0; j < n; j+=1) {<br>

      #pragma clang transform tile size(32) floor(id(thefloor))<br>

      for (int i = 0; i < 128; i+=1)<br>

<br>

<br>

Michael<br>

_______________________________________________<br>

cfe-dev mailing list<br>

<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

</blockquote></div></div>