<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 09/12/2017 10:26 PM, Gerolf

      Hoflehner wrote:<br>

    </div>

    <blockquote

      cite="mid:14DD24A7-F9AB-4137-8CFA-F85A0FE12B41@apple.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:

        space; line-break: after-white-space;" class=""><br class="">

        <div><br class="">

          <blockquote type="cite" class="">

            <div class="">On Sep 11, 2017, at 10:47 PM, Hal Finkel via

              llvm-dev &lt;<a moz-do-not-send="true"

                href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt;

              wrote:</div>

            <br class="Apple-interchange-newline">

            <div class="">

              <div bgcolor="#FFFFFF" text="#000000" class="">

                <p class=""><br class="">

                </p>

                <div class="moz-cite-prefix">On 09/11/2017 12:26 PM,

                  Adam Nemet wrote:<br class="">

                </div>

                <blockquote

                  cite="mid:203EDEE9-A19B-4180-9736-CC9C2E7BD4FB@apple.com"

                  type="cite" class=""> Hi Hal, Tobias, Michael and

                  others,

                  <div class=""><b>...</b>

                    <div class="">

                      <div class=""><br class="">

                      </div>

                      <div class="">One thing that I’d like to see more

                        details on is what this means for the evolution

                        of loop transformations in LLVM.</div>

                      <div class=""><br class="">

                      </div>

                      <div class="">Our more-or-less established

                        direction was so far to incrementally improve

                        and generalize the required analyses (e.g. the

                        LoopVectorizer’s dependence analysis + loop

                        versioning analysis into a stand-alone analysis

                        pass (LoopAccessAnalysis)) and then build new

                        transformations (e.g. LoopDistribution,

                        LoopLoadElimination, LICMLoopVersioning) on top

                        of these.</div>

                      <div class=""><br class="">

                      </div>

                      <div class="">The idea was that infrastructure

                        would be incrementally improved from two

                        directions:</div>

                      <div class=""><br class="">

                      </div>

                      <div class="">- As new transformations are built

                        analyses have to be improved (e.g. past

                        improvements to LAA to support the

                        LoopVersioning utility, future improvements for

                        full LoopSROA beyond just store->load

                        forwarding [1] or the improvements to LAA for

                        the LoopFusion proposal[2])</div>

                      <div class=""><br class="">

                      </div>

                      <div class="">- As more complex loops would have

                        to be analyzed we either improve LAA or make

                        DependenceAnalysis a drop-in replacement for the

                        memory analysis part in LAA</div>

                    </div>

                  </div>

                </blockquote>

                <br class="">

                Or we could use Polly's dependence analysis, which I

                believe to be more powerful, more robust, and more

                correct than DependenceAnalysis. I believe that the

                difficult part here is actually the pairing with

                predicated SCEV or whatever mechanism we want to use to

                generate runtime predicates (this applies to use of

                DependenceAnalysis too).<br class="">

              </div>

            </div>

          </blockquote>

          <div><br class="">

          </div>

          What is a good way to measure these assertions (More powerful,

          more robust)? Are you saying the LLVM Dependence Analysis is

          incorrect or do you actually mean less conservative (or "more

          accurate" or something like that)?<br class="">

        </div>

      </div>

    </blockquote>

    <br>

    Sebastian's email covers the issues with the DependenceAnalysis pass

    pretty well.<br>

    <br>

    Regarding what's in LoopAccessAnalysis, I believe it to be correct,

    but more limited. It is not clear to me that LAA is bad at what it

    does based on what the vectorizer can handle. LAA could do better in

    some cases with non-unit-stride loops. Polly also handles

    piecewise-affine functions, which allows the modeling of loops with

    conditionals. Extending LAA to handle loop nests, moreover, seems

    likely to be non-trivial.<br>

    <br>

    Regardless, measuring these differences certainly seems like a good

    idea. I think that we can do this using optimization remarks. LAA

    already emits optimization remarks for loops in which it finds

    unsafe memory dependencies. Polly also emits optimization remarks.

    We may need to iterate some in order to set up a good comparison, but

    we should be able to collect statistics (and other information) by

    compiling code using -fsave-optimization-record (in combination with

    some other flags), and then analyzing the resulting YAML files.<br>
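    <br>
    As a rough sketch of where such an analysis could start (this is not an
    existing LLVM tool): the snippet below counts remarks per pass and kind in
    the text of an optimization record. It assumes the usual record layout, in
    which each remark begins with a "--- !Kind" line followed by a "Pass:" key.
    The function name tally_remarks, the remark names, and the sample records
    are invented for illustration and are not real compiler output.<br>
<pre>
```python
import re
from collections import Counter

# Each remark in the record starts with a document header such as "--- !Missed".
HEADER = re.compile(r"^--- !(\w+)\s*$")
# The pass that emitted the remark is stored under a top-level "Pass:" key.
PASS = re.compile(r"^Pass:\s*(\S+)\s*$")


def tally_remarks(text):
    """Count remarks per (pass, kind) pair in optimization-record text."""
    counts = Counter()
    kind = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            kind = header.group(1)
            continue
        found = PASS.match(line)
        if found and kind is not None:
            counts[(found.group(1), kind)] += 1
            kind = None  # count each remark once
    return counts


# Hand-written sample in the assumed layout (hypothetical remarks).
SAMPLE = """\
--- !Missed
Pass:            loop-vectorize
Name:            ExampleMissed
Function:        foo
...
--- !Analysis
Pass:            loop-vectorize
Name:            ExampleAnalysis
Function:        bar
...
--- !Missed
Pass:            loop-vectorize
Name:            ExampleMissed
Function:        bar
...
--- !Passed
Pass:            polly-detect
Name:            ExamplePassed
Function:        baz
...
"""

if __name__ == "__main__":
    for (pass_name, kind), n in sorted(tally_remarks(SAMPLE).items()):
        print(pass_name, kind, n)
```
</pre>
    In practice the text would come from the YAML files written by
    -fsave-optimization-record, and the per-pass counts from an LAA-based
    build and a Polly-based build of the same sources could then be compared
    side by side.<br>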

    <br>

    <blockquote

      cite="mid:14DD24A7-F9AB-4137-8CFA-F85A0FE12B41@apple.com"

      type="cite">

      <div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:

        space; line-break: after-white-space;" class="">

        <div>

          <blockquote type="cite" class="">

            <div class="">

              <div bgcolor="#FFFFFF" text="#000000" class=""> <br

                  class="">

                <blockquote

                  cite="mid:203EDEE9-A19B-4180-9736-CC9C2E7BD4FB@apple.com"

                  type="cite" class="">

                  <div class="">

                    <div class="">

                      <div class=""><br class="">

                      </div>

                      <div class="">While this model may be slow it has

                        all the benefits of the incremental development

                        model.</div>

                    </div>

                  </div>

                </blockquote>

                <br class="">

                The current model may have been slow in many areas, but

                I think that's mostly a question of development effort.

                My largest concern about the current model is that, to

                the extent that we're implementing classic loop

                transformations (e.g., fusion, distribution,

                interchange, skewing, tiling, and so on), we're

                repeating a historical design that is known to have

                several suboptimal properties. Chief among them is the

                lack of integration: many of these transformations are

                interconnected, and there's no good pass ordering in

                which to make independent decisions. Many of these

                transformations can be captured in a single model and we

                can get much better results by integrating them. There's

                also the matter of whether building these transformations

                on SCEV (or IR directly) is the best underlying

                infrastructure, or whether parts of Polly would be

                better.<br class="">

              </div>

            </div>

          </blockquote>

          <div><br class="">

          </div>

          I believe that is true. What I wonder is whether there is a good

          method to reason about it.</div>

      </div>

    </blockquote>

    <br>

    If I understand what you mean, one way to look at it is this: This

    is not a canonicalization problem. Picking an optimal way to

    interchange loops may depend on how the result can be skewed and/or

    tiled; picking an optimal way to distribute loops often depends on

    what can be done afterward in each piece. Optimal here generally

    involves reasoning about the memory hierarchy (e.g., cache

    properties), available prefetching streams, register-file size, and

    so on.<br>
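    <br>
    To make the interaction concrete, here is a toy sketch (my own
    illustration, not taken from any of the papers): it counts misses in a
    small, fully associative LRU cache for a matrix transpose B[j][i] =
    A[i][j] under three loop structures. The sizes N, LINE, and CAPACITY and
    all helper names are assumptions chosen to keep the example small; a real
    cost model would be far more detailed.<br>
<pre>
```python
from collections import OrderedDict

N = 32         # matrix dimension (toy value)
LINE = 4       # elements per cache line (toy value)
CAPACITY = 16  # number of lines the toy LRU cache can hold


def count_misses(accesses):
    """Count misses of a fully associative LRU cache over element addresses."""
    cache = OrderedDict()
    misses = 0
    for addr in accesses:
        line = addr // LINE
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            if len(cache) == CAPACITY:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = True
    return misses


def transpose_accesses(order):
    """Yield the addresses touched by B[j][i] = A[i][j] for each (i, j)."""
    for i, j in order:
        yield i * N + j          # read A[i][j], row-major
        yield N * N + j * N + i  # write B[j][i], stored after A


def row_major():
    return ((i, j) for i in range(N) for j in range(N))


def interchanged():
    return ((i, j) for j in range(N) for i in range(N))


def tiled(t):
    return ((i, j)
            for ii in range(0, N, t) for jj in range(0, N, t)
            for i in range(ii, ii + t) for j in range(jj, jj + t))


if __name__ == "__main__":
    print("i outer, j inner:", count_misses(transpose_accesses(row_major())))
    print("j outer, i inner:", count_misses(transpose_accesses(interchanged())))
    print("4x4 tiles:       ", count_misses(transpose_accesses(tiled(4))))
```
</pre>
    With these toy parameters both untiled loop orders miss equally often,
    because one of the two arrays is always walked with a large stride, while
    the tiled version misses far less. A cost model that evaluates interchange
    in isolation therefore sees no reason to prefer either order; the benefit
    only appears once tiling is considered at the same time.<br>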

    <br>

    I know that I've seen some good examples in papers over the years

    that illustrate the phase-ordering challenges. Hopefully, someone

    will jump in here with some good references. One classic one is:

    William Pugh. Uniform Techniques for Loop Optimization. 1991.<br>

    <br>

    <blockquote

      cite="mid:14DD24A7-F9AB-4137-8CFA-F85A0FE12B41@apple.com"

      type="cite">

      <div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:

        space; line-break: after-white-space;" class="">

        <div> Perhaps concrete examples or perhaps opt-viewer based

          comparisons on large sets of benchmarks? In the big picture

          you could make such a modeling argument for all compiler

          optimizations.<br class="">

        </div>

      </div>

    </blockquote>

    <br>

    Certainly. However, in this case there's a well-studied unified

    model for this set of optimizations known to reduce phase-ordering

    effects. That's not true in general.<br>

    <br>

    <blockquote

      cite="mid:14DD24A7-F9AB-4137-8CFA-F85A0FE12B41@apple.com"

      type="cite">

      <div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:

        space; line-break: after-white-space;" class="">

        <div>

          <blockquote type="cite" class="">

            <div class="">

              <div bgcolor="#FFFFFF" text="#000000" class=""> <br

                  class="">

                That having been said, I think that integrating this

                technology into LLVM will also mean applying appropriate

                modularity. I think that we'll almost definitely want to

                make use of the dependence analysis separately as an

                analysis. We'll want to decide which of these

                transformations will be considered canonicalization (and

                run in the iterative pipeline) and which will be

                lowering (and run near the vectorizer). LoopSROA

                certainly sounds to me like canonicalization, but loop

                fusion might also fall into that category (i.e., we

                might want to fuse early to enable optimizations and

                then split late).<br class="">

                <br class="">

                <blockquote

                  cite="mid:203EDEE9-A19B-4180-9736-CC9C2E7BD4FB@apple.com"

                  type="cite" class="">

                  <div class="">

                    <div class="">

                      <div class=""><br class="">

                      </div>

                      <div class="">Then there is the question of use

                        cases.  It’s fairly obvious that anybody wanting

                        to optimize a 5-deep highly regular loop-nest

                        operating on arrays should use Polly.  On the

                        other hand it’s way less clear that we should

                        use it for singly or doubly nested

                        not-so-regular loops which are the norm in

                        non-HPC workloads.</div>

                    </div>

                  </div>

                </blockquote>

                <br class="">

                This is clearly a good question, but thinking about

                Polly as a set of components, not as a monolithic

                transformation component, I think that polyhedral

                analysis and transformations can underlie a lot of the

                transformations we need for non-HPC code (and, which

                I'll point out, we need for modern HPC code too). In

                practice, the loops that we can actually analyze have

                affine dependencies, and Polly does, or can do, a better

                job at generating runtime predicates and dealing with

                piecewise-linear expressions than our current

                infrastructure.<br class="">

                <br class="">

                In short, I look at Polly as two things: First, an

                infrastructure for dealing with loop analysis and

                transformation. I view this as being broadly applicable.

                Second, an application of that to apply

                cost-model-driven classic loop transformations. To some

                extent this is going to be more useful for HPC codes,

                but also applies to machine learning, signal processing,

                graphics, and other areas. <br class="">

              </div>

            </div>

          </blockquote>

          I’m wondering if it could be used for pointing out headroom

          for the existing LLVM ecosystem (*)<br class="">

        </div>

      </div>

    </blockquote>

    <br>

    Sure.<br>

    <br>

    <blockquote

      cite="mid:14DD24A7-F9AB-4137-8CFA-F85A0FE12B41@apple.com"

      type="cite">

      <div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:

        space; line-break: after-white-space;" class="">

        <div><br class="">

          <blockquote type="cite" class="">

            <div class="">

              <div bgcolor="#FFFFFF" text="#000000" class=""> <br

                  class="">

                <blockquote

                  cite="mid:203EDEE9-A19B-4180-9736-CC9C2E7BD4FB@apple.com"

                  type="cite" class="">

                  <div class="">

                    <div class="">

                      <div class=""><br class="">

                      </div>

                      <div class="">And this brings me to the

                        maintenance question.  Is it reasonable to

                        expect people to fix Polly when they have a

                        seemingly unrelated change that happens to break

                        a Polly bot.</div>

                    </div>

                  </div>

                </blockquote>

                <br class="">

                The eventual goal here is to have this technology in

                appropriate parts of the main pipeline, and so the

                question here is not really about breaking a "Polly

                bot", but just about a "bot" in general. I've given this

                question some thought and I think it sits in a

                reasonable place in the risk-reward space. The answer

                would be, yes, we'd need to treat this like any other

                part of the pipeline. However, I believe that Polly has

                as many, or more, active contributors than essentially

                any other individual part of the mid-level optimizer or

                CodeGen. As a result, there will be people around in

                many time zones to help with problems with Polly-related

                code.<br class="">

                <br class="">

                <blockquote

                  cite="mid:203EDEE9-A19B-4180-9736-CC9C2E7BD4FB@apple.com"

                  type="cite" class="">

                  <div class="">

                    <div class="">

                      <div class="">  As far as I know, there were

                        companies in the past that tried Polly without a

                        whole lot of prior experience.  It would be

                        great to hear what the experience was before

                        adopting Polly at a much larger scale.</div>

                    </div>

                  </div>

                </blockquote>

                <br class="">

                I'm also interested, although I'll caution against

                over-interpreting any evidence here (positive or

                negative). Before a few weeks ago, Polly didn't

                effectively run in the pipeline after inlining, and so I

                doubt it would have been much use outside of embedded

                environments (and maybe some HPC environments) with

                straightforwardly-presented C code. It's only now that

                this has been fixed that I find the possibility of

                integrating this in production interesting.<br class="">

              </div>

            </div>

          </blockquote>

          <div><br class="">

          </div>

          That is a good point. There are also biases independent of

          past experiences (for disclosure mine is (*) above). But I

          think it is objective to say a Polly integration is a big

          piece to swallow. Your pro-Polly argument lists a number of

          categories that I think could be reasoned about individually

          and partly evaluated with a data-driven approach:</div>

        <div>A) Architecture</div>

        <div>- support for autoparallelism</div>

        <div>- support for accelerators</div>

        <div>- isl- rewrite? etc</div>

        <div>...</div>

        <div>B) Modelling</div>

        <div>- polyhedral model</div>

        <div>- temporal locality</div>

        <div>- spatial locality </div>

        <div>…</div>

        <div>C) Analysis/Optimizations</div>

        <div>- Dependence Analysis</div>

        <div>- Transformation effectiveness/power (loop nests, quality of

          transformations, #vectorizable loops etc)</div>

        <div><br class="">

        </div>

        <div>A) is mostly Polly independent (except for the isl question

          I guess). For B and C performance/ compile-time /opt-viewer

          data on a decent/wide range of benchmarks possibly at

          different optimization levels (O2, O3, LTO, PGO etc and

          combinations) should provide data-driven insight into

          costs/benefits. <br>

        </div>

      </div>

    </blockquote>

    <br>

    I agree. In practice, the first question is: Are we willing to

    take on Polly (+isl), in whole or in part, as a build dependency? If

    the answer is yes, the next question is: what parts should be reused

    or refactored for use in other parts of the pipeline? My argument is

    that we should take on Polly, or most of it, as a build dependency,

    and work on better unifying the developer communities as we start

    experimenting with other kinds of integration. Even before any deeper

    integration, this will allow us to provide these transformations to all

    of our users through pragmas (and other kinds of optional enablement). This is an

    important first step.<br>

    <br>

    I'm not sure exactly how good this is, but Polly has LNT-submitting

    bots, so the website can generate a comparison (e.g., <a

      class="moz-txt-link-freetext"

      href="http://lnt.llvm.org/db_default/v4/nts/71208?compare_to=71182">http://lnt.llvm.org/db_default/v4/nts/71208?compare_to=71182</a>).

    Looking at this comparison shows a number of potential problems but

    also cases where Polly really helps (and, FWIW, the largest two

    compile-time regressions are also associated with very large

    execution performance improvements). My first focus would certainly

    be on pragma-driven enablement.<br>

    <br>

    Thanks again,<br>

    Hal<br>

    <br>

    <blockquote

      cite="mid:14DD24A7-F9AB-4137-8CFA-F85A0FE12B41@apple.com"

      type="cite">

      <div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:

        space; line-break: after-white-space;" class="">

        <div><br class="">

        </div>

        <div>Cheers</div>

        <div>Gerolf</div>

        <div><br class="">

        </div>

        <div><br class="">

        </div>

        <div><br class="">

        </div>

        <div><br class="">

          <blockquote type="cite" class="">

            <div class="">

              <div bgcolor="#FFFFFF" text="#000000" class=""> <br

                  class="">

                Thanks again,<br class="">

                Hal <br class="">

                <br class="">

                <blockquote

                  cite="mid:203EDEE9-A19B-4180-9736-CC9C2E7BD4FB@apple.com"

                  type="cite" class="">

                  <div class="">

                    <div class="">

                      <div class=""><br class="">

                      </div>

                      <div class="">Adam</div>

                      <div class=""><br class="">

                      </div>

                      <div class="">[1] <a moz-do-not-send="true"

href="http://lists.llvm.org/pipermail/llvm-dev/2015-November/092017.html"

                          class="">http://lists.llvm.org/pipermail/llvm-dev/2015-November/092017.html</a></div>

                      <div class="">

                        <div class="">[2] <a moz-do-not-send="true"

                            href="http://lists.llvm.org/pipermail/llvm-dev/2016-March/096266.html"

                            class="">http://lists.llvm.org/pipermail/llvm-dev/2016-March/096266.html</a></div>

                        <div class=""><br class="">

                        </div>

                        <div class=""><br class="">

                        </div>

                      </div>

                      <blockquote type="cite" class="">

                        <div class="">

                          <div bgcolor="#FFFFFF" text="#000000" class=""><b

                              style="font-weight:normal;"

                              id="docs-internal-guid-2e372c58-3ebe-fc55-3349-ab4430850fbc"

                              class=""> <br class="">

                              <div style="line-height: 1.38; margin-top:

                                0pt; margin-bottom: 0pt;" class=""><span style="font-size: 11pt; font-family: Arial; background-color: transparent; font-weight: 400; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-position: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;" class="">Sincerely,</span></div>

                              <div style="line-height: 1.38; margin-top:

                                0pt; margin-bottom: 0pt;" class=""><span style="font-size: 11pt; font-family: Arial; background-color: transparent; font-weight: 400; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-position: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;" class="">Hal (on behalf of myself, Tobias Grosser, and Michael Kruse, with feedback from<b class=""> </b>several other active Polly developers)</span></div>

                              <b>...</b><span style="font-size: 11pt; font-family: Arial; background-color: transparent; font-weight: 400; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-position: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;" class=""></span></b>

                            <div class=""><br

                                class="webkit-block-placeholder">

                            </div>

                            <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

                          </div>

_______________________________________________<br class="">

                          LLVM Developers mailing list<br class="">

                          <a moz-do-not-send="true"

                            href="mailto:llvm-dev@lists.llvm.org"

                            class="">llvm-dev@lists.llvm.org</a><br

                            class="">

                          <a moz-do-not-send="true"

                            class="moz-txt-link-freetext"

                            href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br

                            class="">

                        </div>

                      </blockquote>

                    </div>

                    <br class="">

                  </div>

                </blockquote>

                <br class="">

                <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

              </div>


            </div>

          </blockquote>

        </div>

        <br class="">

      </div>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </body>

</html>