[llvm-dev] [RFC] Polly Status and Integration

Fri Sep 1 14:49:37 PDT 2017

On Fri, Sep 1, 2017, at 21:18, Davide Italiano via llvm-dev wrote:
> On Fri, Sep 1, 2017 at 11:47 AM, Hal Finkel via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Hi everyone, As you may know, stock LLVM does not provide the kind of
> > advanced loop transformations necessary to provide good performance on many
> > applications. LLVM's Polly project provides many of the required
> > capabilities, including loop transformations such as fission, fusion,
> > skewing, blocking/tiling, and interchange, all powered by state-of-the-art
> > dependence analysis. Polly also provides automated parallelization and
> > targeting of GPUs and other accelerators.
> >
> >
> > Over the past year, Polly’s development has focused on robustness,
> > correctness, and closer integration with LLVM. To highlight a few
> > accomplishments:
> >
> >
> > Polly now runs, by default, in the conceptually-proper place in LLVM’s pass
> > pipeline (just before the loop vectorizer). Importantly, this means that its
> > loop transformations are performed after inlining and other
> > canonicalization, greatly increasing its robustness, and enabling its use on
> > C++ code (where [] is often a function call before inlining).
> >
> > Polly’s cost-modeling parameters, such as those describing the target’s
> > memory hierarchy, are being integrated with TargetTransformInfo. This allows
> > targets to properly override the modeling parameters and allows reuse of
> > these parameters by other clients.
> >
> > Polly’s method of handling signed division/remainder operations, which
> > worked around lack of support in ScalarEvolution, is being replaced thanks
> > to improvements being contributed to ScalarEvolution itself (see D34598).
> > Polly’s core delinearization routines have long been a part of LLVM itself.
> >
> > PolyhedralInfo, which exposes a subset of Polly’s loop analysis for use by
> > other clients, is now available.
> >
> > Polly is now part of the LLVM release process and is being included with
> > LLVM by various packagers (e.g., Debian).
> >
> >
> > I believe that the LLVM community would benefit from beginning the process
> > of integrating Polly with LLVM itself and continuing its development as part
> > of our main code base. This will:
> >
> > Allow for wider adoption of LLVM within communities relying on advanced loop
> > transformations.
> >
> > Provide for better community feedback on, and testing of, the code developed
> > (although the story in this regard is already fairly solid).
> >
> > Better motivate targets to provide accurate, comprehensive, modeling
> > parameters for use by advanced loop transformations.
> >
> > Perhaps most importantly, this will allow us to develop and tune the rest of
> > the optimizer assuming that Polly’s capabilities are present (the underlying
> > analysis, and eventually, the transformations themselves).
> >
> >
> > The largest issue on which community consensus is required, in order to move
> > forward at all, is what to do with isl. isl, the Integer Set Library,
> > provides core functionality on which Polly depends. It is a C library, and
> > while some Polly/LLVM developers are also isl developers, it has a large
> > user community outside of LLVM/Polly. A C++ interface was recently added,
> > and Polly is transitioning to use the C++ interface. Nevertheless, options
> > here include rewriting the needed functionality, forking isl and
> > transitioning our fork toward LLVM coding conventions (and data structures)
> > over time, and incorporating isl more-or-less as-is to avoid partitioning
> > its development.
> >
> >
> > That having been said, isl is internally modular, and regardless of the
> > overall integration strategy, the Polly developers anticipate specializing,
> > or even replacing, some of these components with LLVM-specific solutions.
> > This is especially true for anything that touches performance-related
> > heuristics and modeling. LLVM-specific, or even target-specific, loop
> > schedulers may be developed as well.
> >
> >
> > Even though some developers in the LLVM community already have a background
> > in polyhedral-modeling techniques, the Polly developers have developed, and
> > are still developing, extensive tutorials on this topic
> > http://pollylabs.org/education.html and especially
> > http://playground.pollylabs.org.
> >
> >
> > Finally, let me highlight a few ongoing development efforts in Polly that
> > are potentially relevant to this discussion. Polly’s loop analysis is sound
> > and technically superior to what’s in LLVM currently (i.e. in
> > LoopAccessAnalysis and DependenceAnalysis). There are, however, two known
> > reasons why Polly’s transformations could not yet be enabled by default:
> >
> > A correctness issue: Currently, Polly assumes that 64 bits is large enough
> > for all new loop-induction variables and index expressions. In rare cases,
> > transformations could be performed where more bits are required.
> > Preconditions need to be generated preventing this (e.g., D35471).
> >
> > A performance issue: Polly currently models temporal locality (i.e., it
> > tries to get better reuse in time), but does not model spatial locality
> > (i.e., it does not model cache-line reuse). As a result, it can sometimes
> > introduce performance regressions. Polly Labs is currently working on
> > integrating spatial locality modeling into the loop optimization model.
> >
> > Polly can already split apart basic blocks in order to implement loop
> > fusion. Heuristics to choose at which granularity are still being
> > implemented (e.g., PR12402).
> >
> > I believe that we can now develop a concrete plan for moving
> > state-of-the-art loop optimizations, based on the technology in the Polly
> > project, into LLVM. Doing so will enable LLVM to be competitive with
> > proprietary compilers in high-performance computing, machine learning, and
> > other important application domains. I'd like community feedback on what
> > should be part of that plan.
> >
> >
> 
> This, at least on paper, sounds great. I think LLVM could greatly
> benefit from this informations for some applications.
> I have a couple of questions:
> 1) I'm aware there have been attempts in the past to make polyhedral
> value informations available to LLVM, but they've been unsuccessful.
> Do you plan to develop new LLVM passes to overcome this issue?

It would be great to have a polyhedral range analysis in LLVM. We have
most code for this in Polly. Especially with the new isl C++ bindings,
its code could also look very nice!

> 2) As far as I can tell (last I tried, but that was a while ago),
> polly had a significant compile time impact. Do you plan to introduce
> a new -Opolly pipeline?

We do not intend to enable Polly by default just yet. Hence a new option
-O?? or -f** seems appropriate. I suggest to leave the naming debate for
later.

We also worked on performance over the last 2 years:

- Introduced fast arbitrary integers to backup isl (gives ~7x speedup
vs. imath, 2x vs. gmp, and is only 30% slower than native integers)
- Worked with Eli Friedman on compiling all of AOSP. We do this in
"-polly-process-unprofitable" mode, where we run on anything we can
analyze and make sure timeouts and validity checks are all in place.
- Adjusted our core scheduler's assymptotic complexity to not be in
function of number of statements to schedule, but rather the maximal
number of statements that are fused:
https://lirias.kuleuven.be/bitstream/123456789/586108/1/CW706.pdf
- We worked on bailing out fast (but slightly regressed in recent
months). For code that is not amenable to polyhedral 
   optimizations we I would like to see almost zero compile time
   overhead. We have a pure LLVM-IR analysis before Polly, which 
   just scans the IR and decides if we should look at it more
   intensively.

> On the ISL story. I think it would be better to have polly being
> self-contained (with LLVM implementing the needed functionality), but
> I understand that's a major undertaken and it comes at a cost (i.e.
> LLVM will have to maintain this forked/reimplemented library forever
> instead of leveraging upstream work). LLVM tries to minimize the
> amount of required dependencies, but it seems it's OK to add new
> external optional deps (recently, Z3), so that could be an OK path
> forward. I don't have a strong opinion on what's the best solution here, FWIW.

We currently ship a snapshot of isl with Polly. This has shown to be
very useful, as we ship isl "trunk" which is updated quickly in case of
bug reports. Having a single version of isl shipped in sync with polly
is also desirable when receiving bug reports for Polly/isl.

As Hal pointed out, I expect us to complement at least parts of isl with
our own heuristics and analysis. While this is likely the way to go I
believe isl provides solid "defaults" for us to start off with and to
identify where LLVM specific solutions indeed add value.

Best,
Tobias