[llvm-dev] [RFC] Polly Status and Integration
Adam Nemet via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 11 10:26:23 PDT 2017
Hi Hal, Tobias, Michael and others,
> On Sep 1, 2017, at 11:47 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi everyone,
> As you may know, stock LLVM does not provide the kind of advanced loop transformations necessary to provide good performance on many applications. LLVM's Polly project provides many of the required capabilities, including loop transformations such as fission, fusion, skewing, blocking/tiling, and interchange, all powered by state-of-the-art dependence analysis. Polly also provides automated parallelization and targeting of GPUs and other accelerators.
> Over the past year, Polly’s development has focused on robustness, correctness, and closer integration with LLVM. To highlight a few accomplishments:
> - Polly now runs, by default, in the conceptually proper place in LLVM’s pass pipeline (just before the loop vectorizer). Importantly, this means that its loop transformations are performed after inlining and other canonicalization, greatly increasing its robustness and enabling its use on C++ code (where [] is often a function call before inlining).
> - Polly’s cost-modeling parameters, such as those describing the target’s memory hierarchy, are being integrated with TargetTransformInfo. This allows targets to properly override the modeling parameters and allows reuse of these parameters by other clients.
> - Polly’s method of handling signed division/remainder operations, which worked around lack of support in ScalarEvolution, is being replaced thanks to improvements being contributed to ScalarEvolution itself (see D34598). Polly’s core delinearization routines have long been a part of LLVM itself.
> - PolyhedralInfo, which exposes a subset of Polly’s loop analysis for use by other clients, is now available.
> - Polly is now part of the LLVM release process and is being included with LLVM by various packagers (e.g., Debian).
> I believe that the LLVM community would benefit from beginning the process of integrating Polly with LLVM itself and continuing its development as part of our main code base. This will:
> - Allow for wider adoption of LLVM within communities relying on advanced loop transformations.
> - Provide for better community feedback on, and testing of, the code developed (although the story in this regard is already fairly solid).
> - Better motivate targets to provide accurate, comprehensive modeling parameters for use by advanced loop transformations.
> - Perhaps most importantly, allow us to develop and tune the rest of the optimizer assuming that Polly’s capabilities are present (the underlying analysis, and eventually, the transformations themselves).
> The largest issue on which community consensus is required, in order to move forward at all, is what to do with isl. isl, the Integer Set Library, provides core functionality on which Polly depends. It is a C library, and while some Polly/LLVM developers are also isl developers, it has a large user community outside of LLVM/Polly. A C++ interface was recently added, and Polly is transitioning to use the C++ interface. Nevertheless, options here include rewriting the needed functionality, forking isl and transitioning our fork toward LLVM coding conventions (and data structures) over time, and incorporating isl more-or-less as-is to avoid partitioning its development.
> That having been said, isl is internally modular, and regardless of the overall integration strategy, the Polly developers anticipate specializing, or even replacing, some of these components with LLVM-specific solutions. This is especially true for anything that touches performance-related heuristics and modeling. LLVM-specific, or even target-specific, loop schedulers may be developed as well.
> Even though some developers in the LLVM community already have a background in polyhedral-modeling techniques, the Polly developers have developed, and are still developing, extensive tutorials on this topic: http://pollylabs.org/education.html and especially http://playground.pollylabs.org.
> Finally, let me highlight a few ongoing development efforts in Polly that are potentially relevant to this discussion. Polly’s loop analysis is sound and technically superior to what’s in LLVM currently (i.e. in LoopAccessAnalysis and DependenceAnalysis). There are, however, two known reasons why Polly’s transformations could not yet be enabled by default:
> - A correctness issue: Currently, Polly assumes that 64 bits is large enough for all new loop-induction variables and index expressions. In rare cases, transformations could be performed where more bits are required. Preconditions need to be generated preventing this (e.g., D35471).
> - A performance issue: Polly currently models temporal locality (i.e., it tries to get better reuse in time), but does not model spatial locality (i.e., it does not model cache-line reuse). As a result, it can sometimes introduce performance regressions. Polly Labs is currently working on integrating spatial locality modeling into the loop optimization model.
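A small sketch of the spatial-locality point, with hypothetical function names and an assumed row-major layout; this is not Polly output. Both functions touch every element exactly once, so neither has any reuse in time for a temporal-locality model to count. They differ only in stride: the first jumps `cols` elements per inner iteration, while the second (the interchanged nest) walks consecutive elements and so reuses each cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption of this sketch: `m` holds a rows x cols matrix in row-major order.

// Column-first traversal: the inner loop advances by `cols` elements, so
// consecutive accesses usually land on different cache lines.
long sum_column_order(const std::vector<int> &m, std::size_t rows,
                      std::size_t cols) {
  assert(m.size() == rows * cols);
  long sum = 0;
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      sum += m[i * cols + j];
  return sum;
}

// After loop interchange: the inner loop is unit-stride, so each cache line
// is fully consumed before the next one is fetched.
long sum_row_order(const std::vector<int> &m, std::size_t rows,
                   std::size_t cols) {
  assert(m.size() == rows * cols);
  long sum = 0;
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      sum += m[i * cols + j];
  return sum;
}
```

The results are identical; only the memory access order changes, which is the distinction a spatial-locality model would have to capture.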
> Polly can already split apart basic blocks in order to implement loop fusion. Heuristics to choose at which granularity are still being implemented (e.g., PR12402).
> I believe that we can now develop a concrete plan for moving state-of-the-art loop optimizations, based on the technology in the Polly project, into LLVM. Doing so will enable LLVM to be competitive with proprietary compilers in high-performance computing, machine learning, and other important application domains. I’d like community feedback on what should be part of that plan.
One thing that I’d like to see more details on is what this means for the evolution of loop transformations in LLVM.
So far, our more-or-less established direction has been to incrementally improve and generalize the required analyses (e.g., factoring the LoopVectorizer’s dependence analysis and loop-versioning analysis out into a stand-alone analysis pass, LoopAccessAnalysis) and then to build new transformations (e.g., LoopDistribution, LoopLoadElimination, LICMLoopVersioning) on top of them.
The idea was that infrastructure would be incrementally improved from two directions:
- As new transformations are built, the analyses have to be improved (e.g., past improvements to LAA to support the LoopVersioning utility, future improvements for full LoopSROA beyond just store->load forwarding, or the improvements to LAA for the LoopFusion proposal).
- As more complex loops have to be analyzed, we either improve LAA or make DependenceAnalysis a drop-in replacement for the memory analysis part of LAA.
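To make the store->load forwarding and LoopVersioning references concrete, here is a source-level sketch under stated assumptions: the function names and the `disjoint` helper are hypothetical, and the real passes operate on IR, not on C++ source. The load of `a[i]` can be replaced by the value stored in the previous iteration only if the store to `c[i]` cannot overwrite it, so the loop is versioned on a runtime no-overlap check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if [p, p + n) and [q, q + m) do not overlap.
static bool disjoint(const int *p, std::size_t n, const int *q, std::size_t m) {
  const auto pb = reinterpret_cast<std::uintptr_t>(p);
  const auto qb = reinterpret_cast<std::uintptr_t>(q);
  return pb + n * sizeof(int) <= qb || qb + m * sizeof(int) <= pb;
}

// Original loop: each iteration reloads a[i], which the previous iteration
// stored, unless the intervening store to c[] overwrote it.
// Contract assumed here: a has n + 1 elements, b and c have n elements.
void scan_original(int *a, const int *b, int *c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    a[i + 1] = a[i] + b[i];
    c[i] = 2 * b[i];
  }
}

// Versioned loop: if a and c are provably disjoint at run time, carry the
// stored value in a scalar; otherwise fall back to the original loop.
void scan_versioned(int *a, const int *b, int *c, std::size_t n) {
  assert(a != nullptr && b != nullptr && c != nullptr);
  if (disjoint(a, n + 1, c, n)) {
    int carried = a[0];
    for (std::size_t i = 0; i < n; ++i) {
      carried += b[i];   // forwarded value replaces the load of a[i]
      a[i + 1] = carried;
      c[i] = 2 * b[i];
    }
  } else {
    scan_original(a, b, c, n);
  }
}
```

The point of the sketch is the division of labor: the analysis supplies the pairs of accesses that need a runtime check, and the transformation only has to be correct on the checked path.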
While this model may be slow, it has all the benefits of incremental development.
Then there is the question of use cases. It’s fairly obvious that anybody wanting to optimize a 5-deep, highly regular loop nest operating on arrays should use Polly. On the other hand, it’s much less clear that we should use it for the singly or doubly nested, not-so-regular loops that are the norm in non-HPC workloads.
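One possible reading of that contrast, as a sketch with hypothetical names: the first function has constant bounds and affine subscripts, the shape polyhedral tools model most directly; the second has an indirect subscript and a data-dependent early exit, which is closer to typical non-HPC code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t N = 4;  // arbitrary size chosen for this sketch
using Grid = std::array<std::array<double, N>, N>;

// Regular: loop bounds are constants and every subscript is an affine
// function of i and j.
void smooth_rows(Grid &out, const Grid &in) {
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 1; j + 1 < N; ++j)
      out[i][j] = (in[i][j - 1] + in[i][j] + in[i][j + 1]) / 3.0;
}

// Not so regular: the subscript goes through an index array and the trip
// count depends on the data. Returns the position of the first match, or -1.
std::ptrdiff_t first_match(const std::vector<std::size_t> &index,
                           const std::vector<int> &data, int key) {
  for (std::size_t i = 0; i < index.size(); ++i) {
    assert(index[i] < data.size());
    if (data[index[i]] == key)
      return static_cast<std::ptrdiff_t>(i);
  }
  return -1;
}
```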
And this brings me to the maintenance question. Is it reasonable to expect people to fix Polly when a seemingly unrelated change of theirs happens to break a Polly bot? As far as I know, some companies have tried Polly in the past without a whole lot of prior experience. It would be great to hear what their experience was before adopting Polly at a much larger scale.
> Hal (on behalf of myself, Tobias Grosser, and Michael Kruse, with feedback from several other active Polly developers)
> We thank the numerous people who have contributed to the Polly infrastructure:
> Alexandre Isoard, Andreas Simbuerger, Andy Gibbs, Annanay Agarwal, Armin
> Groesslinger, Ajith Pandel, Baranidharan Mohan, Benjamin Kramer, Bill
> Wendling, Chandler Carruth, Craig Topper, Chris Jenneisch, Christian
> Bielert, Daniel Dunbar, Daniel Jasper, David Blaikie, David Peixotto,
> Dmitry N. Mikushin, Duncan P. N. Exon Smith, Eli Friedman, Eugene
> Zelenko, George Burgess IV, Hans Wennborg, Hongbin Zheng, Huihui Zhang,
> Jakub Kuderski, Johannes Doerfert, Justin Bogner, Karthik Senthil, Logan
> Chien, Lawrence Hu, Mandeep Singh Grang, Matt Arsenault, Matthew
> Simpson, Mehdi Amini, Micah Villmow, Michael Kruse, Matthias Reisinger,
> Maximilian Falkenstein, Nakamura Takumi, Nandini Singhal, Nicolas
> Bonfante, Patrik Hägglund, Paul Robinson, Philip Pfaffe, Philipp Schaad,
> Peter Conn, Pratik Bhatu, Rafael Espindola, Raghesh Aloor, Reid
> Kleckner, Roal Jordans, Richard Membarth, Roman Gareev, Saleem
> Abdulrasool, Sameer Sahasrabuddhe, Sanjoy Das, Sameer AbuAsal, Sam
> Novak, Sebastian Pop, Siddharth Bhat, Singapuram Sanjay Srivallabh,
> Sumanth Gundapaneni, Sunil Srivastava, Sylvestre Ledru, Star Tan, Tanya
> Lattner, Tim Shen, Tarun Ranjendran, Theodoros Theodoridis, Utpal Bora,
> Wei Mi, Weiming Zhao, and Yabin Hu.
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory