[llvm-dev] llvm-dev Digest, Vol 159, Issue 2

Mon Sep 4 11:37:05 PDT 2017

Hal, Tobias, et al. –

I am strongly in favor of seeing a broader range of loop transformations, supported by strong dependence analysis, added to LLVM, and the Polly infrastructure seems to be by far our best bet to make that happen.  I have a couple of questions:

1) Integer constraint libraries like ISL (and Omega, which I used extensively in a previous project) are fundamentally solving exponential problems or worse, and can in rare cases cause compile times to explode.  How does Polly deal with this issue? 
2) Are the loop transformations composable through a reasonable API, i.e., can a tool create an arbitrary pipeline of the transformations?  I know about the Polly scheduling work, which explores flexible combinations of these, but I don’t know whether there are interfaces other tools can use to create arbitrary, fixed pipelines of such transforms.

Thanks,

-- Vikram Adve

// Interim Head and Professor, Department of Computer Science
// University of Illinois at Urbana-Champaign
// Admin Assistant: Amanda Foley - ajfoley2 at illinois.edu
// Google Hangouts: vikram.s.adve at gmail.com || Skype: vikramsadve
// Research page: http://vikram.cs.illinois.edu <http://vikram.cs.illinois.edu/>

On 9/1/17, 1:46 PM, "llvm-dev on behalf of via llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of llvm-dev at lists.llvm.org> wrote:

    Date: Fri, 1 Sep 2017 13:47:57 -0500
    From: Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org>
    To: LLVM Dev <llvm-dev at lists.llvm.org>
    Cc: Tobias Grosser <tobias.grosser at inf.ethz.ch>,
    	michaelkruse at meinersbur.de
    Subject: [llvm-dev] [RFC] Polly Status and Integration
    Message-ID: <e976ae0e-4ccc-0552-2569-b0d472b1e2df at anl.gov>
    Content-Type: text/plain; charset="utf-8"; Format="flowed"

    **

    *Hi everyone,As you may know, stock LLVM does not provide the kind of 
    advanced loop transformations necessary to provide good performance on 
    many applications. LLVM's Polly project provides many of the required 
    capabilities, including loop transformations such as fission, fusion, 
    skewing, blocking/tiling, and interchange, all powered by 
    state-of-the-art dependence analysis. Polly also provides automated 
    parallelization and targeting of GPUs and other**accelerators.*

    *

    Over the past year, Polly’s development has focused on robustness, 
    correctness, and closer integration with LLVM. To highlight a few 
    accomplishments:

      *

        Polly now runs, by default, in the conceptually-proper place in
        LLVM’s pass pipeline (just before the loop vectorizer). Importantly,
        this means that its loop transformations are performed after
        inlining and other canonicalization, greatly increasing its
        robustness, and enabling its use on C++ code (where [] is often a
        function call before inlining).

      *

        Polly’s cost-modeling parameters, such as those describing the
        target’s memory hierarchy, are being integrated with
        TargetTransformInfo. This allows targets to properly override the
        modeling parameters and allows reuse of these parameters by other
        clients.

      *

        Polly’s method of handling signed division/remainder operations,
        which worked around lack of support in ScalarEvolution, is being
        replaced thanks to improvements being contributed to ScalarEvolution
        itself (see D34598). Polly’s core delinearization routines have long
        been a part of LLVM itself.

      *

        PolyhedralInfo, which exposes a subset of Polly’s loop analysis for
        use by other clients, is now available.

      *

        Polly is now part of the LLVM release process and is being included
        with LLVM by various packagers (e.g., Debian).

    I believe that the LLVM community would benefit from beginning the 
    process of integrating Polly with LLVM itself and continuing its 
    development as part of our main code base. This will:

      *

        Allow for wider adoption of LLVM within communities relying on
        advanced loop transformations.

      *

        Provide for better community feedback on, and testing of, the code
        developed (although the story in this regard is already fairly solid).

      *

        Better motivate targets to provide accurate, comprehensive, modeling
        parameters for use by advanced loop transformations.

      *

        Perhaps most importantly, this will allow us to develop and tune the
        rest of the optimizer assuming that Polly’s capabilities are present
        (the underlying analysis, and eventually, the transformations
        themselves).

    The largest issue on which community consensus is required, in order to 
    move forward at all, is what to do with isl. isl, the Integer Set 
    Library, provides core functionality on which Polly depends. It is a C 
    library, and while some Polly/LLVM developers are also isl developers, 
    it has a large user community outside of LLVM/Polly. A C++ interface was 
    recently added, and Polly is transitioning to use the C++ interface. 
    Nevertheless, options here include rewriting the needed functionality, 
    forking isl and transitioning our fork toward LLVM coding conventions 
    (and data structures) over time, and incorporating isl more-or-less 
    as-is to avoid partitioning its development.

    That having been said, isl is internally modular, and regardless of the 
    overall integration strategy, the Polly developers anticipate 
    specializing, or even replacing, some of these components with 
    LLVM-specific solutions. This is especially true for anything that 
    touches performance-related heuristics and modeling. LLVM-specific, or 
    even target-specific, loop schedulers may be developed as well.

    Even though some developers in the LLVM community already have a 
    background in polyhedral-modeling techniques, the Polly developers have 
    developed, and are still developing, extensive tutorials on this topic 
    http://pollylabs.org/education.htmland especially 
    http://playground.pollylabs.org.

    Finally, let me highlight a few ongoing development efforts in Polly 
    that are potentially relevant to this discussion. Polly’s loop analysis 
    is sound and technically superior to what’s in LLVM currently (i.e. in 
    LoopAccessAnalysis and DependenceAnalysis). There are, however, two 
    known reasons why Polly’s transformations could not yet be enabled by 
    default:

      *

        A correctness issue: Currently, Polly assumes that 64 bits is large
        enough for all new loop-induction variables and index expressions.
        In rare cases, transformations could be performed where more bits
        are required. Preconditions need to be generated preventing this
        (e.g., D35471).

      *

        A performance issue: Polly currently models temporal locality (i.e.,
        it tries to get better reuse in time), but does not model spatial
        locality (i.e., it does not model cache-line reuse). As a result, it
        can sometimes introduce performance regressions. Polly Labs is
        currently working on integrating spatial locality modeling into the
        loop optimization model.

    Polly can already split apart basic blocks in order to implement loop 
    fusion. Heuristics to choose at which granularity are still being 
    implemented (e.g., PR12402).

    I believe that we can now develop a concrete plan for moving 
    state-of-the-art loop optimizations, based on the technology in the 
    Polly project, into LLVM. Doing so will enable LLVM to be competitive 
    with proprietary compilers in high-performance computing, machine 
    learning, and other important application domains. I'd like community 
    feedback on what**should be part of that plan.

    Sincerely,

    Hal (on behalf of myself, Tobias Grosser, and Michael Kruse, with 
    feedback from**several other active Polly developers)

    We thank the numerous people who have contributed to the Polly 
    infrastructure:Alexandre Isoard, Andreas Simbuerger, Andy Gibbs, Annanay 
    Agarwal, ArminGroesslinger, Ajith Pandel, Baranidharan Mohan, Benjamin 
    Kramer, BillWendling, Chandler Carruth, Craig Topper, Chris Jenneisch, 
    ChristianBielert, Daniel Dunbar, Daniel Jasper, David Blaikie, David 
    Peixotto,Dmitry N. Mikushin, Duncan P. N. Exon Smith, Eli Friedman, 
    EugeneZelenko, George Burgess IV, Hans Wennborg, Hongbin Zheng, Huihui 
    Zhang,Jakub Kuderski, Johannes Doerfert, Justin Bogner, Karthik Senthil, 
    LoganChien, Lawrence Hu, Mandeep Singh Grang, Matt Arsenault, 
    MatthewSimpson, Mehdi Amini, Micah Villmow, Michael Kruse, Matthias 
    Reisinger,Maximilian Falkenstein, Nakamura Takumi, Nandini Singhal, 
    NicolasBonfante, Patrik Hägglund, Paul Robinson, Philip Pfaffe, Philipp 
    Schaad,Peter Conn, Pratik Bhatu, Rafael Espindola, Raghesh Aloor, 
    ReidKleckner, Roal Jordans, Richard Membarth, Roman Gareev, 
    SaleemAbdulrasool, Sameer Sahasrabuddhe, Sanjoy Das, Sameer AbuAsal, 
    SamNovak, Sebastian Pop, Siddharth Bhat, Singapuram Sanjay 
    Srivallabh,Sumanth Gundapaneni, Sunil Srivastava, Sylvestre Ledru, Star 
    Tan, TanyaLattner, Tim Shen, Tarun Ranjendran, Theodoros Theodoridis, 
    Utpal Bora,Wei Mi, Weiming Zhao, and Yabin Hu.*

    -- 
    Hal Finkel
    Lead, Compiler Technology and Programming Languages
    Leadership Computing Facility
    Argonne National Laboratory