[llvm-dev] [RFC] Polly Status and Integration

Adve, Vikram Sadanand via llvm-dev llvm-dev at lists.llvm.org
Mon Sep 4 13:14:13 PDT 2017


Responses inline, marked with (***VSA***) because Outlook on a Mac sucks for inline replies!


On 9/4/17, 2:14 PM, "Tobias Grosser" <tobias.grosser at inf.ethz.ch> wrote:

    On Mon, Sep 4, 2017, at 20:49, Hal Finkel via llvm-dev wrote:
    > [tying to original thread]
    > 
    > On 09/04/2017 01:37 PM, Adve, Vikram Sadanand via llvm-dev wrote:
    > > Hal, Tobias, et al. –
    > >
    > > I am strongly in favor of seeing a broader range of loop transformations, supported by strong dependence analysis, added to LLVM, and the Polly infrastructure seems to be by far our best bet to make that happen.  I have a couple of questions:
    > >
    > > 1) Integer constraint libraries like ISL (and Omega, which I used extensively in a previous project) are fundamentally solving exponential problems or worse, and can in rare cases cause compile times to explode.  How does Polly deal with this issue?
    
    Our integer set library has a compute out functionality. It counts
    expensive operations such as memory allocations but also so-called
    pivots which occur during ilp solving. In case a certain limit is
    reached the library skips all subsequent operations and returns a "null"
    result to the user. We guard potentially expensive parts of Polly
    (scheduling, dependence analysis) with these compute outs, and bail out
    gracefully. The important point here is that these are compute-outs not
    time-outs. Hence, the resulting binaries will be identical across system
    with differing performance (and obviously also across runs).


(***VSA***) Thanks, Tobias.  This is exactly what I was looking for.  Using compute-outs is definitely much better than timeouts (but are probably more intrusive and cannot have been easy to implement, so kudos on that).


    > > 2) Are the loop transformations composable through a reasonable API, i.e., can a tool create an arbitrary pipeline of the transformations?  I know about the Polly scheduling work, which explores flexible combinations of these, but I don’t know whether there are interfaces other tools can use to create arbitrary, fixed pipelines of such transforms.
    
    Loop transformations are transformations on the polyhedral schedule.
    Tools can set an arbitrary multi-dimensional scheduling function, which
    allow individual loop iterations to be remapped into a new loop
    structure. (With your background, this might already be clear). 

(***VSA***) Yes, that makes sense.  When a client creates such a scheduling function, does Polly automatically enforce correctness internally, or is the tool responsible for checking that separately?


For
    everybody who does not yet speak "polyhedral", we generally work on a
    tree representation of the execution order, which  -- even though not an
    AST -- can be modified in a similar way, such that loop transformations
    can be applied on a per-node level. 

(***VSA***) What granularity can clients transform in this representation?  For example, are schedules applicable to individual IR instructions?  Or only to entire basic blocks?


We use the latter to implement
    certain specific transformations, e.g. the GOTO/BLIS matrix
    multiplication transformation, which is a very specific sequence of
    transformations that is known to yield "optimal" performance.
    
    Best,
    Tobias



-- Vikram Adve
 
// Interim Head and Professor, Department of Computer Science
// University of Illinois at Urbana-Champaign
// Admin Assistant: Amanda Foley - ajfoley2 at illinois.edu
// Google Hangouts: vikram.s.adve at gmail.com || Skype: vikramsadve
// Research page: http://vikram.cs.illinois.edu <http://vikram.cs.illinois.edu/>
 
    

    > >
    > > Thanks,
    > >
    > > -- Vikram Adve
    > >   
    > > // Interim Head and Professor, Department of Computer Science
    > > // University of Illinois at Urbana-Champaign
    > > // Admin Assistant: Amanda Foley - ajfoley2 at illinois.edu
    > > // Google Hangouts: vikram.s.adve at gmail.com || Skype: vikramsadve
    > > // Research page: http://vikram.cs.illinois.edu <http://vikram.cs.illinois.edu/>
    > >   
    > >
    > > On 9/1/17, 1:46 PM, "llvm-dev on behalf of via llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of llvm-dev at lists.llvm.org> wrote:
    > >
    > >      Date: Fri, 1 Sep 2017 13:47:57 -0500
    > >      From: Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org>
    > >      To: LLVM Dev <llvm-dev at lists.llvm.org>
    > >      Cc: Tobias Grosser <tobias.grosser at inf.ethz.ch>,
    > >      	michaelkruse at meinersbur.de
    > >      Subject: [llvm-dev] [RFC] Polly Status and Integration
    > >      Message-ID: <e976ae0e-4ccc-0552-2569-b0d472b1e2df at anl.gov>
    > >      Content-Type: text/plain; charset="utf-8"; Format="flowed"
    > >      
    > >      **
    > >      
    > >      *Hi everyone,As you may know, stock LLVM does not provide the kind of
    > >      advanced loop transformations necessary to provide good performance on
    > >      many applications. LLVM's Polly project provides many of the required
    > >      capabilities, including loop transformations such as fission, fusion,
    > >      skewing, blocking/tiling, and interchange, all powered by
    > >      state-of-the-art dependence analysis. Polly also provides automated
    > >      parallelization and targeting of GPUs and other**accelerators.*
    > >      
    > >      *
    > >      
    > >      Over the past year, Polly’s development has focused on robustness,
    > >      correctness, and closer integration with LLVM. To highlight a few
    > >      accomplishments:
    > >      
    > >      
    > >        *
    > >      
    > >          Polly now runs, by default, in the conceptually-proper place in
    > >          LLVM’s pass pipeline (just before the loop vectorizer). Importantly,
    > >          this means that its loop transformations are performed after
    > >          inlining and other canonicalization, greatly increasing its
    > >          robustness, and enabling its use on C++ code (where [] is often a
    > >          function call before inlining).
    > >      
    > >        *
    > >      
    > >          Polly’s cost-modeling parameters, such as those describing the
    > >          target’s memory hierarchy, are being integrated with
    > >          TargetTransformInfo. This allows targets to properly override the
    > >          modeling parameters and allows reuse of these parameters by other
    > >          clients.
    > >      
    > >        *
    > >      
    > >          Polly’s method of handling signed division/remainder operations,
    > >          which worked around lack of support in ScalarEvolution, is being
    > >          replaced thanks to improvements being contributed to ScalarEvolution
    > >          itself (see D34598). Polly’s core delinearization routines have long
    > >          been a part of LLVM itself.
    > >      
    > >        *
    > >      
    > >          PolyhedralInfo, which exposes a subset of Polly’s loop analysis for
    > >          use by other clients, is now available.
    > >      
    > >        *
    > >      
    > >          Polly is now part of the LLVM release process and is being included
    > >          with LLVM by various packagers (e.g., Debian).
    > >      
    > >      
    > >      I believe that the LLVM community would benefit from beginning the
    > >      process of integrating Polly with LLVM itself and continuing its
    > >      development as part of our main code base. This will:
    > >      
    > >        *
    > >      
    > >          Allow for wider adoption of LLVM within communities relying on
    > >          advanced loop transformations.
    > >      
    > >        *
    > >      
    > >          Provide for better community feedback on, and testing of, the code
    > >          developed (although the story in this regard is already fairly solid).
    > >      
    > >        *
    > >      
    > >          Better motivate targets to provide accurate, comprehensive, modeling
    > >          parameters for use by advanced loop transformations.
    > >      
    > >        *
    > >      
    > >          Perhaps most importantly, this will allow us to develop and tune the
    > >          rest of the optimizer assuming that Polly’s capabilities are present
    > >          (the underlying analysis, and eventually, the transformations
    > >          themselves).
    > >      
    > >      
    > >      The largest issue on which community consensus is required, in order to
    > >      move forward at all, is what to do with isl. isl, the Integer Set
    > >      Library, provides core functionality on which Polly depends. It is a C
    > >      library, and while some Polly/LLVM developers are also isl developers,
    > >      it has a large user community outside of LLVM/Polly. A C++ interface was
    > >      recently added, and Polly is transitioning to use the C++ interface.
    > >      Nevertheless, options here include rewriting the needed functionality,
    > >      forking isl and transitioning our fork toward LLVM coding conventions
    > >      (and data structures) over time, and incorporating isl more-or-less
    > >      as-is to avoid partitioning its development.
    > >      
    > >      
    > >      That having been said, isl is internally modular, and regardless of the
    > >      overall integration strategy, the Polly developers anticipate
    > >      specializing, or even replacing, some of these components with
    > >      LLVM-specific solutions. This is especially true for anything that
    > >      touches performance-related heuristics and modeling. LLVM-specific, or
    > >      even target-specific, loop schedulers may be developed as well.
    > >      
    > >      
    > >      Even though some developers in the LLVM community already have a
    > >      background in polyhedral-modeling techniques, the Polly developers have
    > >      developed, and are still developing, extensive tutorials on this topic
    > >      http://pollylabs.org/education.htmland especially
    > >      http://playground.pollylabs.org.
    > >      
    > >      
    > >      Finally, let me highlight a few ongoing development efforts in Polly
    > >      that are potentially relevant to this discussion. Polly’s loop analysis
    > >      is sound and technically superior to what’s in LLVM currently (i.e. in
    > >      LoopAccessAnalysis and DependenceAnalysis). There are, however, two
    > >      known reasons why Polly’s transformations could not yet be enabled by
    > >      default:
    > >      
    > >        *
    > >      
    > >          A correctness issue: Currently, Polly assumes that 64 bits is large
    > >          enough for all new loop-induction variables and index expressions.
    > >          In rare cases, transformations could be performed where more bits
    > >          are required. Preconditions need to be generated preventing this
    > >          (e.g., D35471).
    > >      
    > >        *
    > >      
    > >          A performance issue: Polly currently models temporal locality (i.e.,
    > >          it tries to get better reuse in time), but does not model spatial
    > >          locality (i.e., it does not model cache-line reuse). As a result, it
    > >          can sometimes introduce performance regressions. Polly Labs is
    > >          currently working on integrating spatial locality modeling into the
    > >          loop optimization model.
    > >      
    > >      Polly can already split apart basic blocks in order to implement loop
    > >      fusion. Heuristics to choose at which granularity are still being
    > >      implemented (e.g., PR12402).
    > >      
    > >      I believe that we can now develop a concrete plan for moving
    > >      state-of-the-art loop optimizations, based on the technology in the
    > >      Polly project, into LLVM. Doing so will enable LLVM to be competitive
    > >      with proprietary compilers in high-performance computing, machine
    > >      learning, and other important application domains. I'd like community
    > >      feedback on what**should be part of that plan.
    > >      
    > >      
    > >      Sincerely,
    > >      
    > >      Hal (on behalf of myself, Tobias Grosser, and Michael Kruse, with
    > >      feedback from**several other active Polly developers)
    > >      
    > >      
    > >      We thank the numerous people who have contributed to the Polly
    > >      infrastructure:Alexandre Isoard, Andreas Simbuerger, Andy Gibbs, Annanay
    > >      Agarwal, ArminGroesslinger, Ajith Pandel, Baranidharan Mohan, Benjamin
    > >      Kramer, BillWendling, Chandler Carruth, Craig Topper, Chris Jenneisch,
    > >      ChristianBielert, Daniel Dunbar, Daniel Jasper, David Blaikie, David
    > >      Peixotto,Dmitry N. Mikushin, Duncan P. N. Exon Smith, Eli Friedman,
    > >      EugeneZelenko, George Burgess IV, Hans Wennborg, Hongbin Zheng, Huihui
    > >      Zhang,Jakub Kuderski, Johannes Doerfert, Justin Bogner, Karthik Senthil,
    > >      LoganChien, Lawrence Hu, Mandeep Singh Grang, Matt Arsenault,
    > >      MatthewSimpson, Mehdi Amini, Micah Villmow, Michael Kruse, Matthias
    > >      Reisinger,Maximilian Falkenstein, Nakamura Takumi, Nandini Singhal,
    > >      NicolasBonfante, Patrik Hägglund, Paul Robinson, Philip Pfaffe, Philipp
    > >      Schaad,Peter Conn, Pratik Bhatu, Rafael Espindola, Raghesh Aloor,
    > >      ReidKleckner, Roal Jordans, Richard Membarth, Roman Gareev,
    > >      SaleemAbdulrasool, Sameer Sahasrabuddhe, Sanjoy Das, Sameer AbuAsal,
    > >      SamNovak, Sebastian Pop, Siddharth Bhat, Singapuram Sanjay
    > >      Srivallabh,Sumanth Gundapaneni, Sunil Srivastava, Sylvestre Ledru, Star
    > >      Tan, TanyaLattner, Tim Shen, Tarun Ranjendran, Theodoros Theodoridis,
    > >      Utpal Bora,Wei Mi, Weiming Zhao, and Yabin Hu.*
    > >      
    > >      --
    > >      Hal Finkel
    > >      Lead, Compiler Technology and Programming Languages
    > >      Leadership Computing Facility
    > >      Argonne National Laboratory
    > >      
    > >      
    > >
    > > _______________________________________________
    > > LLVM Developers mailing list
    > > llvm-dev at lists.llvm.org
    > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
    > 
    > -- 
    > Hal Finkel
    > Lead, Compiler Technology and Programming Languages
    > Leadership Computing Facility
    > Argonne National Laboratory
    > 
    > _______________________________________________
    > LLVM Developers mailing list
    > llvm-dev at lists.llvm.org
    > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
    



More information about the llvm-dev mailing list