<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Mar 2, 2019 at 12:58 AM Fedor Sergeev <<a href="mailto:fedor.sergeev@azul.com">fedor.sergeev@azul.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="gmail-m_-4863800615731490931moz-cite-prefix">On 3/2/19 2:38 AM, Hiroshi Yamauchi
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div dir="ltr">
              <div dir="ltr">
                <div dir="ltr">
                  <div dir="ltr">
                    <div dir="ltr">
                      <div dir="ltr">
                        <div dir="ltr">
                          <div style="font-family:arial,helvetica,sans-serif">
                            <div style="font-family:Arial,Helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif">Here's a sketch of the
                                proposed approach for just one pass<span class="gmail_default"> (but imagine
                                  more)</span></span></div>
                            <div style="font-family:Arial,Helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif"><br>
                              </span></div>
                            <div style="font-family:Arial,Helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif"><a href="https://reviews.llvm.org/D58845" target="_blank">https://reviews.llvm.org/D58845</a></span></div>
                            <div style="font-family:Arial,Helvetica,sans-serif"><br>
                            </div>
                          </div>
                        </div>
                        <div class="gmail_quote">
                          <div dir="ltr" class="gmail_attr">On Fri, Mar
                            1, 2019 at 12:54 PM Fedor Sergeev via
                            llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div bgcolor="#FFFFFF"> On 2/28/19 12:47 AM,
                              Hiroshi Yamauchi via llvm-dev wrote:<br>
                              <blockquote type="cite">
                                <div dir="ltr">
                                  <div dir="ltr"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">Hi
                                      all,</span>
                                    <div><br>
                                    </div>
                                    <div>To implement more
                                      profile-guided optimizations, we’d
                                      like to use ProfileSummaryInfo
                                      (PSI) and BlockFrequencyInfo (BFI)
                                      from more passes of various types<span class="gmail_default" style="font-family:arial,helvetica,sans-serif">,
                                        under the new pass manager.</span></div>
                                    <div><br>
                                    </div>
                                    <div>
                                      <div style="font-family:arial,helvetica,sans-serif">The
                                        following is what we came up
                                        with. Would appreciate feedback.
                                        Thanks.</div>
                                      <div style="font-family:arial,helvetica,sans-serif"><br>
                                      </div>
                                      Issue<br>
                                      <br>
                                      It’s not obvious (to me) how to
                                      best do this, given that we cannot
                                      request an outer-scope analysis
                                      result from an inner-scope pass
                                      through analysis managers [1] and
                                      that we might unnecessarily
                                      running some analyses unless we
                                      conditionally build pass pipelines
                                      for PGO cases.<br>
                                    </div>
                                  </div>
                                </div>
                              </blockquote>
                              Indeed, this is an intentional restriction
                              in new pass manager, which is more or less
                              a reflection of a fundamental property of
                              outer-inner IRUnit relationship<br>
                              and transformations/analyses run on those
                              units. The main intent for having those
                              inner IRUnits (e.g. Loops) is to run local
                              transformations and save compile time<br>
                              on being local to a particular small piece
                              of IR. Loop Pass manager allows you to run
                              a whole pipeline of different
                              transformations still locally, amplifying
                              the save.<br>
                              As soon as you run function-level analysis
                              from within the loop pipeline you
                              essentially break this pipelining.<br>
                              Say, as you run your loop transformation
                              it modifies the loop (and the function)
                              and potentially invalidates the analysis,<br>
                              so you have to rerun your analysis again
                              and again. Hence instead of saving on
                              compile time it ends up increasing it.<br>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>
                            <div style="font-family:arial,helvetica,sans-serif">Exactly.</div>
                          </div>
                          <div style="font-family:arial,helvetica,sans-serif">
                            <div><br>
                            </div>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div bgcolor="#FFFFFF"> <br>
                              I have hit this issue somewhat recently
                              with dependency of loop passes on
                              BranchProbabilityInfo.<br>
                              (some loop passes, like IRCE can use it
                              for profitability analysis). </div>
                          </blockquote>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div bgcolor="#FFFFFF"> The only solution
                              that appears to be reasonable there is to
                              teach all the loops passes that need to be
                              pipelined<br>
                              to preserve BPI (or any other
                              module/function-level analyses) similar to
                              how they preserve DominatorTree and<br>
                              other "LoopStandard" analyses.<br>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>
                            <div style="font-family:arial,helvetica,sans-serif">Is
                              this implemented - do the loop passes
                              preserve BPI?</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    Nope, not implemented right now.<br>
    One of the problems is that even loop canonicalization passes run at
    the start of loop pass manager dont preserve it<br>
    (and at least LoopSimplifyCFG does change control flow).<br>
    <blockquote type="cite">
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div dir="ltr">
              <div dir="ltr">
                <div dir="ltr">
                  <div dir="ltr">
                    <div dir="ltr">
                      <div dir="ltr">
                        <div class="gmail_quote">
                          <div style="font-family:arial,helvetica,sans-serif"><br>
                          </div>
                          <div>
                            <div style="font-family:arial,helvetica,sans-serif"><font face="arial, helvetica, sans-serif">In
                                buildFunctionSimplificationPipeline
                                (where LoopFullUnrollPass is added as in
                                the sketch), </font>LateLoopOptimizationsEPCallbacks
                              and LoopOptimizerEndEPCallbacks seem to
                              allow some arbitrary loop passes to be
                              inserted into the pipelines (via flags)?</div>
                          </div>
                          <div><br>
                          </div>
                          <div>
                            <div><span style="font-family:arial,helvetica,sans-serif">I
                                wonder <span class="gmail_default" style="font-family:arial,helvetica,sans-serif">how
                                  hard </span>it'd be to teach all the <span class="gmail_default" style="font-family:arial,helvetica,sans-serif">relevant
                                </span>loop passes to preserve BFI<span class="gmail_default" style="font-family:arial,helvetica,sans-serif">
                                  (or BPI)</span><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"></span><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"></span><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"></span><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"></span><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">..</span></span></div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    Well, each time you restructure control flow around the loops you
    will have to update those extra analyses,<br>
    pretty much the same way as DT is being updated through
    DomTreeUpdater.<br>
    The trick is to design a proper update interface (and then implement
    it ;) ).<br>
    And I have not spent enough time on this issue to get a good idea of
    what that interface would be.<br></div></blockquote><div><br></div><div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Hm, sounds non-trivial :) noting BFI depends on BPI.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
    <br>
    regards,<br>
      Fedor.<br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div dir="ltr">
              <div dir="ltr">
                <div dir="ltr">
                  <div dir="ltr">
                    <div dir="ltr">
                      <div dir="ltr">
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div bgcolor="#FFFFFF"> <br>
                              <blockquote type="cite">
                                <div dir="ltr">
                                  <div dir="ltr">
                                    <div>It seems that for different
                                      types of passes to be able to get
                                      PSI and BFI, we’d need to ensure
                                      PSI is cached for a non-module
                                      pass, and PSI, BFI and the
                                      ModuleAnalysisManager proxy are
                                      cached for a loop pass in the pass
                                      pipelines. This may mean
                                      potentially needing to insert
                                      BFI/PSI in front of many passes
                                      [2]. It seems not obvious how to
                                      conditionally insert BFI for PGO
                                      pipelines because there isn’t
                                      always a good flag to detect PGO
                                      cases [3] or we tend to build pass
                                      pipelines before examining the
                                      code (or without propagating
                                      enough info down) [4].<br>
                                      <br>
                                      Proposed approach<br>
                                      <br>
                                      - Cache PSI right after the
                                      profile summary in the IR is
                                      written in the pass pipeline [5].
                                      This would avoid the need to
                                      insert RequiredAnalysisPass for
                                      PSI before each non-module pass
                                      that needs it. PSI can be
                                      technically invalidated but
                                      unlikely. If it does, we insert
                                      another RequiredAnalysisPass<span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> <span style="font-family:Arial,Helvetica,sans-serif">[6].</span></span><br>
                                      <br>
                                      - Conditionally insert
                                      RequireAnalysisPass for BFI, if
                                      PGO, right before each loop pass
                                      that needs it. This doesn't seem
                                      avoidable because BFI can be
                                      invalidated whenever the CFG
                                      changes. We detect PGO based on
                                      the command line flags and<span class="gmail_default" style="font-family:arial,helvetica,sans-serif">/or</span>
                                      whether the module has the profile
                                      summary info (we may need to pass
                                      the module to more functions.)<br>
                                      <br>
                                      - Add a new proxy
                                      ModuleAnalysisManagerLoopProxy for
                                      a loop pass to be able to get to
                                      the ModuleAnalysisManager in one
                                      step and PSI through it.<br>
                                      <br>
                                      Alternative approaches<br>
                                      <br>
                                      Dropping BFI and use PSI only<br>
                                      We could consider not using BFI
                                      and solely relying on PSI and
                                      function-level profiles only (as
                                      opposed to block-level), but
                                      profile precision would suffer.<br>
                                      <br>
                                      Computing BFI in-place<br>
                                      We could consider computing BFI
                                      “in-place” by directly running BFI
                                      outside of the pass manager [7].
                                      This would let us avoid using the
                                      analysis manager constraints but
                                      it would still involve running an
                                      outer-scope analysis from an
                                      inner-scope pass and potentially
                                      cause problems in terms of pass
                                      pipelining and concurrency.
                                      Moreover, a potential downside of
                                      running analyses in-place is that
                                      it won’t take advantage of cached
                                      analysis results provided by the
                                      pass manager.<br>
                                      <br>
                                      Adding inner-scope versions of PSI
                                      and BFI<br>
                                      We could consider adding a
                                      function-level and loop-level PSI
                                      and loop-level BFI, which
                                      internally act like their
                                      outer-scope versions but provide
                                      inner-scope results only. This
                                      way, we could always call
                                      getResult for PSI and BFI.
                                      However, this would still involve
                                      running an outer-scope analysis
                                      from an inner-scope pass.<br>
                                      <br>
                                      Caching the FAM and the MAM
                                      proxies<br>
                                      We could consider caching the
                                      FunctionalAnalysisManager and the
                                      ModuleAnalysisManager proxies once
                                      early on instead of adding a new
                                      proxy. But it seems to not likely
                                      work well because the analysis
                                      cache key type includes the
                                      function or the module and some
                                      pass may add a new function for
                                      which the proxy wouldn’t be
                                      cached. We’d need to write and
                                      insert a pass in select locations
                                      to just fill the cache. Adding the
                                      new proxy would take care of these
                                      with a three-line change.<br>
                                      <br>
                                      Conditional BFI<br>
                                      We could consider adding a
                                      conditional BFI analysis that is a
                                      wrapper around BFI and computes
                                      BFI only if profiles are available
                                      (either checking the module has
                                      profile summary or depend on the
                                      PSI.) With this, we wouldn’t need
                                      to conditionally build pass
                                      pipelines and may work for the new
                                      pass manager. But a similar
                                      wouldn’t work for the old pass
                                      manager because we cannot
                                      conditionally depend on an
                                      analysis under it.<br>
                                    </div>
                                  </div>
                                </div>
                              </blockquote>
                              There is LazyBlockFrequencyInfo.<br>
                              Not sure how well it fits this idea.<br>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>
                            <div><font face="arial, helvetica, sans-serif">Good
                                point. LazyBlockFrequencyInfo seems
                                usable with the old pass manager (save
                                unnecessary BFI/BPI) and would work for
                                function passes. I think t</font><span style="font-family:arial,helvetica,sans-serif">he </span><span style="font-family:arial,helvetica,sans-serif">restriction still applies
                                - </span><span style="font-family:arial,helvetica,sans-serif">a
                                loop pass cannot still request
                                (outer-scope) BFI, lazy or not, new or
                                old (pass manager). Another assumption
                                is that </span><span style="font-family:arial,helvetica,sans-serif">it'd
                                be cheap and safe to unconditionally
                                depend on PSI or check the module's
                                profile summary.</span></div>
                          </div>
                          <div><br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div bgcolor="#FFFFFF"> <br>
                              regards,<br>
                                Fedor.<br>
                              <br>
                              <blockquote type="cite">
                                <div dir="ltr">
                                  <div dir="ltr">
                                    <div><br>
                                      <br>
                                      [1] We cannot call
                                      AnalysisManager::getResult for an
                                      outer scope but only
                                      getCachedResult. Probably because
                                      of potential pipelining or
                                      concurrency issues.<br>
                                      [2] For example, potentially
                                      breaking up multiple pipelined
                                      loop passes and insert
                                      RequireAnalysisPass<BlockFrequencyAnalysis>
                                      in front of each of them.<br>
                                      [3] For example,
                                      -fprofile-instr-use and
                                      -fprofile-sample-use aren’t
                                      present in ThinLTO post link
                                      builds.<br>
                                      [4] For example, we could check
                                      whether the module has the profile
                                      summary metadata annotated when
                                      building pass pipelines but we
                                      don’t always pass the module down
                                      to the place where we build pass
                                      pipelines.<br>
                                      [5] By inserting
                                      RequireAnalysisPass<ProfileSummaryInfo>
                                      after the PGOInstrumentationUse
                                      and the SampleProfileLoaderPass
                                      passes (and around the
                                      PGOIndirectCallPromotion pass for
                                      the Thin LTO post link pipeline.)<br>
                                      [6] For example, the
                                      context-sensitive PGO<span class="gmail_default" style="font-family:arial,helvetica,sans-serif">.</span><br>
                                      [7] Directly calling its
                                      constructor along with the
                                      dependent analyses results, eg.
                                      the jump threading pass.</div>
                                  </div>
                                </div>
                                <br>
                                <fieldset class="gmail-m_-4863800615731490931m_3084483561612773275gmail-m_5869029522365402437mimeAttachmentHeader"></fieldset>
                                <pre class="gmail-m_-4863800615731490931m_3084483561612773275gmail-m_5869029522365402437moz-quote-pre">_______________________________________________
LLVM Developers mailing list
<a class="gmail-m_-4863800615731490931m_3084483561612773275gmail-m_5869029522365402437moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a class="gmail-m_-4863800615731490931m_3084483561612773275gmail-m_5869029522365402437moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
                              </blockquote>
                              <br>
                            </div>
_______________________________________________<br>
                            LLVM Developers mailing list<br>
                            <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
                            <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
                          </blockquote>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </div>

</blockquote></div></div>