<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On May 21, 2014, at 3:05 PM, Philip Reames <<a href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">

  
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  
  <div bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 05/15/2014 11:12 AM, Akira Hatanaka

      wrote:<br>

    </div>

    <blockquote cite="mid:CAB297QwaU1SB4ENicf6gTzV_Hkw5sJHPaG_wFHgTjL76rZoh8Q@mail.gmail.com" type="cite">

      <div dir="ltr">On Thu, May 15, 2014 at 9:31 AM, Philip Reames <span dir="ltr"><<a moz-do-not-send="true" href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>

        wrote:<br>

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF">

                <div class="">

                  <div>On 05/14/2014 06:02 PM, Akira Hatanaka wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>I would like to get feedback from the

                        community on how I can speed up the compilation

                        of a function that has one huge basic block

                        consisting of over 150K instructions. It takes

                        about 10 minutes for clang to produce the object

                        file and the function that is taking up the

                        majority of the time is

                        LoadAndStorePromoter::run, which is called when

                        SROA is run:</div>

                      <div><br>

                      </div>

                      <div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                            // Otherwise, we have mixed loads and stores

                          (or just a bunch of stores).</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                            // Since SSAUpdater is purely for

                          cross-block values, we need to determine</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                            // the order of these instructions in the

                          block.  If the first use in the</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                            // block is a load, then it uses the live in

                          value.  The last store defines</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                            // the live out value.  We handle this by

                          doing a linear scan of the block.</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                            Value *StoredValue = nullptr;</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                            for (BasicBlock::iterator II =

                          BB->begin(), E = BB->end(); II != E;

                          ++II) {</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"><br>

                        </div><div style="margin: 0px;"> If I understand this code

                          correctly. LoadAndStorePromoter::run is called

                          once per every promotable alloca and iterates

                          over the whole list to determine the order of

                          loads and stores in the basic block that

                          access the alloca.</div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                I'm a bit confused about the cost here.  Based on the

                code, I see us walking each basic block at most once per

                alloca.  Even at 150K instructions, I wouldn't expect

                the loop to take *that* long.  How many allocas are you

                dealing with?<br>

                <br>

                I do see an outer loop in SROA::performPromotion.  Are

                you possibly triggering a large number of iterations

                there?  That would make the inner loop appear much more

                expensive.  <br>

                <div class=""> <br>

                </div>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>The unoptimized bitcode (compiled with clang -O0) has a

              large number of allocas (over 7000), which I think is also

              contributing to the long compilation time.</div>

          </div>

        </div>

      </div>

    </blockquote>

    I suspect that many things in the optimizer aren't going to react

    well to that many allocas.  :)  Where are they all coming from?  And

    why hasn't mem2reg removed most of them?  I suspect you'd be better

    off investing effort to address this issue then optimizing SROA.  <br>

    <blockquote cite="mid:CAB297QwaU1SB4ENicf6gTzV_Hkw5sJHPaG_wFHgTjL76rZoh8Q@mail.gmail.com" type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>

              <br>

            </div>

            <div>But making LoadAndStorePromoter::run faster is

              sufficient to reduce the compilation time (at least in my

              case).</div>

          </div>

        </div>

      </div>

    </blockquote>

    I'm not objecting to this.  I'd be happy to see something along

    these lines land, I'm just not sure it solves the root cause of your

    issues.  <br>

    <blockquote cite="mid:CAB297QwaU1SB4ENicf6gTzV_Hkw5sJHPaG_wFHgTjL76rZoh8Q@mail.gmail.com" type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> ...<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <blockquote cite="mid:CAB297QwaU1SB4ENicf6gTzV_Hkw5sJHPaG_wFHgTjL76rZoh8Q@mail.gmail.com" type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>

            </div>

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF">

                <div class="">

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div><div style="margin: 0px;">4. Change the definition

                          of class Instruction in IR/Instruction.h so

                          that determining whether one instruction

                          precedes another becomes more efficient. My

                          current implementation brings down the

                          compilation time to just 6s.</div><div style="margin: 0px;"><br>

                        </div><div style="margin: 0px;">I won't go into much

                          detail about the change I made, but it

                          basically replaces Instruction's pointer to

                          its parent BasicBlock with another data

                          structure. I haven't done much investigation

                          into how this will affect the common case.</div><div style="margin: 0px;"><br>

                        </div>

                        <div>This will also help speed up the execution

                          of code in other places. For example, these

                          two functions are called when GVN::processLoad

                          is called and iterate over a basic block's

                          instruction list to determine the order of two

                          (load and stores) instructions.:</div><div style="margin: 0px;"><br>

                        </div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"><span style="font-family:arial;font-size:small">llvm::isPotentiallyReachable</span></div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">

                          <br>

                        </div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                            // Linear scan, start at 'A', see whether we

                          hit 'B' or the end first.</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                            for (BasicBlock::const_iterator I = A, E =

                          BB->end(); I != E; ++I) {</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                              if (&*I == B)</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;"> 

                                return true;</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">

                              }</div>

                        <div><br>

                        </div>

                        <div>DominatorTree::dominates</div>

                        <div><br>

                        </div>

                        <div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">  //


                            Loop through the basic block until we find

                            Def or User.</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">

                              BasicBlock::const_iterator I =

                            DefBB->begin();</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">  for


                            (; &*I != Def && &*I !=

                            User; ++I)</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;">

                                /*empty*/;</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                I'd be curious to see a proposal here.  I've been bitten

                by the DT::dominates issue myself.  <br>

                <div class="">

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>

                        <div><br>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

              </div>

            </blockquote>

            <div>I was hesitant to make changes in classes such as

              BasicBlock and Instruction just to make it faster to

              compile a basic block with 150K+ instructions (which I

              think is a pathological case). If there are other use

              cases that need discovering orders of instructions in a

              basic block, it might be worth considering fixing those

              classes.</div>

            <div> </div>

            <div>The attached patch is the current implementation I

              have. Basically I am trying to make function

              Instruction::precede fast and use it to speed up functions

              that are currently iterating over the whole instruction

              list to determine the order of instructions in a basic

              block. <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    I get where you're going with this, but I don't think this is the

    right approach.  In particular, you're introducing extra

    indirections along some fairly common paths and the caching scheme

    around sublist is non-obvious.  <br></div></blockquote><div><br></div>Akira, I think this is worth solving, but probably not for this specific issue. I don't think it's uncommon for optimizations to want to understand the relatively ordering of two instructions.</div><div><br></div><div>I don't claim to know the right solution. Philip, is your concern of Akira's approach about performance in the common case? Or is is about the complexity of the scheme? Akira, if would be good to have some analysis of the efficiency of various operations related to instructions. It's good to make the 'proceeds' query fast, but not at the expensive of insertion, deletion, etc.</div><div><br></div><div>Evan</div><div><br></div><div><br><blockquote type="cite"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    I do like the extraction of the 'preceeds' interface.  It gives us a

    single place to optimize.  <br>

    <br>

    Looking at the problem, I don't see an easy answer here.  If we

    weren't using linked lists, this would be easy, but with that

    constraint, everything I come up with seems nasty.  <br>

    <br>

    One somewhat less nasty (but still bad) scheme I came up with would

    be to use the traits of the InstList in basic block to create a

    listener which ignores all updates except when order is interesting

    (i.e. inside the SROA code).  This seems really bug prone though.  I

    am not in favour of actually implementing such a thing.  <br>

    <br>

    Another scheme might be to have an Analysis which caches the answer

    "does X proceed Y".  By default, nothing would preserve this, and we

    could teach certain passes to be aware of it.  Again, this feels

    dirty.  <br>

    <br>

    I don't really have a good solution here.  <br>

    <br>

    <blockquote cite="mid:CAB297QwaU1SB4ENicf6gTzV_Hkw5sJHPaG_wFHgTjL76rZoh8Q@mail.gmail.com" type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF">

                <div class="">

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>

                        <div> </div><div style="margin: 0px;">5. Simply give up doing

                          SROA (and GVN::processLoad) if there is a huge

                          basic block that can potentially take a long

                          time to compile.</div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                Until we have something better, this might be a

                necessary hack.  Whether it's actually worth introducing

                depends on how long Chandler thinks the mem2reg version

                will take to land.  <br>

                <span class=""><font color="#888888"> <br>

                  </font></span></div>

            </blockquote>

            <div><br>

            </div>

            <div>I tried this idea today, and unfortunately it didn't

              reduce the compilation time. Skipping SROA leaves a lot of

              loads/stores in the code that could have been promoted,

              and that places the burden on other passes that are run

              later, such as GVN and DSE.</div>

          </div>

        </div>

      </div>

    </blockquote>

    Makes some sense.  See my point above about lots of allocas

    stressing the optimizer.  :)<br>

    <br>

    p.s. Any chance of bugs being filed with test cases for each of

    those?  :)<br>

    <br>

    Philip<br>

  </div>


_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br></blockquote></div><br></body></html>