<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 03/15/2016 04:22 PM, <a class="moz-txt-link-abbreviated" href="mailto:escha@apple.com">escha@apple.com</a>
      wrote:<br>
    </div>
    <blockquote
      cite="mid:616CBBA3-78B3-4E87-B6B5-86CF223CCE4E@apple.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <br class="">
      <div>
        <blockquote type="cite" class="">
          <div class="">On Mar 15, 2016, at 4:09 PM, Philip Reames <<a
              moz-do-not-send="true"
              href="mailto:listmail@philipreames.com" class=""><a class="moz-txt-link-abbreviated" href="mailto:listmail@philipreames.com">listmail@philipreames.com</a></a>>
            wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <meta content="text/html; charset=utf-8"
              http-equiv="Content-Type" class="">
            <div text="#000000" bgcolor="#FFFFFF" class=""> <br
                class="">
              <br class="">
              <div class="moz-cite-prefix">On 03/15/2016 03:07 PM, via
                llvm-dev wrote:<br class="">
              </div>
              <blockquote
                cite="mid:B463FEBC-9600-4BFA-8451-F02A648E6274@apple.com"
                type="cite" class="">
                <meta http-equiv="Content-Type" content="text/html;
                  charset=utf-8" class="">
                There’s a few passes in LLVM that make heavy use of a
                big DenseMap, one that potentially gets filled with up
                to 1 entry for each instruction in the function.
                EarlyCSE is the best example, but Reassociate and
                MachineCSE have this to some degree as well (there might
                be others?). To put it simply: at least in my profile,
                EarlyCSE spends ~1/5 of its time growing DenseMaps. This
                is kind of… bad.
                <div class=""><br class="">
                </div>
                <div class="">grow() is inherently slow because it needs
                  to rehash and reinsert everything. This means growing
                  a DenseMap costs much, much more than growing, for
                  example, a vector. I talked about this with a few
                  people and here are some possibilities we’ve come up
                  with to improve this (some of which probably aren’t
                  what we want):</div>
                <div class=""><br class="">
                </div>
                <div class="">1. Use a map that doesn’t require
                  rehashing and reinsertion to grow. Chaining lets you
                  do this, but std::unordered_map is probably so much
                  slower than DenseMap we’d lose more than we gain.</div>
                <div class="">2. Include the hash code in the map so
                  that we don’t have to rehash. 32 bits more per entry
                  (or whatever), and it might not help that much, since
                  we still have to do the whole reinsertion routine.</div>
                <div class="">3. Pre-calculate an estimate as to the map
                  size we need. For example, in EarlyCSE, this is
                  possibly gross overestimate of size needed:</div>
                <div class=""><br class="">
                </div>
                <div class="">
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">  <span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #bb2ca2" class="">unsigned</span>
                    InstCount = <span style="font-variant-ligatures:
                      no-common-ligatures; color: #272ad8" class="">0</span>;</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">  <span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #bb2ca2" class="">unsigned</span>
                    LoadCount = <span style="font-variant-ligatures:
                      no-common-ligatures; color: #272ad8" class="">0</span>;</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">  <span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #bb2ca2" class="">unsigned</span>
                    CallCount = <span style="font-variant-ligatures:
                      no-common-ligatures; color: #272ad8" class="">0</span>;</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">  <span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #bb2ca2" class="">for</span>
                    (<span style="font-variant-ligatures:
                      no-common-ligatures; color: #4f8187" class="">inst_iterator</span>
                    FI = <span style="font-variant-ligatures:
                      no-common-ligatures; color: #31595d" class="">inst_begin</span>(<span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #4f8187" class="">F</span>),
                    FE = <span style="font-variant-ligatures:
                      no-common-ligatures; color: #31595d" class="">inst_end</span>(<span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #4f8187" class="">F</span>);
                    FI != FE; ++FI) {</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo; color: rgb(49, 89, 93);"
                    class=""><span style="" class="">    </span><span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #bb2ca2" class="">if</span><span
                      style="" class=""> (FI-></span>mayReadOrWriteMemory<span
                      style="" class="">())</span></div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">     
                    ++LoadCount;</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">    <span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #bb2ca2" class="">else</span>
                    <span style="font-variant-ligatures:
                      no-common-ligatures; color: #bb2ca2" class="">if</span>
                    (<span style="font-variant-ligatures:
                      no-common-ligatures; color: #31595d" class="">isa</span><<span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #4f8187" class="">CallInst</span>>(*FI))</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">     
                    ++CallCount;</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">    <span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #bb2ca2" class="">else</span></div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">     
                    ++InstCount;</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">  }</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">  <span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #4f8187" class="">AvailableValues</span>.<span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #3d1d81" class="">resize</span>(InstCount);</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">  <span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #4f8187" class="">AvailableLoads</span>.<span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #3d1d81" class="">resize</span>(LoadCount);</div>
                  <div style="margin: 0px; font-size: 11px; line-height:
                    normal; font-family: Menlo;" class="">  <span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #4f8187" class="">AvailableCalls</span>.<span
                      style="font-variant-ligatures:
                      no-common-ligatures; color: #3d1d81" class="">resize</span>(CallCount);</div>
                </div>
                <div class=""><br class="">
                </div>
                <div class="">But it does the job, and saves ~20% of
                  time in EarlyCSE on my profiles. Yes, iterating over
                  the entire function is way cheaper than grow().
                  Downsides are that while it’s still bounded by
                  function size, it could end up allocating a good bit
                  more depending on — in EarlyCSE’s case — the control
                  flow/dominator structure.</div>
                <div class=""><br class="">
                </div>
                <div class="">Any thoughts on this, or other less ugly
                  alternatives? I estimate that, at least in our pass
                  pipeline, we’re losing at least ~1% of total time to
                  avoidable DenseMap::grow() operations, which feels a
                  little bit… unnecessary.</div>
              </blockquote>
              <br class="">
              Slightly OT, but it looks like the LoadValue (value type
              of the AvailableLoads structure) is relatively memory
              inefficient.  At minimum, we could get rid of the IsAtomic
              space by using a tagged pointer.  That would at least
              bring us down to a 128 bits (a nice power of two).  That
              might help speed up some of the copying.  </div>
          </div>
        </blockquote>
        <br class="">
      </div>
      <div>Just to note, it looks like I was testing on a somewhat older
        LLVM version that didn’t have LoadValue at all, so my guess is
        this means it’s even -worse- in trunk.</div>
    </blockquote>
    Er, LoadValue's been around for a while (6 months).  How far back
    are you testing?  I'd strongly suggest switching to something more
    recent.<br>
    <br>
    Philip<br>
  </body>
</html>