[all-commits] [llvm/llvm-project] b0142c: [ADT] Add CoalescingBitVector, implemented using I...

Thu Feb 27 12:39:55 PST 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: b0142cd9867d720375008969fd9555cc1a17c098
      https://github.com/llvm/llvm-project/commit/b0142cd9867d720375008969fd9555cc1a17c098
  Author: Vedant Kumar <vsk at apple.com>
  Date:   2020-02-27 (Thu, 27 Feb 2020)

  Changed paths:
    M llvm/docs/ProgrammersManual.rst
    A llvm/include/llvm/ADT/CoalescingBitVector.h
    M llvm/unittests/ADT/CMakeLists.txt
    A llvm/unittests/ADT/CoalescingBitVectorTest.cpp
    M llvm/unittests/ADT/IntervalMapTest.cpp

  Log Message:
  -----------
  [ADT] Add CoalescingBitVector, implemented using IntervalMap [1/3]

Add CoalescingBitVector to ADT. This is part 1 of a 3-part series to
address a compile-time explosion issue in LiveDebugValues.

---

CoalescingBitVector is a bitvector that, under the hood, relies on an
IntervalMap to coalesce elements into intervals.

CoalescingBitVector efficiently represents sets that predominantly
contain contiguous ranges (e.g. the VarLocSets in LiveDebugValues,
which are very long sequences that look like {1, 2, 3, ...}). On the
other hand, CoalescingBitVector isn't good at representing sets with
lots of gaps between elements. The first N coalesced intervals of set
bits are stored in-place (in the initial heap allocation).

Compared to SparseBitVector, CoalescingBitVector offers more predictable
performance for non-sequential find() operations. This provides a
crucial speedup in LiveDebugValues.
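
To make the coalescing idea concrete, here is a minimal sketch of a set
that stores runs of set bits as intervals. It is an illustration only:
it uses std::map rather than IntervalMap, and the class and method names
(CoalescingSetSketch, set, test, numIntervals) are hypothetical, not the
CoalescingBitVector API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for illustration; not llvm::CoalescingBitVector.
class CoalescingSetSketch {
  // Maps the first set bit of each run to its last set bit (inclusive).
  std::map<uint64_t, uint64_t> Intervals;

public:
  // Set one bit, merging it with any run that covers or touches it.
  void set(uint64_t Bit) {
    uint64_t Start = Bit, Stop = Bit;
    auto Next = Intervals.upper_bound(Bit); // First run starting after Bit.
    if (Next != Intervals.begin()) {
      auto Prev = std::prev(Next);
      if (Prev->second >= Bit)
        return; // Bit is already inside an existing run.
      if (Prev->second + 1 == Bit) {
        Start = Prev->first; // Extend the run on the left.
        Intervals.erase(Prev);
      }
    }
    if (Next != Intervals.end() && Bit != UINT64_MAX &&
        Next->first == Bit + 1) {
      Stop = Next->second; // Absorb the run on the right.
      Intervals.erase(Next);
    }
    Intervals[Start] = Stop;
    assert(Start <= Stop && "runs are never empty");
  }

  // Return true if Bit is covered by some run.
  bool test(uint64_t Bit) const {
    auto Next = Intervals.upper_bound(Bit);
    if (Next == Intervals.begin())
      return false;
    return std::prev(Next)->second >= Bit;
  }

  // Number of stored runs; this, not the bit count, drives memory use.
  std::size_t numIntervals() const { return Intervals.size(); }
};
```

With this layout, a set like {1, 2, 3, ..., 1000} occupies a single map
entry, while a set with a gap between every pair of elements needs one
entry per element, which is the weak case described above.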

Differential Revision: https://reviews.llvm.org/D74984

  Commit: 210c4853de20c36da207dfc5f524e0d7f013eec4
      https://github.com/llvm/llvm-project/commit/210c4853de20c36da207dfc5f524e0d7f013eec4
  Author: Vedant Kumar <vsk at apple.com>
  Date:   2020-02-27 (Thu, 27 Feb 2020)

  Changed paths:
    M llvm/lib/CodeGen/LiveDebugValues.cpp

  Log Message:
  -----------
  [LiveDebugValues] Encode a location in VarLoc IDs, NFC [2/3]

This is part 2 of a 3-part series to address a compile-time explosion
issue in LiveDebugValues.

---

Each VarLoc has a unique ID: this ID is used to look up a VarLoc in the
VarLocMap, and to virtually insert a VarLoc into a VarLocSet. Instead of
inserting the VarLoc /itself/ into the VarLocSet, we insert just the ID,
because this can be represented efficiently with a SparseBitVector.

This change introduces LocIndex, a layer of abstraction on top of VarLoc
IDs. Prior to this change, an ID was just an index into a vector. With
this change, an ID encodes both an index /and/ a register location. The
type-checker ensures that conversions to and from LocIndex are correct.

For the moment the register location is always 0 (undef). We have plenty
of bits left over to encode physregs, stack slots, and other locations
in the future.
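
As a rough illustration of an ID that carries both a location and an
index, the sketch below packs two 32-bit fields into one 64-bit integer.
The 32/32 split, the choice to put the location in the high bits, and
the names (LocIndexSketch, getAsRawInteger, fromRawInteger,
checkRoundTrip) are assumptions made for this example, not a statement
of what the patch does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of an ID that encodes a location and an index.
struct LocIndexSketch {
  uint32_t Location; // Register location; 0 stands for "undef" here.
  uint32_t Index;    // Position of the VarLoc in a vector.

  // Pack both fields into one integer (assumed layout: location high).
  uint64_t getAsRawInteger() const {
    return (static_cast<uint64_t>(Location) << 32) | Index;
  }

  // Unpack an integer produced by getAsRawInteger().
  static LocIndexSketch fromRawInteger(uint64_t ID) {
    return {static_cast<uint32_t>(ID >> 32), static_cast<uint32_t>(ID)};
  }
};

// Decoding and then re-encoding must give back the same ID.
inline void checkRoundTrip(uint64_t ID) {
  assert(LocIndexSketch::fromRawInteger(ID).getAsRawInteger() == ID);
}
```

Under this assumed layout, all IDs that share a location are numerically
adjacent, and an ID with location 0 is equal to its plain index.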

Differential Revision: https://reviews.llvm.org/D74985

  Commit: a993720397ea316ac2866d2354d6fe6b4e97169a
      https://github.com/llvm/llvm-project/commit/a993720397ea316ac2866d2354d6fe6b4e97169a
  Author: Vedant Kumar <vsk at apple.com>
  Date:   2020-02-27 (Thu, 27 Feb 2020)

  Changed paths:
    M llvm/lib/CodeGen/LiveDebugValues.cpp
    M llvm/test/DebugInfo/MIR/X86/entry-values-diamond-bbs.mir
    M llvm/test/DebugInfo/MIR/X86/multiple-param-dbg-value-entry.mir

  Log Message:
  -----------
  [LiveDebugValues] Encode register location within VarLoc IDs [3/3]

This is part 3 of a 3-part series to address a compile-time explosion
issue in LiveDebugValues.

---

Start encoding register locations within VarLoc IDs, and take advantage
of this encoding to speed up transferRegisterDef.

There is no fundamental algorithmic change: this patch simply swaps out
SparseBitVector in favor of CoalescingBitVector. That changes iteration
order (hence the test updates), but otherwise this patch is NFCI.

The only interesting change is in transferRegisterDef. Instead of doing:

```
KillSet = {}
for (ID : OpenRanges.getVarLocs())
  if (DeadRegs.count(ID))
    KillSet.add(ID)
```

We now do:

```
KillSet = {}
for (Reg : DeadRegs)
  for (ID : intervalsReservedForReg(Reg, OpenRanges.getVarLocs()))
    KillSet.add(ID)
```

By not visiting each open location every time we visit an instruction,
this eliminates some potentially quadratic behavior. The new
implementation basically does a constant amount of work per instruction
because the interval map lookups are very fast.
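
The per-register lookup can be sketched as follows. This is a
simplified illustration, not the patch itself: it uses a std::set of
64-bit IDs in place of a CoalescingBitVector, assumes the register
occupies the high 32 bits of each ID, and the names (VarLocSetSketch,
makeID, collectKillSet) are hypothetical.

```cpp
#include <cassert> // Used by the usage checks that accompany this sketch.
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical ordered set of open location IDs.
using VarLocSetSketch = std::set<uint64_t>;

// Assumed ID layout: register in the high 32 bits, index in the low 32.
inline uint64_t makeID(uint32_t Reg, uint32_t Index) {
  return (static_cast<uint64_t>(Reg) << 32) | Index;
}

// For each dead register, visit only the IDs reserved for that register
// instead of scanning every open location.
inline std::vector<uint64_t>
collectKillSet(const std::vector<uint32_t> &DeadRegs,
               const VarLocSetSketch &Open) {
  std::vector<uint64_t> KillSet;
  for (uint32_t Reg : DeadRegs) {
    const uint64_t First = makeID(Reg, 0);
    const uint64_t Last = makeID(Reg, UINT32_MAX);
    // Seek to the first ID at or after First, then walk until the IDs
    // leave the range reserved for Reg.
    for (auto It = Open.lower_bound(First);
         It != Open.end() && *It <= Last; ++It)
      KillSet.push_back(*It);
  }
  return KillSet;
}
```

In this sketch the cost of handling one instruction depends on the
number of dead registers and the number of matching IDs, not on the
total number of open locations.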

For a file in WebKit, this brings the time spent in LiveDebugValues down
from ~2.5 minutes to about 4.5 seconds, reducing compile time spent in
that pass from 28% of the total to just over 1%.

Before:

```
2.49 min   27.8%   0 s        LiveDebugValues::process
2.41 min   27.0%   5.40 s     LiveDebugValues::transferRegisterDef
1.51 min   16.9%   1.51 min   LiveDebugValues::VarLoc::isDescribedByReg() const
32.73 s     6.1%   8.70 s     llvm::SparseBitVector<128u>::SparseBitVectorIterator::operator++()
```

After:

```
4.53 s      1.1%   0 s         LiveDebugValues::process
3.00 s      0.7%   107.00 ms   LiveDebugValues::transferRegisterCopy
892.00 ms   0.2%   406.00 ms   LiveDebugValues::transferSpillOrRestoreInst
404.00 ms   0.1%   32.00 ms    LiveDebugValues::transferRegisterDef
110.00 ms   0.0%   2.00 ms       LiveDebugValues::getUsedRegs
57.00 ms    0.0%   1.00 ms       std::__1::vector<>::push_back
40.00 ms    0.0%   1.00 ms       llvm::CoalescingBitVector<>::find(unsigned long long)
```

FWIW, I tried the same approach using SparseBitVector, but got bad
results. To do that, I had to extend SparseBitVector to support 64-bit
indices and expose its lower bound operation. The problem is that the
performance is very hard to predict: SparseBitVector's lower bound
operation falls back to O(n) linear scans in a std::list if you're
not /very/ careful about managing iteration order. When I profiled this,
the performance looked worse than the baseline.

You can see the full CoalescingBitVector-based implementation here:

  https://github.com/vedantk/llvm-project/commits/try-coalescing

You can see the full SparseBitVector-based implementation here:

  https://github.com/vedantk/llvm-project/commits/try-sparsebitvec-find

Depends on D74984 and D74985.

Differential Revision: https://reviews.llvm.org/D74986

Compare: https://github.com/llvm/llvm-project/compare/1d8fad44d301...a993720397ea