[llvm] [CodeGen] Speed up ReachingDefAnalysis (NFC) (PR #100913)

Tue Aug 13 12:49:36 PDT 2024

================
@@ -33,35 +32,156 @@ namespace llvm {
 class MachineBasicBlock;
 class MachineInstr;
 
-/// Thin wrapper around "int" used to store reaching definitions,
-/// using an encoding that makes it compatible with TinyPtrVector.
-/// The 0th LSB is forced zero (and will be used for pointer union tagging),
-/// The 1st LSB is forced one (to make sure the value is non-zero).
-class ReachingDef {
-  uintptr_t Encoded;
-  friend struct PointerLikeTypeTraits<ReachingDef>;
-  explicit ReachingDef(uintptr_t Encoded) : Encoded(Encoded) {}
-
+// An implementation of multimap from (MBBNumber, Unit) to reaching definitions.
+//
+// This implementation only supports modification operations just enough
+// to serve our needs:
+//
+// - addDef
+// - prependDef
+// - replaceFront
+//
+// Internally, the multimap is implemented as a collection of singly linked
+// lists represented on top of a single array.  Each singly-linked list
+// contains reaching definitions for a given pair of MBBNumber and Unit.
+//
+// This design has the following highlights:
+//
+// - Unlike SparseMultiset or other maps, we do not store keys as part of values
+//   or anywhere else in the data structure.
+//
+// - The single array design minimizes malloc traffic.
+//
+// - Reaching definitions share one array.  This means that if one pair of
+//   (MBBNumber, Unit) has multiple reaching definitions while another pair of
+//   (MBBNumber, Unit) has none, they cancel each other to some extent.
+class MBBReachingDefsInfo {
 public:
-  ReachingDef(std::nullptr_t) : Encoded(0) {}
-  ReachingDef(int Instr) : Encoded(((uintptr_t) Instr << 2) | 2) {}
-  operator int() const { return ((int) Encoded) >> 2; }
-};
+  MBBReachingDefsInfo() = default;
+  MBBReachingDefsInfo(const MBBReachingDefsInfo &) = delete;
+  MBBReachingDefsInfo &operator=(const MBBReachingDefsInfo &) = delete;
+
+  // Initialize the multimap with the number of basic blocks and the number of
+  // register units.
+  void init(unsigned BBs, unsigned Regs) {
+    assert(NumBlockIDs == 0 && "can initialize only once");
+    assert(NumRegUnits == 0 && "can initialize only once");
+    assert(Storage.empty() && "can initialize only once");
+    NumBlockIDs = BBs;
+    NumRegUnits = Regs;
+    unsigned NumIndexes = NumBlockIDs * NumRegUnits;
----------------
kazutakahirata wrote:

Wow!  That's a lot.  Thank you for the data point.  Do you happen to know how many reg units are typically in use in a function?  I see about 25% or less of them in use on X86.  (Of course, we might see more utilization if you use 512-bit vectors like AVX-512.)

FWIW, I have a local patch that builds on an idea of @nikic's: we only allocate arrays for the reg units that are actually in use.
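
For anyone skimming the thread, here is a rough sketch of the layout the header comment in the hunk above describes: one singly linked list per (MBBNumber, Unit) pair, all stored in a single array, with no keys stored anywhere. It is not the code from the patch. The class name `ArrayMultimap`, the `Empty`/`End` sentinels, and the `defs` and `numSlots` helpers are made up for illustration, and only an append operation is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the MBBReachingDefsInfo class from the patch.
class ArrayMultimap {
  struct Node {
    int Def;       // a reaching definition
    unsigned Next; // index of the next node, or one of the sentinels below
  };
  static constexpr unsigned Empty = ~0u;   // head slot holds no value yet
  static constexpr unsigned End = ~0u - 1; // last node of a list

  unsigned NumUnits = 0;
  std::vector<Node> Storage; // head slots first, overflow nodes after them

  // The key is implied by the position of the head slot, so it is never stored.
  std::size_t headIndex(unsigned Block, unsigned Unit) const {
    return static_cast<std::size_t>(Block) * NumUnits + Unit;
  }

public:
  // Reserve one head slot per (Block, Unit) pair.
  void init(unsigned NumBlocks, unsigned Units) {
    assert(Storage.empty() && "can initialize only once");
    NumUnits = Units;
    Storage.assign(static_cast<std::size_t>(NumBlocks) * Units, Node{0, Empty});
  }

  // Append Def to the list for (Block, Unit).
  void addDef(unsigned Block, unsigned Unit, int Def) {
    std::size_t I = headIndex(Block, Unit);
    if (Storage[I].Next == Empty) {
      // First definition for this key lives directly in the head slot.
      Storage[I] = Node{Def, End};
      return;
    }
    // Walk to the tail, then link a new node at the end of the shared array.
    while (Storage[I].Next != End)
      I = Storage[I].Next;
    Storage[I].Next = static_cast<unsigned>(Storage.size());
    Storage.push_back(Node{Def, End});
  }

  // Collect the definitions for (Block, Unit) in insertion order.
  std::vector<int> defs(unsigned Block, unsigned Unit) const {
    std::vector<int> Result;
    std::size_t I = headIndex(Block, Unit);
    if (Storage[I].Next == Empty)
      return Result;
    for (;;) {
      Result.push_back(Storage[I].Def);
      if (Storage[I].Next == End)
        break;
      I = Storage[I].Next;
    }
    return Result;
  }

  std::size_t numSlots() const { return Storage.size(); }
};
```

In this sketch the array starts at NumBlocks * Units slots whether or not a unit is ever defined, which is why the fraction of reg units actually in use matters for memory. Only a key with more than one definition grows the array.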

https://github.com/llvm/llvm-project/pull/100913