[PATCH] LowerBitSets: Use byte arrays instead of bit sets to represent in-memory bit sets.

Peter Collingbourne peter at pcc.me.uk
Mon Mar 2 15:34:59 PST 2015


================
Comment at: include/llvm/Transforms/IPO/LowerBitSets.h:34
@@ -35,1 +33,3 @@
+  // The indices of the set bits in the bitset.
+  std::set<uint64_t> Bits;
 
----------------
kcc wrote:
> Is this better than unordered_set?
> (I don't know, just asking)
Maybe, not sure though. DenseSet may be even better here. But I'd rather wait until this pass isn't a small fraction of the overall compile time before trying to optimize our data structures.

================
Comment at: include/llvm/Transforms/IPO/LowerBitSets.h:174
@@ +173,3 @@
+  /// The number of bytes allocated so far for each of the bits.
+  uint64_t BitAllocs[8];
+
----------------
jfb wrote:
> Could you add a symbolic constant for 8? Also, why 8? Is it good for x86 but now as good for ARM?
Are you asking why I am using 8 bits per array element? Well, for one thing it gives us better packing (the more bins there are, the less evenly they will be filled). For another, the instruction sequences can be slightly shorter, both on x86 and ARM. For example, this code:

```
typedef unsigned long T;
typedef void (*F)(void);
T table;

unsigned char bitset[1000];
void FOO(T **obj) {
  T *vptr = *obj;
  T ptr_offset = (T)vptr - (T)table;
  T t1 = ptr_offset << (sizeof(void*)*8 - 3);
  T t2 = ptr_offset >> 3;
  T t3 = t1 | t2;
  unsigned bits = bitset[t3];
  if ((t3 >= TABLE_SIZE) || (bits & 1))
    __builtin_trap();
  F f = (F)vptr[4];
  f();
}
```

is one byte shorter (on x86) or one instruction shorter (on ARM) than this code:

```
typedef unsigned long T;
typedef void (*F)(void);
T table;

unsigned short bitset[1000];
void FOO(T **obj) {
  T *vptr = *obj;
  T ptr_offset = (T)vptr - (T)table;
  T t1 = ptr_offset << (sizeof(void*)*8 - 3);
  T t2 = ptr_offset >> 3;
  T t3 = t1 | t2;
  unsigned bits = bitset[t3];
  if ((t3 >= TABLE_SIZE) || (bits & 0x100))
    __builtin_trap();
  F f = (F)vptr[4];
  f();
}
```
I guess I could parameterise this but it seems like it would add more complexity than we need. Happy to document the choice of 8 bits per array element.

================
Comment at: include/llvm/Transforms/IPO/LowerBitSets.h:178
@@ +177,3 @@
+    for (unsigned I = 0; I != 8; ++I)
+      BitAllocs[I] = 0;
+  }
----------------
jfb wrote:
> `memset` or `std::fill`.
Done

================
Comment at: include/llvm/Transforms/IPO/LowerBitSets.h:183
@@ +182,3 @@
+  /// set. AllocByteOffset is set to the offset within the byte array and
+  /// AllocMask is set to the bitmask for those bits. This uses the LPT
+  /// multiprocessor scheduling algorithm to lay out the bits efficiently;
----------------
kcc wrote:
> LPT == longest process time? 
> Maybe mention the full name here? 
Done

================
Comment at: lib/Transforms/IPO/LowerBitSets.cpp:34
@@ -33,3 +33,3 @@
 
-STATISTIC(NumBitSetsCreated, "Number of bitsets created");
+STATISTIC(NumByteArraysCreated, "Number of byte arrays created");
 STATISTIC(NumBitSetCallsLowered, "Number of bitset calls lowered");
----------------
jfb wrote:
> kcc wrote:
> > It would be interesting to see more details stats here, 
> > e.g. how many bits were packed into how many bytes. 
> +1
> 
> How many bits are unused (didn't fit) would be interesting.
I've added some more stats here.

================
Comment at: lib/Transforms/IPO/LowerBitSets.cpp:152
@@ +151,3 @@
+  unsigned Bit = 0;
+  for (unsigned I = 1; I != 8; ++I)
+    if (BitAllocs[I] < BitAllocs[Bit])
----------------
jfb wrote:
> Update 8.
See my other comment.

================
Comment at: lib/Transforms/IPO/LowerBitSets.cpp:298
@@ +297,3 @@
+  // Create globals to stand in for byte arrays and masks. These never actually
+  // get initialized, we rauw and erase them later in allocateByteArrays() once
+  // we know the offset and mask to use.
----------------
jfb wrote:
> Capitalize RAUW.
Done

================
Comment at: lib/Transforms/IPO/LowerBitSets.cpp:360
@@ -268,1 +359,3 @@
+                                      ByteArrayInfo *&BAI, Value *BitOffset) {
+  if (BSI.BitSize <= 64) {
     // If the bit set is sufficiently small, we can avoid a load by bit testing
----------------
jfb wrote:
> Just to be sure I understand: this inlines small (<= 64 bit) bitsets. The current patch only benefits from co-locating larger bitsets, right? It would probably be good for the docs to specify this double approach to reducing bitset footprint.
We already document the technique for inlining <=64 bits:

http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#short-inline-bit-vectors

and I do plan to document byte arrays on that page once this lands.

http://reviews.llvm.org/D7954

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list