[llvm-commits] [PATCH] SmallEnumSet

Sat Jan 10 04:52:02 PST 2009

Hi Talin,

Thanks for this submission. I have some comments and questions.

> This patch contains a new container class, SmallEnumSet, and unit  
> tests for it.
>
> SmallEnumSet is inspired by SmallVector and the other "small" sets  
> in ADT.
>
 From looking at the code, it seems like this is very similar to the  
BitVector. The difference being that the size is known at compile  
time. Is it possible either to use the BitVector or modify BitVector's  
implementation so that you know the size at compile time (if that's  
important enough)?

> The keys of the set are enumeration constants, which range from 0 to  
> some maximum value which is known at compile time. Internally,  
> SmallEnumSet maintains a fixed-length bit vector, so that membership  
> in the set is represented by a single bit per possible key.
>
Enumerated values can go from negative to positive. They don't have to  
start from 0.

> Traditionally in C and C++ this sort of thing is done via a "flags"  
> field. SmallEnumSet has a number of advantages over using flags:
> 	• It's type safe.

An enum is its own type.

> 	• It has a set-like interface, so you can use "add", "contains" and  
> other set methods.

The std::set doesn't use "add" or "contains". If you're going to call  
this a set, you should use the same nomenclature: e.g. "insert",  
"find". You should also have an "iterator" associated with this class  
to keep things consistent.

> 	• It isn't limited to 32 flags or whatever the size of an integer  
> may be.

Okay.

> 	• The keys can be sequentially-numbered enumeration constants, such  
> as produced by an enum statement with default initialization - you  
> don't have to initialize your enum constants to single-bit values  
> such as "(1 << 5)" or "0x00000800", nor do you have to update all of  
> the initializers if you want to insert a new constant in the middle  
> of the sequence.

> At the same time, SmallEnumSet retains many of the advantages of an  
> integer flags field - it allocates no memory, and can efficiently be  
> passed or returned by value between functions.

This is only true if the size of the enum set is small enough that the  
array holding the values can be passed back efficiently (without  
calling memcpy or something similar).

> All of the methods of the class can be optimized down to essentially  
> the same code you would write if you were using flags.
>

A few comments inline:

+  SmallEnumSet(const SmallEnumSet & src) {
+    for (unsigned i = 0; i < NumWords; ++i) {
+      bits[i] = src.bits[i];
+    }
+  }
+

BitVector uses "std::copy" to do this. Maybe you should too?

+  bool empty() const {
+    for (unsigned i = 0; i < NumWords; ++i) {
+      if (bits[i] != 0) {
+        return false;
+      }
+    }
+
+    return true;
+  }

empty() should really execute in constant time. Maybe you could rename  
it. BitVector uses "none()" and "any()".

+  bool containsAll(const SmallEnumSet & in) const {
+    for (unsigned i = 0; i < NumWords; ++i) {
+      if ((~bits[i] & in.bits[i]) != 0) {
+        return false;
+      }
+    }
+
+    return true;
+  }

How is this different from "operator==()"?

+  SmallEnumSet & add(EnumType val) {
+    bits[wordForIndex(val)] |= bitForIndex(val);
+    return * this;
+  }

I prefer "insert" to "add" if we're calling this a "set". :-)

+  SmallEnumSet & addAll(const SmallEnumSet & in) {
+    for (unsigned i = 0; i < NumWords; ++i) {
+      bits[i] |= in.bits[i];
+    }
+    return * this;
+  }

Instead of calling this "addAll", just call it "insert", but when it's  
passed a SmallEnumSet&, it will add the whole thing.

+  SmallEnumSet & addRange(EnumType first, EnumType last) {
+    unsigned firstWord = wordForIndex(first);
+    unsigned lastWord = wordForIndex(last);
+    if (firstWord == lastWord) {
+      bits[firstWord] |= maskForIndex(first) & ~maskForIndex(last);
+    } else {
+      bits[firstWord] |= maskForIndex(first);
+      for (unsigned i = firstWord + 1; i < lastWord; ++i) {
+        bits[i] = -1;
+      }
+      bits[lastWord] |= ~maskForIndex(last);
+    }
+
+    return * this;
+  }

Same comment above. Call this "insert", etc. This should probably take  
iterators.

+  SmallEnumSet & remove(EnumType val) {
+    bits[wordForIndex(val)] &= ~bitForIndex(val);
+    return * this;
+  }
+
+  SmallEnumSet & removeAll(const SmallEnumSet & in) {
+    for (unsigned i = 0; i < NumWords; ++i) {
+      bits[i] &= ~in.bits[i];
+    }
+    return * this;
+  }
+
+  SmallEnumSet & removeRange(EnumType first, EnumType last) {
+    unsigned firstWord = wordForIndex(first);
+    unsigned lastWord = wordForIndex(last);
+    if (firstWord == lastWord) {
+      bits[firstWord] &= ~maskForIndex(first) | maskForIndex(last);
+    } else {
+      bits[firstWord] &= ~maskForIndex(first);
+      for (unsigned i = firstWord + 1; i < lastWord; ++i) {
+        bits[i] = 0;
+      }
+      bits[lastWord] &= maskForIndex(last);
+    }
+
+    return * this;
+  }

Similar comments for the "remove" statements. "erase" is more  
consistent with a "set". And you should name these all "erase", and  
have them act accordingly based on what's passed in. The last one  
should take iterators.

+  SmallEnumSet & intersectWith(const SmallEnumSet & in) {
+    for (unsigned i = 0; i < NumWords; ++i) {
+      bits[i] &= in.bits[i];
+    }
+    return * this;
+  }

It's probably sufficient to call this "intersect" instead of  
"intersectWith".

+  static SmallEnumSet noneOf() {
+    return SmallEnumSet();
+  }

This name isn't very informative. What does this mean?

+  static SmallEnumSet of(EnumType v0) {
+    return SmallEnumSet().add(v0);
+  }
+
...
+  static SmallEnumSet of(
+      EnumType v0, EnumType v1, EnumType v2, EnumType v3, EnumType v4,
+      EnumType v5, EnumType v6, EnumType v7, EnumType v8, EnumType  
v9) {
+    return  
SmallEnumSet().add(v0).add(v1).add(v2).add(v3).add(v4).add(v5)
+        .add(v6).add(v7).add(v8).add(v9);
+  }

Yikes! Please don't do this. Use "inserts" instead that loop over a  
vector of values or something similar.

+  friend SmallEnumSet operator |(
+      const SmallEnumSet & a, const SmallEnumSet & b) {
+    return unionOf(a, b);
+  }
+
+  friend SmallEnumSet operator &(
+      const SmallEnumSet & a, const SmallEnumSet & b) {
+    return SmallEnumSet(a).intersectWith(b);
+  }

Is it necessary to make these "friend" methods? They seem to use  
publicly accessible methods. (I haven't tried to compile them without  
the friend keyword, so I don't know.)

-bw