[llvm-commits] Final SwitchInst for PR1255: case-ranges.

Sat Jul 21 01:08:26 PDT 2012

Hi Stepan, thanks for the patch, it looks pretty good.

> Please find the patch that converts current SwitchInst into one that supports
> case-ranges. This patch is not compilable, since I removed all backward
> compatibility layer. This is how I see the final SwitchInst.

I'm not going to comment on IntegersSubsetMapping as it is only used as a
helper for codegen (it is #included in Instructions.h but this is probably
just a mistake) so isn't relevant to the general switch design.

General design comments:

(1) is there any point in having the notion of a default case?  Since cases are
now arbitrary sets of integers, there doesn't seem to be a useful distinction
between the default case and the others, you might as well say that the union
of the sets should cover every possible value.  For example, rather than saying
that the cases are:  -2..0 -> destA, 1,2 -> destB, 4 -> destC, default -> destD,
you may as well say -2..0 -> destA, 1,2 -> destB, 4 -> destC, 3, 5..-3 -> destD.

I think this is an important point to decide on.

That said, it may be convenient internally, as an implementation detail, to have
an "everything else" case.  I mean, since most cases will just be single values
or small ranges of values, while the complement of those cases will be a more
baroque set, you could just not bother storing the complement.  In the above
example you could store -2..0 (destA), 1,2 (destB) and 4 (destC) and mark destD
as corresponding to "everything else".  This distinction wouldn't be visible to
users though.

(2) users will see a list of cases, each case will have a corresponding set of
integers, each set consisting of ranges that can be iterated over.  Thus it is
natural to expose a range type (IntRange class) and a set type (IntegersSubset).
In order to not build in any notion of signedness/unsignedness into case values,
the range type should be wrapping [in C switch values are always signed, but in
other languages like Ada they can be signed or unsigned].  For example, consider
the following unsigned i8 switch:
   0 .. 120 -> destA
   121 .. 250 -> destB
   251 .. 255 -> destC
Here the ranges should be [0, 121), [121, 251), [251, 0), and not artificially
broken up into [0, 121), [121, 128) union [128, 251), [251, 0) because of some
notion that they should be non-wrapping when viewed as signed numbers (the
range [121, 251) wraps as signed numbers since 127 is positive and 128 is
negative).  The ConstantRange class is already designed to deal with this kind
of thing, and I think you should just drop IntRange and use ConstantRange
instead.

(3) storing values.  If there is no default case, then the union of the case
sets covers every value.  As Chris pointed out (well, maybe this is not what
he pointed out), that means that the end of one range must be the start of
another range, so there is no point in storing [Lo, Hi) pairs for each range,
you can just store the start point of each range.  In the example from (1),
rather than storing <-2, 0> for destA and <1, 2> for destB you can store
<-2, 1, 2> and have destA, destB store indices into this.  The complete set
of APInt values you would store for the above example is:
   -2, 1, 3, 4, 5

Of course you would want to store them ordered.  I suggest ordering them as
signed numbers.  This isn't the same as considered switch cases to be signed,
it is just an internal convenience and ensures that two switch instructions
with the same ranges will have exactly the same list of APInt values stored
in them.

Note that with this scheme if cases have only one value, then you have to
store only one value if cases are consecutive, but you have to store two
values if they are not consecutive.  For example:
   1 -> A
   2 -> B
   3 -> C
   5 -> D
complement -> E
You need to store: 1, 2, 3, 4, 5
The 4 is the lower endpoint of one of the E case ranges.

Do you have some statistics on whether case values are typically consecutive,
or whether they tend to be lots of gaps between them?

Note that if you go this route then you would probably want to abandon the idea
of exposing a real set type (IntegersSubset) directly to the user.  Instead the
user could query the case iterator for information about the "set" for that
case, like the number of ranges in it, and get an iterator for visiting the
ranges.  All of this would just be referencing and indexing into the above list
of APInt values stored in the switch.

If the decision is that there should be a default case then this scheme is less
interesting (there are also several other possibilities for storing values).

Maybe the best thing to do is to only provide users with methods that are
agnostic about internal storage.  That can be done by only exporting ways of
querying information about sets without exporting the sets themselves (see
above), so methods like this
   const IntegersSubset *findCaseDest(BasicBlock *BB) const {
and
   IntegersSubsetTy &getCaseSubset() {
would have to go.  Note that you could still use a type like IntegersSubset for
adding new cases, i.e. as a way of passing a set to the SwitchInst code, but you
wouldn't ever get an IntegersSubset by querying SwitchInst methods.

Choosing a good SwitchInst API is extremely important and needs some discussion.
As well as the above "storage agnostic" consideration, there should also be
thought about what case manipulation methods need to be built in.

For example, maybe a method for saying "replace destination X with destination
Y" should be built in.  If Y is not already a destination for the switch then
this would be trivial.  If it is already a destination then the case sets for
X and Y would be joined (union).  In a world with no default cases, this would
replace removeCase (which wouldn't mean anything anymore - right now it means:
merge the set of values for this case into the default case).  What I'm saying
is that maybe there should be a set of high level manipulation methods built
into SwitchInst rather than exposing sets and requiring users to do all kinds
of set manipulations.  But what should these methods be?  As well as the above
method, which is essentially "union", there should probably be an intersection
method too, for splitting a case set into two subsets with different
destinations.  However really there needs to be an examination of what kinds
of switch manipulations are done in practice, and an appropriate set of methods
distilled out of that.

Ciao, Duncan.

PS: I noticed some mistakes, for example

   unsigned getNumCases() const {
     return getNumOperands()/2 - 1;
   }

is now wrong (shouldn't divide by 2 any more), but these kind of things can be
fixed later.