[PATCH] Simplify switch table lookup if a linear mapping is possible

Thu Nov 13 13:29:12 PST 2014

On Thu, Nov 13, 2014 at 12:20 PM, Erik Eckstein <eeckstein at apple.com> wrote:
> Thanks for reviewing!
>
>> Do you have any numbers on how often this fires in a Clang bootstrap
>> or other benchmark?
>
> 29 in the test-suite (including External).

Nice.

>> This enum is sorted in a "cheapest first" order, which matches the
>> order that the SwitchLookupTable tries things. LinearMapKind should go
>> before ArrayKind.
>
> In this case I would put it even before BitMapKind.

Sounds good.

>> I think undef is probably the only possible case here, right?
>
> I'm not sure.
>
>> For undef, it shouldn't matter what result we return (someone please
>> correct me if I'm wrong here), so it would be nice if we could ignore
>> undef values when trying to find the linear relationship. If it makes
>> things more complicated we could do it in a follow-up patch.
>
> Let's do it in a follow-up patch. I'll add a TODO.
>
>> Should we handle the cases where the multiplier is 1 or offset is 0
>> and generate simpler code for that?
>
> I didn't do it to keep the code simpler. But if there is a good reason to simplify it here and not in follow-up optimizations, it's not a problem.

Yeah, I think we should do it here.

A few more thoughts about the patch:

> +  // Check if we can derive the value with a linear transformation from the
> +  // table index.
> +  IntegerType *ValueIntType = dyn_cast<IntegerType>(ValueType);
> +  if (ValueIntType) {
> +    bool LinearMappingPossible = true;
> +    APInt PrevVal;
> +    APInt DistToPrev;
> +    // Check if there is the same distance between two consecutive values.
> +    for (uint64_t I = 0; I < TableSize; ++I) {
> +      ConstantInt *ResConst = dyn_cast<ConstantInt>(TableContents[I]);
> +      if (!ResConst) {
> +        // E.g. if it is an undef.
> +        LinearMappingPossible = false;
> +        break;
> +      }
> +      APInt Val = ResConst->getValue();
> +      if (I > 0) {

Not a big deal, but checking "I != 0" seems like a more clear way to
check that it's not the first iteration.

> +        APInt Dist = Val - PrevVal;
> +        if (I == 1) {
> +          DistToPrev = Dist;
> +        } else if (Dist != DistToPrev) {
> +          LinearMappingPossible = false;
> +          break;
> +        }
> +      }
> +      PrevVal = Val;
> +    }

The way I would do this is I would first look at two values to figure
out the offset and multiplier, and then iterate through the rest of
the values and see if they fit.

I think that might be a little easier to read, and it would also make
it more obvious that all the results are correct even in the case of
overflows etc., because we've checked the same computation that will
happen at run-time. It will also make handling undefs easier, i.e. we
can just ignore them.

Thanks,
Hans

>> On 13 Nov 2014, at 19:40, Hans Wennborg <hans at chromium.org> wrote:
>>
>> On Thu, Nov 13, 2014 at 7:26 AM, Erik Eckstein <eeckstein at apple.com> wrote:
>>> Hi Hans,
>>>
>>> this patch adds a simple optimization for switch table lookup: It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output. Example:
>>
>> Thanks! This has been on my mental todo list for switch transformation
>> stuff for a while, so I'm very happy to review it.
>>
>> Do you have any numbers on how often this fires in a Clang bootstrap
>> or other benchmark?
>>
>>> Index: lib/Transforms/Utils/SimplifyCFG.cpp
>>> ===================================================================
>>> --- lib/Transforms/Utils/SimplifyCFG.cpp (revision 221882)
>>> +++ lib/Transforms/Utils/SimplifyCFG.cpp (working copy)
>>> @@ -70,6 +70,7 @@
>>>     cl::desc("Hoist conditional stores if an unconditional store precedes"));
>>>
>>> STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
>>> +STATISTIC(NumLinearMaps, "Number of switch instructions turned into linear mapping");
>>> STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>>> STATISTIC(NumLookupTablesHoles, "Number of switch instructions turned into lookup tables (holes checked)");
>>> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
>>> @@ -3663,7 +3664,12 @@
>>>
>>>       // The table is stored as an array of values. Values are retrieved by load
>>>       // instructions from the table.
>>> -      ArrayKind
>>> +      ArrayKind,
>>> +
>>> +      // For tables where there is a linear relationship between table index
>>> +      // and values. We calculate the result with a simple multiplication
>>> +      // and addition instead of a table lookup.
>>> +      LinearMapKind
>>
>> This enum is sorted in a "cheapest first" order, which matches the
>> order that the SwitchLookupTable tries things. LinearMapKind should go
>> before ArrayKind.
>>
>>
>>>     } Kind;
>>>
>>>     // For SingleValueKind, this is the single value.
>>> @@ -3673,6 +3679,10 @@
>>>     ConstantInt *BitMap;
>>>     IntegerType *BitMapElementTy;
>>>
>>> +    // For LinearMapKind, these are the constants used to derive the value.
>>> +    Constant *ValueOffset;
>>> +    Constant *Multiplier;
>>
>> Maybe LinearOffset and LinearMultiplier, to tie the names even tighter
>> to the LinearMapKind?
>>
>>> +
>>>     // For ArrayKind, this is the array.
>>>     GlobalVariable *Array;
>>>   };
>>> @@ -3685,7 +3695,7 @@
>>>                                      Constant *DefaultValue,
>>>                                      const DataLayout *DL)
>>>     : SingleValue(nullptr), BitMap(nullptr), BitMapElementTy(nullptr),
>>> -      Array(nullptr) {
>>> +      ValueOffset(nullptr), Multiplier(nullptr), Array(nullptr) {
>>>   assert(Values.size() && "Can't build lookup table without values!");
>>>   assert(TableSize >= Values.size() && "Can't fit values in table!");
>>>
>>> @@ -3730,6 +3740,42 @@
>>>     return;
>>>   }
>>>
>>> +  // Check if we can derive the value with a linear transformation from the
>>> +  // table index.
>>> +  IntegerType *ValueIntType = dyn_cast<IntegerType>(ValueType);
>>> +  if (ValueIntType) {
>>
>> In LLVM style it's more common to do:
>>
>>  if (IntegerType *ValueIntType = dyn_cast<...
>>
>> This saves a line and reduces the scope of ValueIntType to only the if
>> statement, which is where it's used.
>>
>>> +    bool LinearMappingPossible = true;
>>> +    APInt PrevVal;
>>> +    APInt DistToPrev;
>>> +    // Check if there is the same distance between two consecutive values.
>>> +    for (uint64_t I = 0; I < TableSize; ++I) {
>>> +      ConstantInt *ResConst = dyn_cast<ConstantInt>(TableContents[I]);
>>
>> ResConst feels a little backwards to me. Why not ConstRes, as in
>> "constant result"?
>>
>>> +      if (!ResConst) {
>>> +        // E.g. if it is an undef.
>>
>> I think undef is probably the only possible case here, right?
>>
>> For undef, it shouldn't matter what result we return (someone please
>> correct me if I'm wrong here), so it would be nice if we could ignore
>> undef values when trying to find the linear relationship. If it makes
>> things more complicated we could do it in a follow-up patch.
>>
>>> +        LinearMappingPossible = false;
>>> +        break;
>>> +      }
>>> +      APInt Val = ResConst->getValue();
>>> +      if (I > 0) {
>>> +        APInt Dist = Val - PrevVal;
>>> +        if (I == 1) {
>>> +          DistToPrev = Dist;
>>> +        } else if (Dist != DistToPrev) {
>>> +          LinearMappingPossible = false;
>>> +          break;
>>> +        }
>>> +      }
>>> +      PrevVal = Val;
>>> +    }
>>> +    if (LinearMappingPossible) {
>>> +      ValueOffset = TableContents[0];
>>> +      Multiplier = ConstantInt::get(ValueIntType, DistToPrev);
>>> +      Kind = LinearMapKind;
>>> +      ++NumLinearMaps;
>>> +      return;
>>> +    }
>>> +  }
>>> +
>>>   // If the type is integer and the table fits in a register, build a bitmap.
>>>   if (WouldFitInRegister(DL, TableSize, ValueType)) {
>>>     IntegerType *IT = cast<IntegerType>(ValueType);
>>> @@ -3802,6 +3848,14 @@
>>>                                              "switch.gep");
>>>       return Builder.CreateLoad(GEP, "switch.load");
>>>     }
>>> +    case LinearMapKind: {
>>> +      // Derive the result value from the input value.
>>> +      Value *Result = Builder.CreateIntCast(Index, Multiplier->getType(), false,
>>> +                                            "switch.idx.cast");
>>> +      Result = Builder.CreateMul(Result, Multiplier, "switch.idx.mult");
>>> +      Result = Builder.CreateAdd(Result, ValueOffset, "switch.offset");
>>
>> Should we handle the cases where the multiplier is 1 or offset is 0
>> and generate simpler code for that? It seems like it would be easy to
>> handle here, so we might as well do it and save some churn.
>