[llvm-commits] [Patch] Use bitmaps for switch lookup tables that fit in registers

Thu Sep 20 18:37:20 PDT 2012

>> I know this is unrelated to this patch, which is just a bit of
>> refactoring, but is it really necessary to single out the 1-case
>> switch like this? Will a 1-case lookup table not get optimized away
>> somewhere else? It just seems weird to single it out like this:
>> wouldn't it be equally reasonable to single out when there are 2 cases
>> (convert to `select`)?
>
> It's not a 1-case switch, it's an n-case switch where all cases result
> in a single value.

Ok, I see what was confusing me. A function called BuildLookup, which
is in a class called SwitchLookupTable, which is in a code path which
satisfies a predicate ShouldBuildLookupTable(), ends up not building a
lookup table.

Please factor this logic into two clear parts. One which "analyzes"
the switch and determines the best way to handle it, and another which
actually handles transforming the code. Also, for each different
"kind" of switch that the analysis determines, please document (above
the enum constant for the kind would be a good place):
* A human description of that kind of switch.
* An example of that kind of switch.
* A human description of what it gets transformed into.

--Sean Silva

On Thu, Sep 20, 2012 at 2:06 PM, Hans Wennborg <hans at chromium.org> wrote:
> Thank you for the comments! Updated patch attached.
>
> On Thu, Sep 20, 2012 at 6:01 PM, Sean Silva <silvas at purdue.edu> wrote:
>> +  switch (Kind) {
>> +    case SingleValueKind:
>> +      return SingleValue;
>> +    case ArrayKind: {
>> +      Value *GEPIndices[] = { Builder.getInt32(0), Index };
>> +      Value *GEP = Builder.CreateInBoundsGEP(Array, GEPIndices,
>> +                                             "switch.gep");
>> +      return Builder.CreateLoad(GEP, "switch.load");
>> +    }
>> +  }
>> +  llvm_unreachable("Unknown lookup table kind!");
>>
>> I may be wrong, but my impression is that for fully-determined
>> switches the unreachable typically gets put in a default case.
>
> I think the idea is to avoid default cases in fully-determined
> switches, because if a new enumerator is added later, the compiler
> will warn about a missing case. If there is a default case, the
> compiler doesn't issue that warning.
>
>>    // If all values in the table are equal, this is that value.
>> -  Constant *SameResult = Results.begin()->second;
>> +  SingleValue = Values.begin()->second;
>>
>> I know this is unrelated to this patch, which is just a bit of
>> refactoring, but is it really necessary to single out the 1-case
>> switch like this? Will a 1-case lookup table not get optimized away
>> somewhere else? It just seems weird to single it out like this:
>> wouldn't it be equally reasonable to single out when there are 2 cases
>> (convert to `select`)?
>
> It's not a 1-case switch, it's an n-case switch where all cases result
> in a single value.
>
>> +      else if (IndexTy->getBitWidth() > MapTy->getBitWidth())
>> +        ShiftAmt = Builder.CreateTrunc(ShiftAmt, MapTy, "switch.trunc");
>>
>> Please comment why it is safe to truncate the index here.
>
> It's safe because the index is <= the number of elements in the table,
> and therefore also less than the bits in the bitmap. I'll add a
> comment in the code.
>
>> +      // Multiply the shift amount by the element width.
>> +      ShiftAmt = Builder.CreateMul(ShiftAmt,
>> +                      ConstantInt::get(MapTy, BitMapElementTy->getBitWidth()),
>> +                                   "switch.shiftamt");
>>
>> If the bitmap type has "room to spare" for widening the element types,
>> it will probably be profitable to pad them to a power-of-2 size so
>> that this multiply will become a shift. Just something to think about
>> for a future patch. I doubt that it's a big deal, but maybe
>
> I'll keep this in mind when doing the narrowing eventually. For now, I
> expect the elements to be power-of-2 wide anyway.
>
>> +bool SwitchLookupTable::WouldFitInRegister(const TargetData *TD,
>> +                                           uint64_t TableSize,
>> +                                           const Type *ElementType) {
>> +  if (!TD)
>> +    return false;
>> +  const IntegerType *IT = dyn_cast<IntegerType>(ElementType);
>> +  if (!IT)
>> +    return false;
>> +  return TD->fitsInLegalInteger(TableSize * IT->getBitWidth());
>> +}
>>
>> You have a FIXME about possibly narrowing the types. I think that that
>> FIXME should be here.
>
> Right, I've moved it.
>
>> Basically you can take the minimum of the bits
>> needed to represent each value. I'm pretty sure that in practically
>> all cases, that minimum is going to be smaller than the actual integer
>> type of the elements, possibly by a lot.
>>
>> In fact, computing that minimum is sufficiently simple that I think it
>> could be included in this patch as well (it might require some code
>> reorganization though, but I think that computing this minimum is a
>> sufficiently important thing to do that you might as well reorganize
>> now).
>
> I don't think it's this easy, and that's why I'd like to not do it in
> the same patch. Computing the minimum required bitwidth gets trickier
> when there are negative values involved. Also, we'd need to remember
> whether to zero- or sign-extend the values as they are extracted from
> the bitmap.
>
> Those zext and sext operations may be from uncommon widths, e.g. sext
> from i3 to i8, and I'm not sure what the cost for that code will be.
> Therefore the logic for deciding when to do this gets more
> complicated; maybe we only want to do narrowing if the table is
> sparse, but would fit in a register only if we did narrowing? Maybe we
> want to try to narrow to certain types if possible, e.g. we'd narrow
> from i16 to i8 even if 7 bits would be possible?
>
> I think there's enough to take into consideration regarding the
> narrowing that it warrants a patch of its own.
>
>
>> On Thu, Sep 20, 2012 at 5:31 AM, Hans Wennborg <hans at chromium.org> wrote:
>>> Thanks for the comments! New version attached, split into two patches.
>>>
>>>  - Hans
>>>
>>> On Thu, Sep 20, 2012 at 2:08 AM, Sean Silva <silvas at purdue.edu> wrote:
>>>>> The patch also refactors the code to put the table-building
>>>>> functionality in a separate class. I'm happy to break that out into a
>>>>> separate patch if preferred.
>>>>
>>>> Please. It's extremely hard to pick out just the new bitmask-table
>>>> building code or just the refactoring to a class. It's OK to send two
>>>> patches in an email (just make sure they have numbered prefixes so
>>>> that it is unambiguous in what order to apply them).
>>>
>>> Done.
>>>
>>>> A couple comments:
>>>>
>>>> +            const SmallVector<std::pair<ConstantInt*, Constant*>, 4>& Results,
>>>>
>>>> With a name like "Results" this could easily be confused (read: it
>>>> confused me) with being an out parameter.
>>>
>>> RIght, let's call it Values instead.
>>>
>>>> +/// SwitchLookupTable - This class represents a lookup table that can
>>>> be used to
>>>> +/// replace a switch.
>>>> +class SwitchLookupTable {
>>>>
>>>> Please wrap this class in an anonymous namespace as per the coding
>>>> standards <http://llvm.org/docs/CodingStandards.html#anonymous-namespaces>.
>>>
>>> Done.
>>>
>>>> +    std::vector<Constant*> TableContents(TableSize);
>>>>
>>>> Why vector and not SmallVector?
>>>
>>> You're right, SmallVector will work fine here.
>>>
>>>> +    SwitchLookupTable Table;
>>>> +    Table.Init(Mod, TableSize, MinCaseVal, ResultLists[PHI],
>>>> +               DefaultResults[PHI], TD);
>>>> [...]
>>>> +    Value *Result = Table.BuildLookup(TableIndex, Builder);
>>>>
>>>> What's the point of having Init? It just seems like an error-prone
>>>> constructor that might not be called. Maybe better to factor it as one
>>>> class that "analyzes" the switch, and another that does the building;
>>>> that way, the type system enforces things that the current approach
>>>> leaves up to the programmer.
>>>
>>> Right, the Init function looks pretty unnecessary here. Removed.
>>>
>>>
>>>
>>>> On Wed, Sep 19, 2012 at 1:32 PM, Hans Wennborg <hans at chromium.org> wrote:
>>>>> Hi all,
>>>>>
>>>>> The attached batch updates the switch-to-lookup table transformation
>>>>> in SimplifyCFG to put the lookup tables in a bitmap when they would
>>>>> fit in a register on the target. This saves some space, and it also
>>>>> allows for building tables that would otherwise be too sparse.
>>>>>
>>>>> One interesting case that this hits is example 7 from
>>>>> http://blog.regehr.org/archives/320. We currently generate good code
>>>>> for this when lowering the switch to the selection DAG: we build a
>>>>> bitmask to decide whether to jump to one block or the other. My patch
>>>>> will result in the same bitmask, but it removes the need for the jump,
>>>>> as the return value can just be retrieved from the mask.
>>>>>
>>>>> This was part of my original lookup table patch [1], but it was
>>>>> suggested in review that we should do it separately.
>>>>>
>>>>> I tried this on a bootstrap build of Clang, and it reduced the binary
>>>>> size with 8328 bytes (0.03%).
>>>>>
>>>>> The patch also refactors the code to put the table-building
>>>>> functionality in a separate class. I'm happy to break that out into a
>>>>> separate patch if preferred.
>>>>>
>>>>> Please take a look.
>>>>>
>>>>> Thanks,
>>>>> Hans
>>>>>
>>>>>  [1]. http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120903/149605.html