Switch containing holes via table lookup

Hans Wennborg hans at chromium.org
Tue Feb 18 16:11:42 PST 2014


Trying to cc the right list this time..

On Tue, Feb 18, 2014 at 3:46 PM, Hans Wennborg <hans at chromium.org> wrote:
> (Please include the list in your replies.)
>
> There's a lot of debug printouts and commented-out code in your patch,
> so this is not a proper review, but I did try to find the bug you
> mentioned.
>
> On Fri, Feb 14, 2014 at 2:54 PM, Jasper Neumann <jn at sirrida.de> wrote:
>>>> +static cl::opt<bool> CheckHoles(
>>>> +    "simplifycfg-check-holes", cl::Hidden, cl::init(true),
>>>> +    cl::desc("Allow holes in a value table by testing a mask"));
>>
>>> Having this command-line option is just for testing, right?
>>
>> With this switch you can easily (de)activate the optimization.
>> Doesn't this make sense?
>
> It makes sense during debugging, but not for committing I think.
>
>>>> +  unsigned NeededMaskSize = 0;
>>>> +  if (CheckHoles) {
>>>> +    if (TableSize <= 32 && TD->fitsInLegalInteger(32))
>>>> +      NeededMaskSize = 32;
>>>> +    else if (TableSize <= 64 && TD->fitsInLegalInteger(64))
>>>> +      NeededMaskSize = 64;
>>
>>> What if the table is of size 12? We don't want to have to use a 32-bit
>>> immediate for the mask in that case. I think we should generate the
>>> bitmask of the same size as the table, as long as it fits in a legal
>>> register. This is what the current bitmap-building code for lookup
>>> tables does (see the SwitchLookupTable constructor above).
>>
>> Well, it should be a register type where the bit test can be easily
>> performed. On e.g. an x86 there is no BT instruction for 8 bit operands;
>> however there is a variant of BT using memory, hence there is theoretically
>> no practical upper limit on the mask size (but needs more effort).
>> Also, the overhead of up to 3 bytes should be negligible; we have no array
>> of masks. For other processors things might be different.
>> ...but you are probably right and using the smallest register size will
>> produce the intended code.
>
> Right, hopefully the back-end should figure that out.
>
>>>> +  // If the table has holes, [...]
>>
>>
>>> I wonder if we could simplify the code here. What if we start with
>>> [...]
>>
>> That's essentially the same; the && operator is replaced by an if.
>
> Sure, it's semantically the same but I think it's easier for the
> reader to understand.
>
>>> ConstantInt *TableMask = 0;
>>> if (TableHasHoles && !HasDefaultResults &&
>>>     TD->fitsInLegalInteger(TableSize)) {
>>>    // Build a bitmask to check for valid table indices.
>>>    APInt MaskInt(TableSize, 0);
>>>    APInt One(TableSize, 1);
>>>    const ResultListTy &ResultList = ResultLists[PHIs[0]];
>>>    for (size_t I = 0, E = ResultList.size(); I != E; ++I) {
>>>      uint64_t Idx = (ResultList[I].first->getValue() -
>>>                      MinCaseVal->getValue()).getLimitedValue();
>>>      MaskInt |= One << Idx;
>>>    }
>>>    TableMask = ConstantInt::get(M.getContext(), MaskInt);
>>> }
>>
>> APInt allows for any register sizes. That should be better than uint64_t.
>>
>>
>>> Now we can take all these things together and decide if we want to
>>> bail out or not:
>>>
>>> if (TableHasHoles && !HasDefaultResults && !TableMask)
>>>   return false;
>>
>> We could get the same result with
>> if (TableHasHoles && !HasDefaultResults) {
>>    if (!TD->fitsInLegalInteger(TableSize))
>>      return false;
>>   ...
>> and bail out earlier without repeating the conditions.
>> On the other hand the return is more deeply nested.
>
> The point of my if statement is that it spells out *why* we have to
> bail out. From yours the reader has to go figure out why it matters
> wither TableSize fits in a register or not. One could put a comment
> there of course, but it's nicer if the code can spell it out.
>
>>>> +    // We need any value to fill the table; the first one suffices.
>>>> +    SwitchInst::CaseIt CI = SI->case_begin();
>>>> +    GetCaseResults(SI, CI.getCaseValue(), CI.getCaseSuccessor(),
>>>> +                   &CommonDest, DefaultResultsList, TD);
>>
>>> Ouch, calling GetCaseResults more than necessary is something I'd like
>>> to avoid. We already have all the results in ResultLists, so let's use
>>> that.
>>
>> How would you do that? As far as I can see the ResultList for all non-hole
>> cases have already been requested but were thrown away. Nevertheless I
>> re-request only one case.
>> On the other hand we need no default value for the hole-checked case.
>
> The results for all cases are stored in ResultLists, they're not
> thrown away. Anyway, it looks like we don't need this and you
> commented it out in your latest patch?
>
>>>>     // Populate the BB that does the lookups.
>>>>     Builder.SetInsertPoint(LookupBB);
>>>> +
>>>> +  if (NeedMask) {
>>>> +    uint64_t UseMask = 0;
>>>> +    for (SwitchInst::CaseIt CI = SI->case_begin(), E = SI->case_end();
>>>> +         CI != E; ++CI) {
>>>> +      ConstantInt *CaseVal = CI.getCaseValue();
>>>> +      UseMask |= 1ULL <<
>>>> +        (CaseVal->getValue() -
>> MinCaseVal->getValue()).getLimitedValue();
>>>> +    }
>>>> +
>>>> +    BasicBlock *MaskBB = BasicBlock::Create(Mod.getContext(),
>>>> +                                            "switch.lookup2",
>>>> +                                            CommonDest->getParent(),
>>>> +                                            CommonDest);
>>
>>> I think we should create this BB earlier (and with a better name!), so
>>> that we can use it in the conditional branch we create after checking
>>> that Index < TableSize above. I also like to have the mask built
>>> before we start generating IR.
>>
>> You are right. This should definitely be done. I found out how to do it.
>
> It would be nice if we only created this BB if we need it (i.e. when
> we use the "hole bitmask"). I know the optimizers will clean it up for
> us, but it would be good to avoid generating unnecessary code.
>
>>> I wonder if this mask checking code could be simpler. Look at
>>> SwitchLookupTable::BuildLookup for the BitMapKind case.
>>
>> The mentioned code indeed looks much simpler since we can get rid of the
>> special types.
>
> I think it can be simpler still :) There should be no need for the And
> and ICmpNE. It should be possible to do just a right-shift and
> truncation.
>
>> I prepared a new patch with more debug output and your remarks included but
>> unfortunately the compiler crash is the same (for LLParser.cpp):
>> ==>
>>
>> clang: /home/jane/wrk/llvm-trunk/include/llvm/IR/Instructions.h:2155:
>> llvm::Value* llvm::PHINode::getIncomingValueForBlock(const
>> llvm::BasicBlock*) const: Assertion `Idx >= 0 && "Invalid basic block
>> argument!"' failed.
>> [...]
>> <==
>
> I might have an idea.
>
> With your patch, we're potentially adding a new predecessor to the
> default bb: the "hole check" bb. If the default bb has a phi node in
> it, that's a problem, because it has no value for this predecessor.
>
>  - Hans



More information about the llvm-commits mailing list