Switch containing holes via table lookup

Tue Feb 18 15:46:52 PST 2014

(Please include the list in your replies.)

There's a lot of debug printouts and commented-out code in your patch,
so this is not a proper review, but I did try to find the bug you
mentioned.

On Fri, Feb 14, 2014 at 2:54 PM, Jasper Neumann <jn at sirrida.de> wrote:
>>> +static cl::opt<bool> CheckHoles(
>>> +    "simplifycfg-check-holes", cl::Hidden, cl::init(true),
>>> +    cl::desc("Allow holes in a value table by testing a mask"));
>
>> Having this command-line option is just for testing, right?
>
> With this switch you can easily (de)activate the optimization.
> Doesn't this make sense?

It makes sense during debugging, but not for committing I think.

>>> +  unsigned NeededMaskSize = 0;
>>> +  if (CheckHoles) {
>>> +    if (TableSize <= 32 && TD->fitsInLegalInteger(32))
>>> +      NeededMaskSize = 32;
>>> +    else if (TableSize <= 64 && TD->fitsInLegalInteger(64))
>>> +      NeededMaskSize = 64;
>
>> What if the table is of size 12? We don't want to have to use a 32-bit
>> immediate for the mask in that case. I think we should generate the
>> bitmask of the same size as the table, as long as it fits in a legal
>> register. This is what the current bitmap-building code for lookup
>> tables does (see the SwitchLookupTable constructor above).
>
> Well, it should be a register type where the bit test can be easily
> performed. On e.g. an x86 there is no BT instruction for 8 bit operands;
> however there is a variant of BT using memory, hence there is theoretically
> no practical upper limit on the mask size (but needs more effort).
> Also, the overhead of up to 3 bytes should be negligible; we have no array
> of masks. For other processors things might be different.
> ...but you are probably right and using the smallest register size will
> produce the intended code.

Right, hopefully the back-end should figure that out.

>>> +  // If the table has holes, [...]
>
>
>> I wonder if we could simplify the code here. What if we start with
>> [...]
>
> That's essentially the same; the && operator is replaced by an if.

Sure, it's semantically the same but I think it's easier for the
reader to understand.

>> ConstantInt *TableMask = 0;
>> if (TableHasHoles && !HasDefaultResults &&
>>     TD->fitsInLegalInteger(TableSize)) {
>>    // Build a bitmask to check for valid table indices.
>>    APInt MaskInt(TableSize, 0);
>>    APInt One(TableSize, 1);
>>    const ResultListTy &ResultList = ResultLists[PHIs[0]];
>>    for (size_t I = 0, E = ResultList.size(); I != E; ++I) {
>>      uint64_t Idx = (ResultList[I].first->getValue() -
>>                      MinCaseVal->getValue()).getLimitedValue();
>>      MaskInt |= One << Idx;
>>    }
>>    TableMask = ConstantInt::get(M.getContext(), MaskInt);
>> }
>
> APInt allows for any register sizes. That should be better than uint64_t.
>
>
>> Now we can take all these things together and decide if we want to
>> bail out or not:
>>
>> if (TableHasHoles && !HasDefaultResults && !TableMask)
>>   return false;
>
> We could get the same result with
> if (TableHasHoles && !HasDefaultResults) {
>    if (!TD->fitsInLegalInteger(TableSize))
>      return false;
>   ...
> and bail out earlier without repeating the conditions.
> On the other hand the return is more deeply nested.

The point of my if statement is that it spells out *why* we have to
bail out. From yours the reader has to go figure out why it matters
wither TableSize fits in a register or not. One could put a comment
there of course, but it's nicer if the code can spell it out.

>>> +    // We need any value to fill the table; the first one suffices.
>>> +    SwitchInst::CaseIt CI = SI->case_begin();
>>> +    GetCaseResults(SI, CI.getCaseValue(), CI.getCaseSuccessor(),
>>> +                   &CommonDest, DefaultResultsList, TD);
>
>> Ouch, calling GetCaseResults more than necessary is something I'd like
>> to avoid. We already have all the results in ResultLists, so let's use
>> that.
>
> How would you do that? As far as I can see the ResultList for all non-hole
> cases have already been requested but were thrown away. Nevertheless I
> re-request only one case.
> On the other hand we need no default value for the hole-checked case.

The results for all cases are stored in ResultLists, they're not
thrown away. Anyway, it looks like we don't need this and you
commented it out in your latest patch?

>>>     // Populate the BB that does the lookups.
>>>     Builder.SetInsertPoint(LookupBB);
>>> +
>>> +  if (NeedMask) {
>>> +    uint64_t UseMask = 0;
>>> +    for (SwitchInst::CaseIt CI = SI->case_begin(), E = SI->case_end();
>>> +         CI != E; ++CI) {
>>> +      ConstantInt *CaseVal = CI.getCaseValue();
>>> +      UseMask |= 1ULL <<
>>> +        (CaseVal->getValue() -
> MinCaseVal->getValue()).getLimitedValue();
>>> +    }
>>> +
>>> +    BasicBlock *MaskBB = BasicBlock::Create(Mod.getContext(),
>>> +                                            "switch.lookup2",
>>> +                                            CommonDest->getParent(),
>>> +                                            CommonDest);
>
>> I think we should create this BB earlier (and with a better name!), so
>> that we can use it in the conditional branch we create after checking
>> that Index < TableSize above. I also like to have the mask built
>> before we start generating IR.
>
> You are right. This should definitely be done. I found out how to do it.

It would be nice if we only created this BB if we need it (i.e. when
we use the "hole bitmask"). I know the optimizers will clean it up for
us, but it would be good to avoid generating unnecessary code.

>> I wonder if this mask checking code could be simpler. Look at
>> SwitchLookupTable::BuildLookup for the BitMapKind case.
>
> The mentioned code indeed looks much simpler since we can get rid of the
> special types.

I think it can be simpler still :) There should be no need for the And
and ICmpNE. It should be possible to do just a right-shift and
truncation.

> I prepared a new patch with more debug output and your remarks included but
> unfortunately the compiler crash is the same (for LLParser.cpp):
> ==>
>
> clang: /home/jane/wrk/llvm-trunk/include/llvm/IR/Instructions.h:2155:
> llvm::Value* llvm::PHINode::getIncomingValueForBlock(const
> llvm::BasicBlock*) const: Assertion `Idx >= 0 && "Invalid basic block
> argument!"' failed.
> [...]
> <==

I might have an idea.

With your patch, we're potentially adding a new predecessor to the
default bb: the "hole check" bb. If the default bb has a phi node in
it, that's a problem, because it has no value for this predecessor.

 - Hans