[PATCH] Inverse post-order traversal for LiveVariables analysis, to recover the performance after r214064

Mon Aug 11 23:28:35 PDT 2014

On Aug 9, 2014, at 8:28 AM, Artyom Skrobov <Artyom.Skrobov at arm.com> wrote:
> 
> Thanks for your comments Ted!
> 
>> It looks like we will do a complete iteration of the CFG blocks
>> many times, each time we mark a skipped block.  Can we invert
>> this loop?
>> 
>>   for (CFG::const_iterator I = cfg->begin(), E = cfg->end(); I != E; ++I) {
>>     if (Blocks.size() == size)
>>       break;
>> 
>> That seems far more efficient.  We have an invariant that once
>> we visit a block we don't need to visit it again.  Inverting
>> the loop makes this an O(n) operation, instead of possibly O(n^2).
> 
> My original motivation was that in most cases, all blocks would be reachable to the po_iterator, so that "while (Blocks.size() < size)" loop wouldn't be executed at all. That said, now I see how your suggestion allows simplifying this whole loop to
> 
>  // It may be that some blocks are inaccessible going from the CFG exit upwards
>  // (e.g. infinite loops); we still need to add them.
>  for (CFG::const_iterator I = cfg->begin(), E = cfg->end();
>       (Blocks.size() < size) && (I != E); ++I) {
>    const CFGBlock* block = *I;
>    // Add a chain going upwards.
>    while (!BlockOrder.count(block)) {
>      Blocks.push_back(block);
>      BlockOrder[block] = Blocks.size();
>      CFGBlock::const_pred_iterator PI = block->pred_begin(),
>                                    PE = block->pred_end();
>      for (; PI != PE; ++PI) {
>        const CFGBlock* pred = *PI;
>        if (pred && !BlockOrder.count(pred)) {
>          block = pred;
>          break;
>        }
>      }
>      // Chain ends when we couldn't find an unmapped pred.
>      if (PI == PE) break;
>    }
>  }

Makes sense.

> 
> 
>> Also minor, but I'd prefer this:
>> 
>>     BlockOrder[block] = Blocks.size() + 1;
>>     Blocks.push_back(block);
>> 
>> to be written as:
>> 
>>     Blocks.push_back(block);
>>     BlockOrder[block] = Blocks.size();
>> 
>> That's more consistent (I think) with how we do these things elsewhere.
> 
> The first sequence is the same as how it was done in the original PostOrderCFGView constructor; but I don't mind changing the sequence in both places.

Got it.

> 
> 
>> I was thinking about this some more, and I'm wondering if enqueuing
>> the unreachable blocks is the wrong approach.  The reality is that
>> since these blocks are unreachable we don't even need to bother
>> computing their dataflow values.  They will simply never be used.
> 
> They are "inverse-unreachable", i.e. the CFG exit is not reachable from them.
> 
> An illustrative example is rdar10308201() in test/Analysis/misc-ps.c
> 
> If we don't analyze such blocks (e.g. infinite loops), then variables defined outside of an infinite loop, and used inside the infinite loop, would be erroneously reported as dead.
> 

Yes, this makes sense.

> 
>> I'm also wondering about the following code in your patch:
>> 
>> class PostOrderInverseCFGView : public PostOrderCFGView {
>>  typedef llvm::po_iterator<const CFG*, CFGBlockSet, true,
>>                            llvm::GraphTraits<llvm::Inverse<const CFG*> >
>>> po_iterator;
>> 
>> It is true that this inverts the order, but I don't think this
>> definition of po_iterator is used anywhere except the constructor
>> for PostOrderInverseCFGView.
> 
> You're right, it's not; but equally the definition of PostOrderCFGView::po_iterator isn't used anywhere except the constructor for PostOrderCFGView: it's declared privately, and it's not used by any PostOrderCFGView consumers.

Ok, that makes sense now, but it makes me feel the APIs themselves are a bit misleading/confusing.  (see below)

> 
> 
>> There is no virtual definition of the 'dequeue' method, so
>> 'dequeue' (which is declared in DataflowWorklistBase) uses
>> the po_iterator defined in DataflowWorklistBase, which is
>> the wrong definition for reverse analysis.
> 
> No, dequeue() uses PostOrderCFGView::iterator, which iterates over the Blocks vector. So, the only factor affecting the order of iteration is how the Blocks vector is initialized in the PostOrderInverseCFGView constructor.

I see it now.  From an API perspective it is very confusing, because the name "PostOrderCFGView" implies the iteration always is in that order.  Perhaps if DataflowWorklistBase used something more generic than PostOrderCFGView, with subclasses or specializations defining the precise order, that would make it clearer from an API perspective.  Or something that implies the ordering might be different depending on whether it is a forward or backwards analysis.  Also the comment is very misleading if we are doing a reverse analysis:

  // Next dequeue from the initial reverse post order.  This is the
  // theoretical ideal in the presence of no back edges.

Technically we'd be doing the opposite thing if this was a reverse analysis.  The comment if anything is what hung me up on thinking it was doing the wrong thing.