[PATCH] Inverse post-order traversal for LiveVariables analysis, to recover the performance after r214064

Ted Kremenek kremenek at apple.com
Fri Aug 8 22:22:51 PDT 2014


Hi Artyom,

I was thinking about this some more, and I'm wondering if enqueuing the unreachable blocks is the wrong approach.  The reality is that since these blocks are unreachable we don't even need to bother computing their dataflow values.  They will simply never be used.

I'm also wondering about the following code in your patch:

> class PostOrderInverseCFGView : public PostOrderCFGView {
>   typedef llvm::po_iterator<const CFG*, CFGBlockSet, true,
>                             llvm::GraphTraits<llvm::Inverse<const CFG*> >
>                            > po_iterator;

It is true that this inverts the order, but I don't think this definition of po_iterator is used anywhere except the constructor for PostOrderInverseCFGView.  There is no virtual definition of the 'dequeue' method, so 'dequeue' (which is declared in DataflowWorklistBase) uses the po_iterator defined in DataflowWorklistBase, which is the wrong definition for a reverse analysis.  For a reverse analysis, we want to propagate values *down* back edges so that they update the dataflow values deeper in the CFG, which then bubble up through the reverse analysis.

I'm wondering if we need PostOrderInverseCFGView at all.  Instead, maybe it's just a matter of "virtualizing" the following code in DataflowWorklist::dequeue():

  // Next dequeue from the initial reverse post order.  This is the
  // theoretical ideal in the presence of no back edges.
  else if (PO_I != PO_E) {
    B = *PO_I;
    ++PO_I;
  }

Here you are using an iterator interface.  That's fine, but it seems like we can just add one layer of abstraction.  Instead, we could have some kind of object that wraps the two kinds of iterators and uses dynamic dispatch to provide the indirection.  For example:

class DataflowBlockIterator {
public:
  virtual ~DataflowBlockIterator() {}
  virtual bool isDone() = 0;
  virtual CFGBlock *advance() = 0;
};

// something like this
class DataflowForwardBlockIterator : public DataflowBlockIterator {
  typedef llvm::po_iterator<const CFG*, CFGBlockSet, true>  po_iterator;
  po_iterator PO_I, PO_E;
public:
  virtual bool isDone() { return PO_I == PO_E; }
  virtual CFGBlock *advance() { CFGBlock *B = *PO_I; ++PO_I; return B; }
};

// something like this
class DataflowReverseBlockIterator : public DataflowBlockIterator {
  typedef llvm::po_iterator<const CFG*, CFGBlockSet, true,
                            llvm::GraphTraits<llvm::Inverse<const CFG*> >
                           > po_iterator;
  po_iterator PO_I, PO_E;
public:
  virtual bool isDone() { return PO_I == PO_E; }
  virtual CFGBlock *advance() { CFGBlock *B = *PO_I; ++PO_I; return B; }
};

and then do the following in dequeue():

  // Next dequeue from the initial [reverse] post order.  This is the
  // theoretical ideal in the presence of no back edges.
  // Note: blockIterator is of type DataflowBlockIterator*.
  else if (!blockIterator->isDone()) {
    B = blockIterator->advance();
  }

Maybe that's all you need.


> On Aug 8, 2014, at 6:32 PM, Ted Kremenek <kremenek at apple.com> wrote:
> 
> Hi Artyom,
> 
> I have a question about this code:
> 
>>   // It may be that some blocks are inaccessible going from the CFG exit upwards
>>   // (e.g. infinite loops); we still need to add them.
>>   while (Blocks.size() < size) {
>>     // Find a block that we skipped
>>     CFG::const_iterator I = cfg->begin(), E = cfg->end();
>>     for (; I != E; ++I)
>>       if (!BlockOrder.count(*I))
>>         break;
>>     assert(I != E);
>>     const CFGBlock* block = *I;
>>     // Add a chain going upwards
>>     while (!BlockOrder.count(block)) {
>>       BlockOrder[block] = Blocks.size() + 1;
>>       Blocks.push_back(block);
>>       CFGBlock::const_pred_iterator PI = block->pred_begin(),
>>                                     PE = block->pred_end();
>>       for (; PI != PE; ++PI) {
>>         const CFGBlock* pred = *PI;
>>         if (pred && !BlockOrder.count(pred)) {
>>           block = pred;
>>           break;
>>         }
>>       }
>>       // Chain ends when we couldn't find an unmapped pred
>>       if (PI == PE) break;
>>     }
>>   }
> 
> It looks like we will do a complete iteration of the CFG blocks many times, once for each skipped block we mark.  Can we invert this loop?
> 
>     for (CFG::const_iterator I = cfg->begin(), E = cfg->end(); I != E; ++I) {
>       if (Blocks.size() == size)
>         break;
> 
> That seems far more efficient.  We have an invariant that once we visit a block we don't need to visit it again.  Inverting the loop makes this an O(n) operation, instead of possibly O(n^2).
> 
> Also minor, but I'd prefer this:
> 
>       BlockOrder[block] = Blocks.size() + 1;
>       Blocks.push_back(block);
> 
> to be written as:
> 
>       Blocks.push_back(block);
>       BlockOrder[block] = Blocks.size();
> 
> That's more consistent (I think) with how we do these things elsewhere.
> 
> Note that even this iteration of skipped blocks isn't completely precise; we may hit a block first that dominates other blocks that are all dead.
> 
> Perhaps a better approach is to include all unreachable blocks by querying the CFG for unreachable edges.  We now try to build the CFG so that unreachable edges aren't hard-pruned; instead, you can access the pruned edge via an API call:
> 
>     /// Get the potentially unreachable block.
>     CFGBlock *getPossiblyUnreachableBlock() const {
>       return UnreachableBlock.getPointer();
>     }
> 
> That said, we may not have converted the entire CFG to record these "ghost edges".  However, if you include the possibly unreachable blocks when you do your initial construction of Blocks, there will be fewer blocks skipped in the first pass, and everything will be in the proper order in the worklist.
> 
> Also minor, please make all comments be sentences that end with a '.'
> 
> For expediency, besides the comments being slightly tweaked, I'm fine with the patch going in as is.  That would hopefully unblock Alex's current performance problem.  I do think we should invert the above loop though in a subsequent patch.
> 
> Ted
> 
> 
>> On Aug 8, 2014, at 8:48 AM, Artyom Skrobov <Artyom.Skrobov at arm.com> wrote:
>> 
>> Thank you for your patience; indeed it took me some time to prepare the proper patch, which is now ready for review.
>>  
>> It comes without a regression test, as the only observable change is performance improvement of the analysis.
>>  
>>  
>> From: Alexander Kornienko [mailto:alexfh at google.com]
>> Sent: 07 August 2014 23:35
>> To: Artyom Skrobov
>> Cc: cfe-commits at cs.uiuc.edu Commits
>> Subject: Re: r214064 - Factoring DataflowWorklist out of LiveVariables and UninitializedValues analyses
>>  
>> Any news here? This keeps biting us.
>>  
>> Maybe you can apply this patch first and work on a better solution after that?
>> 
>> On Tue, Aug 5, 2014 at 6:23 PM, Artyom Skrobov <Artyom.Skrobov at arm.com> wrote:
>> So, the simplest way to restore the original behaviour seems to be
>>  
>> Index: lib/Analysis/LiveVariables.cpp
>> ===================================================================
>> --- lib/Analysis/LiveVariables.cpp   (revision 214183)
>> +++ lib/Analysis/LiveVariables.cpp   (working copy)
>> @@ -451,6 +451,7 @@
>>    // Construct the dataflow worklist.  Enqueue the exit block as the
>>    // start of the analysis.
>>    DataflowWorklist worklist(*cfg, AC);
>> +  worklist.quick_hack();
>>    llvm::BitVector everAnalyzedBlock(cfg->getNumBlockIDs());
>>    // FIXME: we should enqueue using post order.
>> Index: include/clang/Analysis/Analyses/DataflowWorklist.h
>> ===================================================================
>> --- include/clang/Analysis/Analyses/DataflowWorklist.h     (revision 214183)
>> +++ include/clang/Analysis/Analyses/DataflowWorklist.h     (working copy)
>> @@ -53,6 +53,7 @@
>>    const CFGBlock *dequeue();
>>    void sortWorklist();
>> +  void quick_hack() { enqueuedBlocks.reset(); PO_I = PO_E; }
>> };
>>  } // end clang namespace
>>  
>> This has the effect of disabling the fallback post-order iteration.
>>  
>> The reason for the performance regression, however, is related to the iteration order over the CFG: UninitializedValues expects the default PostOrderCFGView iteration (i.e. reverse post-order), whereas LiveVariables expects the opposite order, corresponding to using llvm::GraphTraits<llvm::Inverse<const CFG*> > as the last template parameter of the po_iterator typedef.
>>  
>> I’m currently thinking of a proper solution which would allow using PostOrderCFGView for iteration in either direction.
>>  
>>  
>>  
>> From: Alexander Kornienko [mailto:alexfh at google.com]
>> Sent: 05 August 2014 12:30
>> 
>> To: Artyom Skrobov
>> Cc: cfe-commits at cs.uiuc.edu Commits
>> Subject: Re: r214064 - Factoring DataflowWorklist out of LiveVariables and UninitializedValues analyses
>>  
>> On another file from the same project the difference in my setup is 2 minutes before the patch vs. 50+ hours (I terminated the analyzer before it finished) after the patch. So it's a rather bad performance regression.
>>  
>> 
>> <DataflowWorklistInverse.patch>
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits