[LLVMdev] Profiling in LLVM Patch

Wed Jul 1 02:36:23 PDT 2009

Hi Daniel,

Daniel Dunbar wrote:
> Hi Andreas,
> 
> First, thanks again for undertaking this work and submitting it back. There is a
> lot of good stuff here and it would be great to see it get back into the tree.
Thanks for taking the time to review this, I know its a huge patch. I still have a few questions on how you would like this patch to be re-factored and split up.

> [...]
> 
> 1. The intent of ProfileInfo is that it is the public interface used by the rest
> of LLVM for all "profiling" related analyses. As such, we want it to have a
> thin, well-defined interface which can be implemented by multiple analysis
> providers, just like we have with the current alias analysis interface. Your
> current patch has a large number of changes to ProfileInfo which should instead
> be inside a subclass of ProfileInfo, where most of the implementation details
> are private.
> 
> I think the first task should be to separate out the changes which are
> appropriate for the ProfileInfo class itself and apply those to the tree. I
> believe that the important changes which are worth reflecting in the API are
> changed the weights to return a double instead of an int, and defining -1 as a
> sentinel, "don't know", value.
So you want the ProfileInfo to be untouched except for the double weights and the MissingValue? Then mark all methods virtual and do a subclass that has this sort of heavy implementation that I am using? Do you have any name suggestions for this subclass?
One thing I would like to patch in the current ProfileInfo implementation is the consistent use of the (0,entry) edge which is necessary to profile functions with only one block (namely the entry block).

> This is the most important thing to separate out, and to do early, because it is
> the main interface that is shared with LLVM. It is also important for my GSoC
> student, who was going to work on static profiling for LLVM and will depend on
> any changes to this interface.
Yes, I thought so, I will try to do this quick.

> The current ProfileInfo functions should also become virtual, and all other
> implementation details should be left to the subclass.
> 
> Finally, I don't think the ignoreMissing parameter to getExecutionCount makes
> sense, instead the provider should always return -1 if it doesn't have the
> value, and clients should "do the right thing.
Maybe, its of course always possibile to have custom wrappers that perform this sort of task if necessary...

> 2. We shouldn't worry too much about preserving the functionality of the current
> llvm-prof tool, which uses the ProfileInfoLoader directly. llvm-prof should be
> rewritten to just use the public API provided by ProfileInfo. This should be
> done early so there are less dependencies on the ProfileInfoLoader interface.
Okay, thats no problem. I guess it makes sense to rewrite llvm-prof but I was not sure if this sort of changes would be appreciated.

> In fact, it might even make sense to just get rid of llvm-prof and instead
> implement a module level pass which consumes profiling information and prints it
> out as part of the pass. This would allow us to debug the preservation of
> profiling information once we start modifying optimization passes to update it.
Good point.

> 3. The static profiling estimator should be implemented as just another
> ProfileInfo implementation. I believe this can be a separate patch. Similarly
> the profile verifier is just another consumer of ProfileInfo and can be a
> separate patch.
The ProfileEstimator is already a subclass of ProfileInfo and implements that interface. Yes, these two are easily split into separate patches.

> 4. Finally, the new optimal edge instrumentation & ProfileInfo implementation
> can be brought in.
> 
> Does this sound like a reasonable plan? Although it is a bit of work, I don't
> think it will be that difficult and I am happy to help out with splitting up the
> patch / refactoring the existing code.
Yes, sounds good. I will try to split it up myself first and ask when there are decisions to be made I'm not sure about.

> In addition, I have a number of "nitpicks" mostly related to LLVM style:
Well, they were to be expected :-) One never gets those right the first time contributing to a new project...

>  [...]

>> +  /// getExitEdges - Return all pairs of (_inside_block_,_outside_block_).
>> +  /// (Modelled after getExitingBlocks().)
>> +  typedef std::pair<BlockT*,BlockT*> Edge;
>> +  void getExitEdges(SmallVectorImpl<Edge> &ExitEdges) const {
>> +    // Sort the blocks vector so that we can use binary search to do quick
>> +    // lookups.
>> +    SmallVector<BlockT*, 128> LoopBBs(block_begin(), block_end());
>> +    std::sort(LoopBBs.begin(), LoopBBs.end());
> 
> Would it be better to use a DenseSet for this lookup?
Well, its the same that is used in getExitBlocks() where I got the implementation from. I really don't know but I think the current implementation is okay from what I gather form the LLVM Programmer's Manual.

>> --- llvm-van/include/llvm/Analysis/MaximumSpanningTree.h      1970-01-01 01:00:00.000000000 +0100
>> +++ llvm-c7/include/llvm/Analysis/MaximumSpanningTree.h       2009-06-26 16:47:01.000000000 +0200
> ...
>> +#include "llvm/Support/Streams.h"
> 
> Please don't use Support/Streams.h in a header, it pollutes any translation unit
> which imports this header.
Okay.

>> +    static void print(MaxSpanTree MST) {
>> +      cerr<<"{";
>> +      for ( MaxSpanTree::iterator ei = MST.begin(), ee = MST.end();
>> +            ei!=ee; ++ei ) {
>> +        cerr<<"("<<((*ei).first?(*ei).first->getName():"0")<<",";
>> +        cerr<<(*ei).second->getName()<<")";
>> +      }
>> +      cerr<<"}\n";
>> +    }
>> +  };
> 
> To be more in line with LLVM I would recommend making this a nullary const
> method on the MST, and rename it to dump. Also, please use Support/raw_ostream.h
> for output instead of C++ iostreams.
Okay.

> 
>> --- llvm-van/include/llvm/Analysis/Passes.h   2009-06-29 13:49:13.000000000 +0200
>> +++ llvm-c7/include/llvm/Analysis/Passes.h    2009-06-26 16:48:02.000000000 +0200
>>    // createProfileLoaderPass - This pass loads information from a profile dump
>> -  // file.
>> +  // file. Since its possible to use the ProfileInfoLoaderPass not only from
>> +  // opt but also e.g. from llvm-prof and with different profile info filenames
>> +  // a creator is provided where the toolname and profile info filename can be
>> +  // provided. (The toolname is used to create proper error messages.)
>>    //
>>    ModulePass *createProfileLoaderPass();
>> +  ModulePass *createProfileLoaderPass(const std::string &, const std::string &);
> 
> Please name the arguments so it is clear which is the toolname and which is the
> profile info name (bonus points for switching to doxygen comments which
> reference the parameters).
I guess I will go for the bonus points then.

>> --- llvm-van/include/llvm/Analysis/ProfileInfo.h      2009-05-12 10:37:52.000000000 +0200
>> +++ llvm-c7/include/llvm/Analysis/ProfileInfo.h       2009-06-26 16:47:01.000000000 +0200
> 
> Keep in mind that most comments here are assuming this functionality is moved to
> a subclass specific to optimal edge profiling.
> 
>>    protected:
>> -    // EdgeCounts - Count the number of times a transition between two blocks is
>> -    // executed.  As a special case, we also hold an edge from the null
>> -    // BasicBlock to the entry block to indicate how many times the function was
>> -    // entered.
>> -    std::map<std::pair<BasicBlock*, BasicBlock*>, unsigned> EdgeCounts;
>> +    // EdgeInformation - Count the number of times a transition between two
>> +    // blocks is executed.  As a special case, we also hold an edge from the
>> +    // null BasicBlock to the entry block (henceforth (0,entry) to indicate
>> +    // how many times the function was entered. (This is especially necessary
>> +    // if there is only the entry block.)
>> +    std::map<Function*, EdgeCounts> EdgeInformation;
>> +
>> +    // BlockInformation - Count the number of times a block is executed
>> +    std::map<Function*, BlockCounts> BlockInformation;
>> +
>> +    // FunctionInformation - Count the number of times a function is executed
>> +    std::map<Function*, double> FunctionInformation;
> 
> Instead of multiple maps from Function* ->, would it make sense to collect these
> three (or four) maps inside a per-function structure?
Yes, makes sense, its then easier to factor out the retrieval and handling of this structure.

>> +    // getFunctionForEdge() - This returns the Function for a given Edge. This
>> +    // requires special handling for (0,entry) so its centralised here.
>> +    static inline Function* getFunctionForEdge(Edge E) {
>> +      BasicBlock *BB = E.first;
>> +      if ( BB == 0 ) { BB = E.second; };
>> +      assert ( BB != 0 && "both BasicBlocks of Edge are NULL, can not "
>> +                          "determine function");
>> +      return BB->getParent();
>> +    }
> 
> The edge destination should always be non-null, correct? Why not just
> --
>    static inline Function* getFunctionForEdge(Edge E) {
>      assert(E.second && "Invalid profile edge!");
>      return E.second->getParent();
>    }
> --
Nice one. Thanks!

>> +    // getFunctionForBlock() - Same as for Edges without special handling, for
>> +    // the sake of cleaner code.
>> +    static inline Function* getFunctionForBlock(BasicBlock* BB) {
>> +      return BB->getParent();
>> +    }
> 
> I would prefer this wasn't used, its "common knowledge" in the LLVM code that a
> basic blocks parent is its Function.
I was not sure about this one anyways, so yes, it will go.

>> +    EdgeCounts getEdgeCounts(Function *F) {
>> +      std::map<Function*, EdgeCounts>::const_iterator J =
>> +        EdgeInformation.find(F);
>> +      if ( J != EdgeInformation.end() ) {
>> +        return J->second;
>> +      }
>> +      return EdgeCounts(); // Return empty EdgeCounts in case none found.
>> +    }
> 
> This is way too expensive, returning a deep copy of the edge counts map isn't
> necessary. The simplest alternative is to return a const reference and insert
> into the map for missing functions. Something like:
> --
>> +    const EdgeCounts& getEdgeCounts(Function *F) {
>> +      return EdgeInformation[F];
>> +    }
> --
> Unless we really care about not inserting empty entries this seems reasonable to
> me.
Here my somewhat laking C++ skills failed me. I didn't see that this is essentially doing a copy, of course this is not necessary!
> 
>> +    void forceSynthesis(void) {
>> +      for (std::map<Function*, EdgeCounts>::iterator
>> +           FI = EdgeInformation.begin(), FIE = EdgeInformation.end();
>> +           FI != FIE; ++FI) {
>> +        Function *F = FI->first;
>> +        for (Function::iterator BB = F->begin(), BBE = F->end();
>> +             BB != BBE; ++BB) {
>> +          getExecutionCount(BB,false);
>> +        }
>> +        getExecutionCount(F,false);
>> +      }
>> +    }
> 
> It doesn't make sense to have this function here, clients should just do the
> right thing if the count isn't present. It may be worth providing a ProfileInfo
> level API to query whether any information is present for a particular function
> so that clients don't perform needless queries.
I have to look into this one, I guess its possible to do this kind of check on demand and calculate BlockCounts from EdgeCounts and FunctionCounts form BlockCounts when necessary.

>> --- llvm-van/lib/Analysis/MaximumSpanningTree.cpp     1970-01-01 01:00:00.000000000 +0100
>> +++ llvm-c7/lib/Analysis/MaximumSpanningTree.cpp      2009-06-26 16:47:01.000000000 +0200
> ...
>> +#define WOOD_REP(M,V1,V2) \
> 
> Please use "forest" instead of "wood".
Oh, sorry, thats a false friend, in German "forest" is "Wald".

>> +  // compare two weighted edges
>> +  struct VISIBILITY_HIDDEN EdgeWeightCompare {
>> +    bool operator()(const ProfileInfo::EdgeWeight X,
>> +                    const ProfileInfo::EdgeWeight Y) const {
>> +      if ( X.second > Y.second ) return true;
>> +      if ( X.second < Y.second ) return false;
>> +      if ( X.first.first != 0 && Y.first.first == 0 ) return true;
>> +      if ( X.first.first == 0 && Y.first.first != 0 ) return false;
>> +      if ( X.first.first == 0 && Y.first.first == 0 ) return false;
>> +      int c1 =  X.first.first->getName().compare(Y.first.first->getName());
>> +      if ( c1 > 0 ) return true;
>> +      if ( c1 < 0 ) return false;
>> +      int c2 =  X.first.second->getName().compare(Y.first.second->getName());
>> +      if ( c2 > 0 ) return true;
>> +      return false;
>> +    }
>> +  };
> 
> Comparing edges based on the basic block name is not a good idea, in some common
> cases these will all be empty. If the ordering really needs to be deterministic
> then you will need to number the basic blocks, but is there a compelling reason
> to not order based on the basic block address?
This slipped my notice and shouldn't have made it into production code. Its sufficient to order based on weights.
(This implementation was necessary when the MST was also needed for reading the profiling information, but since now I handle this with the Constant-Initializer that provides this information directly in the profile, its only necessary to compare the weight.)

> [...]
>> +// countSuccessors() - This gives an estimate of the number of successors of a
>> +// basic block. Returns 0, 1 or 2 depending on the block having no, one or more
>> +// than one successors respectively.
>> +static int countSuccessors(BasicBlock* BB) {
>> +  succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
>> +  if ( SI != SE ) {       // at least one successor
>> +    ++SI;
>> +    if ( SI == SE ) {     // exactly on successor
>> +      return 1;
>> +    } else {              // more than one successor
>> +      return 2;
>> +    }
>> +  } else {
>> +    return 0;
>> +  }
>> +}
> 
> This routine is only used in one place, and its result is compared against 0. I
> think it should be removed and the code should just check (succ_begin(BB) ==
> succ_end(BB)), or for better readability add a succ_empty to CFG.h and use that.
Okay.

>> --- llvm-van/lib/Analysis/ProfileEstimatorPass.cpp    1970-01-01 01:00:00.000000000 +0100
>> +++ llvm-c7/lib/Analysis/ProfileEstimatorPass.cpp     2009-06-26 16:47:01.000000000 +0200
> ...
>> +bool OptimalEdgeProfiler::runOnModule(Module &M) {
>> +  Function *Main = M.getFunction("main");
>> +  if (Main == 0) {
>> +    cerr << "WARNING: cannot insert edge profiling into a module"
>> +         << " with no main function!\n";
>> +    return false;  // No main, no instrumentation!
>> +  }
>> +
>> +  std::set<BasicBlock*> BlocksToInstrument;
>> +  unsigned NumEdges = 0;
>> +  for (Module::iterator F = M.begin(), E = M.end(); F != E; ++F) {
>> +    if (F->isDeclaration()) continue;
>> +    // use one counter for the edge (0,entry) for each function
>> +    ++NumEdges;
>> +    for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB) {
>> +      // Keep track of which blocks need to be instrumented.  We don't want to
>> +      // instrument blocks that are added as the result of breaking critical
>> +      // edges!
>> +      BlocksToInstrument.insert(BB);
>> +      NumEdges += BB->getTerminator()->getNumSuccessors();
>> +    }
>> +  }
>> +
>> +  const ArrayType *ATy = ArrayType::get(Type::Int32Ty, NumEdges);
>> +  GlobalVariable *Counters =
>> +    new GlobalVariable(ATy, false, GlobalValue::InternalLinkage,
>> +                       0, "OptimalEdgeProfCounters", &M);
>> +
>> +  std::vector<Constant*> Initializer = std::vector<Constant*>(NumEdges);
> 
> This unnecessarily copies the vector, please use:
>  std::vector<Constant*> Initializer(NumEdges);
Okay.

>> +  APInt zero(32,0);      Constant* zeroc = ConstantInt::get(zero);
>> +  APInt minusone(32,-1); Constant* minusonec = ConstantInt::get(minusone);
> 
> Please use:
>> +  Constant* zeroc = ConstantInt::get(llvm::Type::Int32Ty, 0);
>> +  Constant* onec = ConstantInt::getSigned(llvm::Type::Int32Ty, -1);
Okay.

Thank you so much for taking the time and reviewing this giving me very valuable comments.

Andi