[LLVMdev] Profiling in LLVM Patch

Andreas Neustifter e0325716 at student.tuwien.ac.at
Wed Jul 8 06:59:45 PDT 2009


Hi Daniel,

I just want to make a few remarks on the things we discussed before I split the patch from [1] into the series of patches, the last being [2]. Some of the code from the first patch [1] was rewritten in the process, so some of the things we talked about changed a little; I want to give the rationale behind my new design decisions.

Andreas Neustifter wrote:
> Daniel Dunbar wrote:
> [...]
>> 1. The intent of ProfileInfo is that it is the public interface used by the rest
>> of LLVM for all "profiling" related analyses. As such, we want it to have a
>> thin, well-defined interface which can be implemented by multiple analysis
>> providers, just like we have with the current alias analysis interface. Your
>> current patch has a large number of changes to ProfileInfo which should instead
>> be inside a subclass of ProfileInfo, where most of the implementation details
>> are private.
>>
>> I think the first task should be to separate out the changes which are
>> appropriate for the ProfileInfo class itself and apply those to the tree. I
>> believe that the important changes which are worth reflecting in the API are
>> changing the weights to return a double instead of an int, and defining -1 as a
>> sentinel, "don't know", value.
> So you want ProfileInfo to remain untouched except for the double weights and the MissingValue? Then mark all methods virtual and create a subclass that contains the sort of heavy implementation I am using? Do you have any name suggestions for this subclass?
> One thing I would like to change in the current ProfileInfo implementation is the consistent use of the (0, entry) edge, which is necessary to profile functions with only one block (namely the entry block).
>
> [...]
>> The current ProfileInfo functions should also become virtual, and all other
>> implementation details should be left to the subclass.
>>
>> Finally, I don't think the ignoreMissing parameter to getExecutionCount makes
>> sense; instead the provider should always return -1 if it doesn't have the
>> value, and clients should "do the right thing".
> Maybe; it's of course always possible to have custom wrappers that perform this sort of task if necessary...
>
The so-called "heavy implementation" has partially resolved itself. In ProfileInfo I only changed the synthesis that was already there so that it caches the synthesised values, and to make ProfileInfo complete I also added BasicBlock and Function profiling. So the interface is just slightly bigger than before (MissingValue, block and function profiling) but is now usable for (I guess) all ProfileInfo tasks.

During this I killed the synthesis part of ProfileInfoLoader (which was code duplicated from ProfileInfo), and I also had to rewrite llvm-prof to use ProfileInfo instead of ProfileInfoLoader. (I think llvm-prof is now quite cleanly implemented...)
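
To make this more concrete, here is a rough sketch of what the slightly extended interface looks like from a client's point of view. This is a simplification, not the exact code from the patches (those are in [2]); it only illustrates the MissingValue sentinel, the virtual edge/block/function counts, and the (0, entry) edge convention for single-block functions mentioned above:

  #include <utility>

  namespace llvm {

  class BasicBlock;
  class Function;

  // Simplified sketch of the extended ProfileInfo: weights are doubles,
  // MissingValue (-1) is the "don't know" sentinel, and an edge with a null
  // source block -- the (0, entry) edge -- counts how often a function is
  // entered, so even functions with only the entry block can be profiled.
  class ProfileInfo {
  public:
    typedef std::pair<const BasicBlock *, const BasicBlock *> Edge;
    typedef std::pair<Edge, double> EdgeWeight;

    static const double MissingValue;   // -1: "value not available"

    static Edge getEdge(const BasicBlock *Src, const BasicBlock *Dest) {
      return std::make_pair(Src, Dest);
    }

    // Providers (e.g. a loader pass) override these and return MissingValue
    // when they have no information, leaving it to the clients to cope.
    virtual double getEdgeWeight(Edge E) const;
    virtual double getExecutionCount(const BasicBlock *BB);
    virtual double getExecutionCount(const Function *F);

    virtual ~ProfileInfo() {}
  };

  } // end namespace llvm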

>> 2. We shouldn't worry too much about preserving the functionality of the current
>> llvm-prof tool, which uses the ProfileInfoLoader directly. llvm-prof should be
>> rewritten to just use the public API provided by ProfileInfo. This should be
>> done early so there are fewer dependencies on the ProfileInfoLoader interface.
> Okay, that's no problem. I guess it makes sense to rewrite llvm-prof, but I was not sure whether this sort of change would be appreciated.
See my comments on point 1: it is rewritten now and works as before, but with an interface towards ProfileInfo.
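
Just to illustrate what "an interface towards ProfileInfo" means for a client: the following is a hypothetical printer pass, not the actual llvm-prof code, that goes only through the public ProfileInfo analysis (the pass name and output messages are made up):

  #include "llvm/Pass.h"
  #include "llvm/Function.h"
  #include "llvm/Analysis/ProfileInfo.h"
  #include "llvm/Support/raw_ostream.h"

  using namespace llvm;

  namespace {
    // Prints per-function execution counts using only the public
    // ProfileInfo interface, never touching ProfileInfoLoader directly.
    struct ProfilePrinter : public FunctionPass {
      static char ID;
      ProfilePrinter() : FunctionPass(&ID) {}

      virtual void getAnalysisUsage(AnalysisUsage &AU) const {
        AU.setPreservesAll();
        AU.addRequired<ProfileInfo>();   // any ProfileInfo provider will do
      }

      virtual bool runOnFunction(Function &F) {
        ProfileInfo &PI = getAnalysis<ProfileInfo>();
        double Count = PI.getExecutionCount(&F);
        if (Count == ProfileInfo::MissingValue)
          errs() << F.getName() << ": no profile data\n";
        else
          errs() << F.getName() << ": executed " << Count << " times\n";
        return false;   // analysis only, the IR is not changed
      }
    };
  }

  char ProfilePrinter::ID = 0;
  static RegisterPass<ProfilePrinter>
  X("print-profile-counts", "Print function execution counts");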

> [...]
>>> --- llvm-van/include/llvm/Analysis/ProfileInfo.h      2009-05-12 10:37:52.000000000 +0200
>>> +++ llvm-c7/include/llvm/Analysis/ProfileInfo.h       2009-06-26 16:47:01.000000000 +0200
>>> [...]
>> Instead of multiple maps from Function* ->, would it make sense to collect these
>> three (or four) maps inside a per-function structure?
> Yes, that makes sense; it's then easier to factor out the retrieval and handling of this structure.
I tried this, but it cluttered up the code with another level of indirection, so I kept the three maps; I think this works well enough.
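
For reference, the two layouts under discussion look roughly like this (the member names are made up for illustration, not taken verbatim from the patch); the first variant is the one I kept:

  #include <map>
  #include <utility>

  namespace llvm {

  class BasicBlock;
  class Function;

  typedef std::pair<const BasicBlock *, const BasicBlock *> Edge;

  // Variant kept in the patch: separate maps, each keyed by the function.
  struct ProfileDataAsMaps {
    std::map<const Function *, std::map<Edge, double> > EdgeInformation;
    std::map<const Function *, std::map<const BasicBlock *, double> > BlockInformation;
    std::map<const Function *, double> FunctionInformation;
  };

  // Suggested alternative: one map to a per-function record.  Tidier on
  // paper, but in my attempt every use site gained another indirection.
  struct FunctionProfile {
    std::map<Edge, double> EdgeCounts;
    std::map<const BasicBlock *, double> BlockCounts;
    double ExecutionCount;
  };

  struct ProfileDataAsStruct {
    std::map<const Function *, FunctionProfile> PerFunction;
  };

  } // end namespace llvm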

> [...]
>>> --- llvm-van/lib/Analysis/MaximumSpanningTree.cpp     1970-01-01 01:00:00.000000000 +0100
>>> +++ llvm-c7/lib/Analysis/MaximumSpanningTree.cpp      2009-06-26 16:47:01.000000000 +0200
>> ...
>>> +#define WOOD_REP(M,V1,V2) \
>> Please use "forest" instead of "wood".
> Oh, sorry, that's a false friend; in German "forest" is "Wald".
All the wood is now combined into a forest :-)

> [...]
>>> +  // compare two weighted edges
>>> +  struct VISIBILITY_HIDDEN EdgeWeightCompare {
>>> +    bool operator()(const ProfileInfo::EdgeWeight X,
>>> +                    const ProfileInfo::EdgeWeight Y) const {
>>> +      if ( X.second > Y.second ) return true;
>>> +      if ( X.second < Y.second ) return false;
>>> +      if ( X.first.first != 0 && Y.first.first == 0 ) return true;
>>> +      if ( X.first.first == 0 && Y.first.first != 0 ) return false;
>>> +      if ( X.first.first == 0 && Y.first.first == 0 ) return false;
>>> +      int c1 =  X.first.first->getName().compare(Y.first.first->getName());
>>> +      if ( c1 > 0 ) return true;
>>> +      if ( c1 < 0 ) return false;
>>> +      int c2 =  X.first.second->getName().compare(Y.first.second->getName());
>>> +      if ( c2 > 0 ) return true;
>>> +      return false;
>>> +    }
>>> +  };
>> Comparing edges based on the basic block name is not a good idea, in some common
>> cases these will all be empty. If the ordering really needs to be deterministic
>> then you will need to number the basic blocks, but is there a compelling reason
>> to not order based on the basic block address?
> This slipped my notice and shouldn't have made it into production code. It's sufficient to order based on the weights.
> (This implementation was necessary when the MST was also needed for reading the profiling information, but since I now handle this with the constant initializer that provides this information directly in the profile, it's only necessary to compare the weights.)
I have regression scripts that test the debug output of the various passes against a reference output, to quickly check that everything works as expected. For this it is necessary to have a deterministic ordering of the edges. Since the block name is not good in this case (I knew that, but I couldn't find another solution at the time), I now use the block size, which should stay the same when the code is translated by the regression script. But I include this "advanced sorting" only in debug builds; in regular builds the sorting is still only by edge weight.
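
A sketch of what such a comparator could look like (hypothetical helper and type names, not the exact code from the patch): release builds order only by weight, debug builds additionally break ties by block size so the regression scripts see a deterministic order:

  #include "llvm/BasicBlock.h"
  #include <utility>

  namespace llvm {

  typedef std::pair<const BasicBlock *, const BasicBlock *> Edge;
  typedef std::pair<Edge, double> EdgeWeight;

  // Hypothetical helper: block size in instructions, treating the null
  // source block of the (0, entry) edge as size 0.
  static unsigned blockSize(const BasicBlock *BB) {
    return BB ? (unsigned)BB->size() : 0;
  }

  struct EdgeWeightCompare {
    bool operator()(const EdgeWeight &X, const EdgeWeight &Y) const {
      // Primary (and, in release builds, only) criterion: the edge weight.
      if (X.second != Y.second)
        return X.second > Y.second;
  #ifndef NDEBUG
      // Debug builds: break ties by the sizes of the source and target
      // blocks so the edge ordering stays deterministic.
      if (blockSize(X.first.first) != blockSize(Y.first.first))
        return blockSize(X.first.first) > blockSize(Y.first.first);
      if (blockSize(X.first.second) != blockSize(Y.first.second))
        return blockSize(X.first.second) > blockSize(Y.first.second);
  #endif
      return false;   // equivalent edges
    }
  };

  } // end namespace llvm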

> [...]

The patches are still not as small as possible; I tried to strike a balance between reasonably sized patches and flooding the list with many small patches. I hope that's okay.

I have no or only limited email access this week and next; I will be back on the 20th of July.

Thanks, Andi

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-June/023427.html

[2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023716.html
