[PATCH] Add framework for iterative compilation to llvm

Fri Aug 1 11:57:19 PDT 2014

>>! In D4723#7, @grosser wrote:
> I am a little surprised that there is not a _single_ comment added in this patch. I believe a couple of comments that document higher level design choices would be very helpful.

Agreed. Will do.

> I looked through the older emails, but some questions are still open:
> 
> What exactly forms the decision tree? If we have three points at which decisions are taken, will we have three nodes in the tree or depends the size of the tree on the module/program size? 
> 
> What happens if a decision is reached multiple times? E.g. the inliner run on different functions? Can different decisions be taken at each point?
> 
> Similarly, is the decision tree statically known or can it change dynamically depending on what kind of decisions have been taken. E.g. if we decide to run dead code elimination, we may not even have code that could be unrolled such that the corresponding unroller decision points may become invalid?
> 
> Do you plan to support non-binary decisions. E.g. different unroll factors?
> 
> How does all this fit together. Assuming I would like to run this in a JIT compiler, what are the pieces that need to be plugged together to try to iteratively optimize a certain function? 
> 

Here is a short explanation of how the decision tree is formed.

First, some passes need to be modified to call the getDecision function.
If this function returns false, the pass should behave exactly as the unmodified code does. If it returns true, the alternative path is taken.
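
A minimal sketch of what such a modified pass could look like. Everything here except the name getDecision is an assumption for illustration: the helper shouldUnroll, the threshold of 4, and passing the decision function as a std::function are not taken from the patch, and the priority parameter is omitted.

```cpp
#include <functional>

// Hypothetical pass-side helper. The unmodified heuristic unrolls loops whose
// trip count is at most 4 (an invented threshold for this sketch).
bool shouldUnroll(unsigned TripCount,
                  const std::function<bool()> &GetDecision) {
  bool Default = TripCount <= 4;
  if (!GetDecision())
    return Default; // false: behave exactly like the unmodified code
  return !Default;  // true: take the alternative path
}
```

With a decision function that always returns false, shouldUnroll gives the same answers as the plain heuristic.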

The getDecision function always returns false in the first iteration,
so the compiler works exactly as if there were no iterative framework.

Let us assume that the getDecision function is called four times in the first iteration of iterative compilation.
After the end of the first iteration, the decision tree will look like this:

        o
          \
            o
              \
                o
                  \
                    o
                      \
                        o

In the second iteration, exactly one decision is changed. The compiler works as in the first iteration until it reaches that decision, where the alternative path is taken.
After that, getDecision returns false until the iteration is finished. Let us assume that we took the alternative decision at the third decision point and that there are three more calls
of getDecision. Note that different compilation paths may have a different number of getDecision calls.

        o
          \
            o
              \
                o
              /   \
             o      o
               \      \
                 o      o
                   \
                     o

Every new iteration adds one new branch to the decision tree.
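
The growth of the tree can be sketched as inserting each iteration's sequence of decisions into a binary trie. The names Node, recordIteration and countLeaves are hypothetical and only illustrate the idea; the data structure in the patch may differ.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical tree node: Child[0] is the default (false) edge,
// Child[1] is the alternative (true) edge.
struct Node {
  std::unique_ptr<Node> Child[2];
};

// Records the decisions of one iteration and returns how many nodes were
// newly created. Nodes are only added, never removed.
std::size_t recordIteration(Node &Root, const std::vector<bool> &Path) {
  std::size_t Created = 0;
  Node *Cur = &Root;
  for (bool Decision : Path) {
    std::unique_ptr<Node> &Slot = Cur->Child[Decision ? 1 : 0];
    if (!Slot) {
      Slot.reset(new Node());
      ++Created;
    }
    Cur = Slot.get();
  }
  return Created;
}

// Counts the leaves, i.e. the number of distinct branches in the tree.
std::size_t countLeaves(const Node &N) {
  if (!N.Child[0] && !N.Child[1])
    return 1;
  std::size_t Total = 0;
  for (const std::unique_ptr<Node> &C : N.Child)
    if (C)
      Total += countLeaves(*C);
  return Total;
}
```

Recording {false, false, false, false} and then {false, false, true, false, false, false} corresponds to the two iterations described above: the second path shares its first two edges with the first and adds one new branch.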

At the end of each iteration, the fitness of the generated code is evaluated. In the last iteration, the compiler takes the path with the best fitness.
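
As a rough illustration of the final selection, assuming that every iteration is stored together with its fitness and that a larger fitness value is better (both are assumptions of this sketch; IterationResult and selectBest are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record of one finished iteration.
struct IterationResult {
  std::vector<bool> Path; // decisions taken, from the root to the leaf
  double Fitness;         // assumed: larger means better code
};

// Returns the iteration whose path the last compilation should follow.
const IterationResult &
selectBest(const std::vector<IterationResult> &Results) {
  assert(!Results.empty() && "the first iteration must have been recorded");
  return *std::max_element(
      Results.begin(), Results.end(),
      [](const IterationResult &A, const IterationResult &B) {
        return A.Fitness < B.Fitness;
      });
}
```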

To guide the selection of the node where the alternative decision will be taken, the getDecision function takes one parameter.
This parameter is interpreted as a priority, and the node with the highest priority is selected for the next alternative decision.
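
The selection could look roughly like the sketch below. The DecisionPoint record, the AlternativeTried flag and the tie-breaking rule (earliest point wins) are assumptions made for this example, not details of the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of one getDecision call on the current path.
struct DecisionPoint {
  unsigned Priority;     // value passed to getDecision
  bool AlternativeTried; // assumed bookkeeping: alternative already explored
};

// Returns the index of the untried decision point with the highest priority,
// or -1 if every alternative has already been tried.
int selectNextFlip(const std::vector<DecisionPoint> &Points) {
  int Best = -1;
  for (std::size_t I = 0; I < Points.size(); ++I) {
    if (Points[I].AlternativeTried)
      continue;
    if (Best < 0 || Points[I].Priority > Points[Best].Priority)
      Best = static_cast<int>(I);
  }
  return Best;
}
```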

Machine learning approach

The resulting decision tree can be used to train a binary classifier that could support the existing heuristics. The nodes where the decision tree branched serve as training examples for the classifier. We could replace existing heuristics with this trained classifier and potentially get better code even without the iterative approach.

This may be a good approach for a JIT compiler, but it requires the machine learning support, which is planned for the future.

N-ary decisions are not planned for now because every n-ary decision can be replaced with n-1 binary decisions. For example, if we have to choose a number from zero to three, we can ask three yes/no questions:

     (number is zero or greater than zero?)
         /                            \
       0             (number is one or greater than one?)
                       /                      \
                      1               (number is 2 or 3?)
                                         /          \
                                        2            3
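
The chain of questions above can be sketched as a small loop. The helper chooseValue is hypothetical, and mapping a false answer to "stop at the current value" is an assumption of this sketch; with it, an all-false first iteration selects value 0, so a pass would have to order the candidates so that its default choice comes first.

```cpp
#include <cassert>
#include <functional>

// Picks a value in [0, NumValues) by asking at most NumValues - 1 binary
// questions. A false answer keeps the current value, a true answer moves on
// to the next candidate.
unsigned chooseValue(unsigned NumValues, const std::function<bool()> &Decide) {
  assert(NumValues > 0 && "need at least one candidate value");
  unsigned Value = 0;
  while (Value + 1 < NumValues && Decide())
    ++Value;
  return Value;
}
```

For four candidate values the loop asks at most three questions, which matches the tree drawn above.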

If some decisions are made irrelevant by other decisions, this only makes the compilation process somewhat less efficient; they stay in the decision tree because nodes are never deleted. The nodes represent decision points in the history, and the path from the root of the tree to a leaf has enough information to replay a compilation iteration exactly.
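
A small sketch of such a replay, assuming that the compilation is deterministic once the answers of getDecision are fixed. PathReplayer and toyCompile are hypothetical stand-ins, not code from the patch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical replayer: hands out the stored root-to-leaf path in order and
// answers false for any call past its end.
class PathReplayer {
  std::vector<bool> Stored;
  std::vector<bool> Taken;

public:
  explicit PathReplayer(std::vector<bool> Path) : Stored(std::move(Path)) {}

  bool getDecision() {
    std::size_t I = Taken.size();
    bool Decision = I < Stored.size() ? Stored[I] : false;
    Taken.push_back(Decision);
    return Decision;
  }

  const std::vector<bool> &taken() const { return Taken; }
};

// Toy stand-in for a compilation whose result depends only on the decisions:
// every decision point that takes the alternative adds its 1-based index.
unsigned toyCompile(PathReplayer &R, unsigned NumPoints) {
  unsigned Result = 0;
  for (unsigned I = 0; I < NumPoints; ++I)
    if (R.getDecision())
      Result += I + 1;
  return Result;
}
```

Two replayers constructed from the same path produce the same result and take the same decisions, which is all the final iteration needs.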

http://reviews.llvm.org/D4723