[LLVMdev] [RFC] Iterrative compilation framework for Clang/LLVM

Mon Dec 16 09:31:21 PST 2013

This is a first step towards to full iterative compilation framework for
Clang/LLVM. The attached patch is work in progress and the idea of sending
it to the list earlier rather than later is to open up a discussion on
iterative compilation in Clang/LLVM. All comments are welcome, especially
those related to integration of the framework into Clang/LLVM.

Current compilers have pipeline structure in which data from one phase is
passed to the following one. But these compilation phases can interfere in
unpredictable way so compilers use conservative heuristics on which they
make decisions during the compilation process. Because of unpredictability
of these interferences, compilers sometimes make decisions which can degrade
code quality. Iterative compilation could partially resolve these problems
by allowing compiler to explore some alternative decisions, generate code
for these new decision paths and take the generated code that is the best
for a particular criteria eventually.

This approach is not new in the literature, we can mention Milepost/CTuning
[http://ctuning.org] among other projects. In this approach, in comparison
to Milepost/CTuning, the emphasis is on much finer granularity by changing
arbitrary compiler decisions, and not only those based on values of command
line options, at any point in translation process.

This framework implements iterative approach in which in every iteration
slightly different code is generated. At the end of compilation process the
best emitted code is chosen.

All points at which relevant decisions are made should be identified.
Access to framework data structure (DecisionTree) should be provided at
these points. Based on the data from DecisionTree, obtained by calling
getDecision member function, at these points decisions are made whether to
execute default actions (as compiler would do without this framework) or to
follow alternative paths.

Let us consider Inliner example:

Original code:

   if (!IC) {
     DEBUG(dbgs() << "    NOT Inlining: cost=" << IC.getCost()
           << ", thres=" << (IC.getCostDelta() + IC.getCost())
           << ", Call: " << *CS.getInstruction() << "\n");
     return false;
   }

Code after instrumentalization:

   bool doInlinging = (bool)IC;
   DecisionTreeProxy *decisionTreeProxy;
   decisionTreeProxy =
moduleDecisionTreeProxies->getCurrFnDecisionTreeProxy();
   int t = MaxDtPriority - abs(IC.getCostDelta());
   bool decision = false;
   if (t >= 0)
     decision = decisionTreeProxy->getDecision(call);
   if (decision)
     doInlinging = !doInlinging;

   if (!doInlinging) {
     DEBUG(dbgs() << "    NOT Inlining: cost=" << IC.getCost()
           << ", thres=" << (IC.getCostDelta() + IC.getCost())
           << ", Call: " << *CS.getInstruction() << "\n");
     return false;
   }

Function getDecision returns true or false. If false is returned than this
means to take default path and true means to take alternative one.

Compilation process can be represented by sequence of binary values where
every value determines decision made at some point. Value FALSE means to
follow normal compilation process as original compiler would do without
presence of iterative compilation framework. TRUE value means to take
alternative path or to make opposite decision at that point. This naturally
forms tree like structure which we named Decision Tree.
Traversing the tree from the root to the leaf of the tree provides all the
decisions made during one iteration in compilation process.

At the end of each iteration, quality of generated code is estimated by
executing newly introduced target dependent pass. Based on results path for
the following iteration is calculated. At the moment, this has been proved
for MIPS only and it is based on code size.

Initially iterative compilation framework was implemented in CodeGen phase
but large number of code transformations was already performed in earlier
phases. This reduces possibilities to improve quality of generated code.
Thus iterative compilation framework is applied to Clang driver where it can
influence every single compiler phase. This relocation resulted in some
implementation issues mostly related to the fact that driver and the
compiler itself (-cc1) are located in different processes. This is resolved
by introducing DecisionTreeProxy class that implements communication between
these two processes through temporary files.

At this point mechanism is applied to two optimizations: Inliner and Loop
Invariant Code Motion. Supported platform is MIPS32r2.

Despite small number of instrumented points we got improvements on some test
files from CSiBE benchmark. Here are the promising results:

jpeg-6b/jidctflt.c       -10.2%
OpenTCP-1.0.4/system.c    -6.7%
libmspack/test/chmd_md5.c -5.8%
flex-2.5.31/regex.c       -4.8%

We plan to apply this framework to more compiler phases and to some other
target architectures. It is expected to give better results on some non
orthogonal architectures like microMIPS. Another direction is to apply
machine learning based on some input program features (number of basic
blocks, register pressure at some point etc.) and it can be used for
selecting compilation path for the next iteration or estimating the best
path for compiling program without using iterative approach (with only one
iteration).

            o
           / \
          /   \
         o     \
        /       \
       /         \
      o           o
     / \         /
    /   \       /
   L1    o     o
        /     /
       /     /
      L2    L3

Figure 1: Decision Tree Example - Symbol 'o' represent decision points. If
getDecision function returns FALSE the left branch is taken (default path),
otherwise the right branch is taken.
'L1' compilation iteration can be represented as a string
(FALSE,FALSE,FALSE), 'L2' as (FALSE,FALSE,TRUE,FALSE) and 'L3'
as (TRUE, FALSE, FALSE).

Clearly, we see huge potential in this direction. At the same time, we are
aware this would cause compiler to be slower which everyone is against, but
this penalty can be taken when we are in need for most optimized code.

What does everyone think?
Could this approach be considered as a valid mode for Clang/LLVM?

Please use this post and the attached patch (it can be applied to 3.3
release - clang should be inside llvm/tools folder) as a reference and
not the patch for submission.

Example command line:
> clang -target mipsel-unknown-linux -Oz -S -fiterative-comp=3 example.c

Thank you. Looking forward to hearing your comments on this.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131216/96777819/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ic.patch
Type: text/x-patch
Size: 62469 bytes
Desc: ic.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131216/96777819/attachment.bin>