[LLVMdev] [RFC] Iterrative compilation framework for Clang/LLVM

Mon Dec 16 16:26:10 PST 2013

Radovan,

Thanks for posting this! I would really like to have this kind of functionality available. There are a lot of things to think through here; comments inline...

----- Original Message -----
> From: "Radovan Obradovic" <Radovan.Obradovic at imgtec.com>
> To: llvmdev at cs.uiuc.edu
> Sent: Monday, December 16, 2013 11:31:21 AM
> Subject: [LLVMdev] [RFC] Iterrative compilation framework for Clang/LLVM
> 
> 
> 
> 
> This is a first step towards to full iterative compilation framework
> for
> Clang/LLVM. The attached patch is work in progress and the idea of
> sending
> it to the list earlier rather than later is to open up a discussion
> on
> iterative compilation in Clang/LLVM. All comments are welcome,
> especially
> those related to integration of the framework into Clang/LLVM.

I'm glad you're doing this, and I hope that you'll be willing to stick though what might be a long design discussion :)

> 
> Current compilers have pipeline structure in which data from one
> phase is
> passed to the following one. But these compilation phases can
> interfere in
> unpredictable way so compilers use conservative heuristics on which
> they
> make decisions during the compilation process. Because of
> unpredictability
> of these interferences, compilers sometimes make decisions which can
> degrade
> code quality. Iterative compilation could partially resolve these
> problems
> by allowing compiler to explore some alternative decisions, generate
> code
> for these new decision paths and take the generated code that is the
> best
> for a particular criteria eventually.
> 
> This approach is not new in the literature, we can mention
> Milepost/CTuning
> [http://ctuning.org] among other projects. In this approach, in
> comparison
> to Milepost/CTuning, the emphasis is on much finer granularity by
> changing
> arbitrary compiler decisions, and not only those based on values of
> command
> line options, at any point in translation process.
> 
> This framework implements iterative approach in which in every
> iteration
> slightly different code is generated. At the end of compilation
> process the
> best emitted code is chosen.
> 
> All points at which relevant decisions are made should be identified.
> Access to framework data structure (DecisionTree) should be provided
> at
> these points. Based on the data from DecisionTree, obtained by
> calling
> getDecision member function, at these points decisions are made
> whether to
> execute default actions (as compiler would do without this framework)
> or to
> follow alternative paths.
> 
> Let us consider Inliner example:
> 
> Original code:
> 
> if (!IC) {
> DEBUG(dbgs() << " NOT Inlining: cost=" << IC.getCost()
> << ", thres=" << (IC.getCostDelta() + IC.getCost())
> << ", Call: " << *CS.getInstruction() << "\n");
> return false;
> }
> 
> Code after instrumentalization:
> 
> bool doInlinging = (bool)IC;
> DecisionTreeProxy *decisionTreeProxy;
> decisionTreeProxy =
> moduleDecisionTreeProxies->getCurrFnDecisionTreeProxy();
> int t = MaxDtPriority - abs(IC.getCostDelta());
> bool decision = false;
> if (t >= 0)
> decision = decisionTreeProxy->getDecision(call);
> if (decision)
> doInlinging = !doInlinging;
> 
> if (!doInlinging) {
> DEBUG(dbgs() << " NOT Inlining: cost=" << IC.getCost()
> << ", thres=" << (IC.getCostDelta() + IC.getCost())
> << ", Call: " << *CS.getInstruction() << "\n");
> return false;
> }

Talking a quick look over the patch, this seems to introduce a lot of boilerplate code. Can you think of a design that simplifies this?

> 
> Function getDecision returns true or false. If false is returned than
> this
> means to take default path and true means to take alternative one.
> 
> Compilation process can be represented by sequence of binary values
> where
> every value determines decision made at some point. Value FALSE means
> to
> follow normal compilation process as original compiler would do
> without
> presence of iterative compilation framework. TRUE value means to take
> alternative path or to make opposite decision at that point. This
> naturally
> forms tree like structure which we named Decision Tree.
> Traversing the tree from the root to the leaf of the tree provides
> all the
> decisions made during one iteration in compilation process.
> 
> At the end of each iteration, quality of generated code is estimated
> by
> executing newly introduced target dependent pass. Based on results
> path for
> the following iteration is calculated. At the moment, this has been
> proved
> for MIPS only and it is based on code size.

I think that, in general, we ought to be able to get an overall latency estimate from the instruction scheduler.

> 
> Initially iterative compilation framework was implemented in CodeGen
> phase
> but large number of code transformations was already performed in
> earlier
> phases. This reduces possibilities to improve quality of generated
> code.
> Thus iterative compilation framework is applied to Clang driver where
> it can
> influence every single compiler phase. This relocation resulted in
> some
> implementation issues mostly related to the fact that driver and the
> compiler itself (-cc1) are located in different processes. This is
> resolved
> by introducing DecisionTreeProxy class that implements communication
> between
> these two processes through temporary files.

I think that this can be solved by introducing some API for the necessary data flow. There is obviously already a binding here.

> 
> At this point mechanism is applied to two optimizations: Inliner and
> Loop
> Invariant Code Motion. Supported platform is MIPS32r2.
> 
> Despite small number of instrumented points we got improvements on
> some test
> files from CSiBE benchmark. Here are the promising results:
> 
> jpeg-6b/jidctflt.c -10.2%
> OpenTCP-1.0.4/system.c -6.7%
> libmspack/test/chmd_md5.c -5.8%
> flex-2.5.31/regex.c -4.8%

Very interesting. Out of curiosity, any slowdowns?

> 
> We plan to apply this framework to more compiler phases and to some
> other
> target architectures. It is expected to give better results on some
> non
> orthogonal architectures like microMIPS. Another direction is to
> apply
> machine learning based on some input program features (number of
> basic
> blocks, register pressure at some point etc.) and it can be used for
> selecting compilation path for the next iteration or estimating the
> best
> path for compiling program without using iterative approach (with
> only one
> iteration).
> 
> 
> o
> / \
> / \
> o \
> / \
> / \
> o o
> / \ /
> / \ /
> L1 o o
> / /
> / /
> L2 L3
> 
> Figure 1: Decision Tree Example - Symbol 'o' represent decision
> points. If
> getDecision function returns FALSE the left branch is taken (default
> path),
> otherwise the right branch is taken.
> 'L1' compilation iteration can be represented as a string
> (FALSE,FALSE,FALSE), 'L2' as (FALSE,FALSE,TRUE,FALSE) and 'L3'
> as (TRUE, FALSE, FALSE).
> 
> 
> Clearly, we see huge potential in this direction. At the same time,
> we are
> aware this would cause compiler to be slower which everyone is
> against, but
> this penalty can be taken when we are in need for most optimized
> code.

What is the overhead of the current implementation? I suspect that, in order to control the large number of decision points from large regions, we will want some way to limit fine-grained decisions in the tree in large regions (and only including coarse ones, like the inlining of relatively large functions). In small blocks, on the other hand, we might even include instruction scheduling decisions (for in-order cores).

I'd been thinking of experimenting with some iterative scheduling for my in-order PPC cores (for small blocks in loops), and integrating it into something like this might be even better.

> 
> What does everyone think?
> Could this approach be considered as a valid mode for Clang/LLVM?
> 
> Please use this post and the attached patch (it can be applied to 3.3
> release - clang should be inside llvm/tools folder) as a reference
> and
> not the patch for submission.

I see in the patch some (seemingly small) changes to the pass manager setup. Can you describe exactly what you've done. Chandler is rewriting the pass infrastructure, and I'd like to understand how what you've done will or won't fit with the new setup.

Thanks again,
Hal

> 
> Example command line:
> > clang -target mipsel-unknown-linux -Oz -S -fiterative-comp=3
> > example.c
> 
> Thank you. Looking forward to hearing your comments on this.
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory